Summary: I think there will be no real impact if you use longer field
names.
Details:
Index size will be just a tiny bit bigger. There is a single file
per segment (*.fnm) that resolves the field names into integer IDs,
then the rest of the index uses these integer IDs. So only that
file, which holds one copy of the string name of each field, will be
bigger.
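To make the size argument concrete, here is a toy sketch in plain Java. It is not Lucene's actual code: the class FieldNameTable and its methods are made up for illustration, and it ignores whatever header or per-field flag bytes the real file carries. It only shows that each name is stored once, so the size difference does not grow with the number of documents.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a per-segment field-name table (hypothetical, not Lucene code). */
final class FieldNameTable {
    private final Map<String, Integer> idsByName = new LinkedHashMap<>();

    /** Returns the existing ID for a name, or assigns the next free one. */
    int idFor(String fieldName) {
        Integer id = idsByName.get(fieldName);
        if (id == null) {
            id = idsByName.size();
            idsByName.put(fieldName, id);
        }
        return id;
    }

    /** Bytes needed to store each distinct name once (UTF-8, no headers). */
    int nameBytes() {
        int total = 0;
        for (String name : idsByName.keySet()) {
            total += name.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        String[] longNames = {"AAAAAAAAAA", "BBBBBBBBBB", "CCCCCCCCCC", "DDDDDDDDDD"};
        String[] shortNames = {"A", "B", "C", "D"};
        FieldNameTable longTable = new FieldNameTable();
        FieldNameTable shortTable = new FieldNameTable();
        // Simulate many documents; the tables stop growing after the first one.
        for (int doc = 0; doc < 1_000_000; doc++) {
            for (int f = 0; f < longNames.length; f++) {
                longTable.idFor(longNames[f]);
                shortTable.idFor(shortNames[f]);
            }
        }
        // Extra bytes spent on the long names, regardless of document count.
        System.out.println(longTable.nameBytes() - shortTable.nameBytes()); // prints 36
    }
}
```

In this model the four 10-character names cost 36 bytes more than the four 1-character names, whether there is one document or a million.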
Indexing may be a bit faster with shorter names, just because Lucene
does a hashtable lookup of that string (once per field per document)
to find the corresponding integer. My guess is that the impact is
negligible, especially if the content of each field is long.
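If you want a rough feel for the lookup cost before building real indexes, something like the following could serve as a starting point. It is only a sketch under assumptions: a java.util.HashMap stands in for Lucene's internal table, the class FieldLookupTiming is hypothetical, and simple timing loops like this are easily distorted by JIT warm-up, so treat the numbers as indicative at best.

```java
import java.util.HashMap;
import java.util.Map;

/** Rough timing harness; a HashMap stands in for Lucene's internal table. */
final class FieldLookupTiming {
    /** Does one lookup per field per document; returns {lookups, elapsedNanos}. */
    static long[] run(String[] fieldNames, int docCount) {
        Map<String, Integer> ids = new HashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            ids.put(fieldNames[i], i);
        }
        long lookups = 0;
        long idSum = 0; // checked below so the loop result is actually used
        long start = System.nanoTime();
        for (int doc = 0; doc < docCount; doc++) {
            for (String name : fieldNames) {
                idSum += ids.get(name);
                lookups++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (idSum < 0) {
            throw new IllegalStateException("unexpected negative ID sum");
        }
        return new long[] {lookups, elapsed};
    }

    public static void main(String[] args) {
        String[] longNames = {"AAAAAAAAAA", "BBBBBBBBBB", "CCCCCCCCCC", "DDDDDDDDDD"};
        String[] shortNames = {"A", "B", "C", "D"};
        // Warm-up passes so later measurements are less dominated by JIT compilation.
        run(longNames, 100_000);
        run(shortNames, 100_000);
        long[] longResult = run(longNames, 1_000_000);
        long[] shortResult = run(shortNames, 1_000_000);
        System.out.println("long names:  " + longResult[0] + " lookups in " + longResult[1] + " ns");
        System.out.println("short names: " + shortResult[0] + " lookups in " + shortResult[1] + " ns");
    }
}
```

This measures only the name-to-integer lookup in isolation. In a real indexing run that cost sits next to analyzing and writing the field content, which is why measuring against actual indexes is still the test that matters.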
Searching goes through a similar hashtable lookup, after which the
field name is interned, but for a largish index that time is surely
negligible as well. But you should test! And please report back :)
Mike
John wrote:
Hi,
Let's say my data source consists of records like so (the example is
Field=Value):
  AAAAAAAAAA=Value1
  BBBBBBBBBB=Value2
  CCCCCCCCCC=Value3
  DDDDDDDDDD=Value4
And let's say I have a second copy of my data, but this time it looks
like so:
  A=Value1
  B=Value2
  C=Value3
  D=Value4
I.e., same data; the only change is that the field names are now shorter.
Now if I create two Lucene indexes, one using the long field
names and one using the short field names (my data has not
changed), will the Lucene index size be smaller for the short
field name one? Will updating and optimizing the index be faster?
Will searching be faster?
That is, am I better off using short field names vs. long field
names?
Yes, I will do some performance analyses, but I want to know if
this matters before I do so.
Thanks in advance!
-JM