This confused me at first too, so here's my current understanding... When you use YES, you store the actual data as-is with the document. This is entirely independent of indexing. Internally, I assume that searching and storing are separate parts of the index that have nothing to do with each other.
When you use NO (and here I assume you index the data, because not storing it and not indexing it is logically a no-op), the relevant terms are stored in the index (stop words possibly removed) but NOT stored with the document. Programmatically, you can't say something like doc.get("field") on a field that's not stored.

When you do NOT store data and use Luke's "reconstruct and edit" button, you will probably not get an entirely accurate version of the document. What I believe is happening (although I haven't been in the guts of Luke) is that it's using something like TermEnum to walk all the terms in the index and ordering them sequentially for each unstored field in the particular document. Conceptually, I think it's something like:

  For each term in the index
    Assemble an ordered list of the term positions in this document.
  Now merge all those lists by term position.

So stemmed terms may or may not come back correctly, and I don't think you get stop words back, etc. Luke does its best to reassemble unstored fields from the index data, but for unstored data you'll see, in big red letters, "RESTORED content ONLY - check for errors!" It's inevitably a lossy process.

As for your question of which is better for small or large fields... It's not a relevant question. A better question is "Will I ever need the field exactly as it was originally?" If the answer is yes, store it.

Think of them as two independent questions:

Do I need to show the original to the user? Store.YES. Otherwise Store.NO.
Do I need to search the data? Index.TOKENIZED/UN_TOKENIZED. Otherwise Index.NO.

You do NOT need to store data to search for it. In general, IMO, it's better to not store the data if you don't need it, since the resulting index is significantly smaller.

On large indexes, BTW, Luke takes a LONG time to reconstruct a document. It has to do a lot of work behind the scenes.

So think of indexing and storing as putting the data in different places.
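To make the "different places" idea and the reconstruction loop concrete, here is a minimal toy model in plain Java. It is only a sketch under my own assumptions, not Lucene's or Luke's actual code: the class ToyIndex, its methods, and the three-word stop list are all hypothetical, and the "analyzer" is just lowercasing plus a split on non-word characters.

```java
import java.util.*;

/** Toy model only: NOT Lucene's or Luke's real code. All names here are hypothetical. */
class ToyIndex {
    // Hypothetical stop-word list, just for this illustration.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "is"));

    // "Storing": docId -> original text, kept verbatim.
    private final Map<Integer, String> stored = new HashMap<>();

    // "Indexing": term -> (docId -> positions of that term in the document).
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    /** The store and index flags are independent, like Store.YES/NO and Index.* above. */
    public void add(int docId, String text, boolean store, boolean index) {
        if (store) {
            stored.put(docId, text);
        }
        if (!index) {
            return;
        }
        int pos = 0;
        for (String tok : text.toLowerCase().split("\\W+")) {
            if (tok.isEmpty()) {
                continue;
            }
            int here = pos++;
            if (STOP_WORDS.contains(tok)) {
                continue; // the position is consumed, but the term is dropped
            }
            postings.computeIfAbsent(tok, t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(here);
        }
    }

    /** Analogous to doc.get("field"): returns null unless the text was stored. */
    public String get(int docId) {
        return stored.get(docId);
    }

    /** Searching only consults the postings, never the stored text. */
    public boolean matches(int docId, String term) {
        Map<Integer, List<Integer>> docs = postings.get(term.toLowerCase());
        return docs != null && docs.containsKey(docId);
    }

    /** For each term, collect its positions in this document, then merge by position. */
    public String reconstruct(int docId) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, List<Integer>>> e : postings.entrySet()) {
            List<Integer> positions = e.getValue().get(docId);
            if (positions == null) {
                continue;
            }
            for (int p : positions) {
                byPosition.put(p, e.getKey());
            }
        }
        return String.join(" ", byPosition.values());
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.add(1, "The Quick fox is quick", false, true); // indexed, not stored
        System.out.println(idx.get(1));            // prints "null"
        System.out.println(idx.matches(1, "fox")); // prints "true"
        System.out.println(idx.reconstruct(1));    // prints "quick fox quick"
    }
}
```

The unstored document is still searchable, but the rebuilt text has lost its case and its stop words, and it took a pass over every term in the index to produce. That is the same kind of loss and cost described above for Luke, only in miniature.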
Indexing data puts it in with all the searchable terms. Storing it puts it with the document. Indexing is for finding things, and storing is for showing the original to the user. You can do either or both.

I'm not sure whether this has added more confusion or cleared things up, but at least it's a try <G>.

Best
Erick

On 3/16/07, cybercouf <[EMAIL PROTECTED]> wrote:
I'm using Lucene to index my Nutch crawls, but I don't really understand the difference between Field.Store.YES and NO. It seems (using Luke) I can still read some data that was not stored with Store.YES. Where is this data stored if it's not in the index? Which is better to use for small fields (and for medium ones)? Thanks for shedding some light on this!
--
View this message in context: http://www.nabble.com/How-the-Field.Store-flag-works--tf3413510.html#a9511458
Sent from the Lucene - Java Users mailing list archive at Nabble.com.