One strange "gotcha" in Lucene: Field.Text(String, String) creates a field that is stored, but Field.Text(String, Reader) creates a field that is *not* stored.
I'd naively assumed that the two Field.Text() methods were just for convenience, I hadn't expected that the semantics would change depending on what way I got the data into the Field. That seems like a misfeature to me. For my own purposes I did a little chart of all 8 possible field types, and which factory methods create which. The docs are all consistent with the code, just seeing it this way made it clearer to me. Stored Indexed Tokenized yes yes yes Field.Text(String, String, String) yes yes no Field.Keyword(String, String) yes no yes yes no no Field.UnIndexed(String, String) no yes yes Field.Field(String, String) , Field.UnStored(String, String),Field.Text(String, Reader) no yes no no no yes no no no One thing that pops out is that it never makes sense to tokenize but not index. Similarly, it never makes sense to neither store nor index. This all seems obvious in retrospect, but it's helped me understand how tokenizing, indexing, and storing all fit together. PS: when does the switch over to apache.org take place? I understand the CVS at SourceForge is already out of date. When do the mailing lists switch over? [EMAIL PROTECTED] . . . . . . . . http://www.media.mit.edu/~nelson/ _______________________________________________ Lucene-users mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/lucene-users