Erick / Steve,

Thank you both (as well as everyone else who weighed in) for helping get to a far better solution well before any code was ever slung.
Since we all know that someone else is going to find this in the archives some day, I'd like to unveil the rest of my ignorance and misconceptions, on the off chance that others following in these footsteps may be under the same delusions about how Lucene operates under the hood. Please forgive any moronic questions, no matter how gross the error.

First, the very existence of DateField in past versions of Lucene subtly gave me the impression that Lucene was actually data-type aware, so its sudden deprecation was a shocking loss. What I quickly gathered was that DateField was nothing more than a helper class, and that its only function in the universe was to inject properly formatted text dates into the token stream. If I'm mistaken or have a nuance wrong, please correct me, as big errors compound from little misunderstandings.

Second, and along the same lines because of the data-type goof, I was operating under the impression that a document had a single token stream, and that the metadata (the other fields) added to the document were name/value pairs (e.g. creation date, version number, author name). What I'm picking up is that each field is its own token stream, and this is why I can have a list of dates. Am I still on track so far?

Third, I read in the documentation that the add() method is simply appending. At this point, I'm wondering whether add() tokenizes on each call, or only when the document is indexed. This is a matter of slight importance, especially if one's understanding of the token stream is a little shaky. Document's add() method is described like this:

    "Several fields may be added with the same name. In this case, if the fields are indexed, their text is treated as though appended for the purposes of search."

The catch here is in understanding the behavior of "append". Lucene's documentation would benefit from an example.
For instance:

    document.add( Field.Text("somefield", "hot" ) );
    document.add( Field.Text("somefield", "dog" ) );

Did I just add "hotdog" to my document, or two tokens, "hot" and "dog"?

The reason this is important to wrap one's mind around is when adding multiple dates. It had been suggested that I add dates with a separator:

    document.add( Field.Text("interestingdates", "17760704,20010911,20070305" ) );

If add() is tokenizing up front, then three calls would be the logical equivalent, and I wouldn't need to artificially add separators inside a looping construct:

    document.add( Field.Text("interestingdates", "17760704" ) );
    document.add( Field.Text("interestingdates", "20010911" ) );
    document.add( Field.Text("interestingdates", "20070305" ) );

Are the three adds shown here the same to Lucene as the one large add above with commas?

And finally, the question that pulls all three of these diverse issues together: if Lucene is really just looking at a stream of tokenized text, and dates and text are tokens no different from one another, and I can inject properly formatted dates into the stream without requiring an honest-to-goodness data type or artificial list -- what would prevent one from detecting a date-looking "thing" in a stream and injecting a date-looking token at that place in the document?

For example, if "December 25th, 2007" was in the text stream, replace that block of text with "20071225" and let the tokenizer do its thing as normal, none the wiser. In theory, wouldn't this let the user search the regular text body for words and dates at the same time?

What I'm hoping you'll say is: yes, you could do that, ...but you wouldn't want to for performance reasons.

Thanks again,
-wls
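
P.S. To make that last idea concrete, here is a minimal sketch of the text-rewriting step on its own, in plain Java with no Lucene calls at all. The class name DateNormalizer and its normalize() method are made up for illustration, and it assumes dates are spelled with a full English month name, as in "December 25th, 2007". It does not check that the day is valid for the month. Whether the resulting "20071225" survives as a single token would still depend on whichever analyzer the text is handed to afterwards.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): rewrites spelled-out dates
// into yyyyMMdd strings before the text is given to an analyzer.
final class DateNormalizer {

    // Month names in calendar order; list index + 1 is the month number.
    private static final List<String> MONTHS = Arrays.asList(
        "january", "february", "march", "april", "may", "june",
        "july", "august", "september", "october", "november", "december");

    // Matches e.g. "December 25th, 2007" or "July 4, 1776".
    // Assumption: full English month names only; no range check on the day.
    private static final Pattern DATE = Pattern.compile(
        "\\b(January|February|March|April|May|June|July|August"
        + "|September|October|November|December)"
        + "\\s+(\\d{1,2})(?:st|nd|rd|th)?,?\\s+(\\d{4})\\b",
        Pattern.CASE_INSENSITIVE);

    // Returns the text with every matched date replaced by its yyyyMMdd form.
    static String normalize(String text) {
        Matcher m = DATE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int month = MONTHS.indexOf(m.group(1).toLowerCase(Locale.ROOT)) + 1;
            int day = Integer.parseInt(m.group(2));
            int year = Integer.parseInt(m.group(3));
            String token = String.format(Locale.ROOT, "%04d%02d%02d", year, month, day);
            // quoteReplacement guards against '$' or '\' in the replacement text.
            m.appendReplacement(out, Matcher.quoteReplacement(token));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Gifts were opened on 20071225 at dawn.
        System.out.println(normalize("Gifts were opened on December 25th, 2007 at dawn."));
    }
}
```

The output of normalize() would then be what gets passed to Field.Text() in place of the raw body text; text without any recognizable date passes through unchanged.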