Yes, you will have to parse the 'codes' field yourself, of course. You concatenated those values manually, so you'll have to split them manually, too. The process I described before still applies; you just add one step: for each document, split the value of the 'codes' field on the comma delimiter (trimming the space after each comma) and store each piece in a Set. Since a Set doesn't allow duplicates, at the end of the loop over all documents you will have the unique set of code values used in your index.
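Here is a rough sketch of the split-and-collect step, in plain Java so it runs without Lucene on the classpath. It assumes you have already pulled the stored 'codes' value out of each document in the IndexReader loop (something like doc.get("codes")); the class name CodeCollector and the method collectCodes are made up for illustration and are not part of Lucene.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the class and method names are illustrative only.
class CodeCollector {

    // Splits each comma-delimited 'codes' value and gathers the pieces in a Set,
    // so duplicates across documents collapse automatically.
    static Set<String> collectCodes(Iterable<String> codesFieldValues) {
        Set<String> unique = new TreeSet<>(); // TreeSet only to get sorted output
        for (String value : codesFieldValues) {
            if (value == null) {
                continue; // this document had no 'codes' field
            }
            for (String code : value.split(",")) {
                String trimmed = code.trim(); // values were joined with ", "
                if (!trimmed.isEmpty()) {
                    unique.add(trimmed);
                }
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // Stand-ins for the stored 'codes' values read from each document.
        List<String> values = Arrays.asList(
                "value_of_code1, value_of_code2, value_of_code3",
                "value_of_code2",
                "");
        System.out.println(collectCodes(values));
        // prints [value_of_code1, value_of_code2, value_of_code3]
    }
}
```

In the real thing you would replace the hard-coded list with the values read from the index, skipping deleted documents as you go. This only works because the 'codes' field is stored, not just indexed.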
Otis

--- Terry Steichen <[EMAIL PROTECTED]> wrote:
> Otis,
>
> Thanks for your response, but I don't think I was particularly clear in my
> original message. Here's an expanded description.
>
> For each Lucene Document in the index there will be a 'codes' field which
> will contain a comma-delimited set of codes (this is the result of my
> concatenation at index-time of the individual 'code' sections from each of
> the corresponding XML documents).
>
> In other words, assume the original XML document contains something like
> this:
> .....
> <codes>
>   <code>value_of_code1</code>
>   <code>value_of_code2</code>
>   <code>value_of_code3</code>
> </codes>
> ....
>
> When I index each such XML document, I create a Lucene Document that has
> a field called 'codes', which has the value: "value_of_code1,
> value_of_code2, value_of_code3". (I do this so I can do boolean searches on
> this field, to see which documents may have value_of_code1 AND
> value_of_code2 AND NOT value_of_code3, for example.)
>
> Consider that each 'value_of_codexx' is a keyword. Each XML document may
> have zero or more such keywords (aka code sections). I'm trying to figure
> out a way to get a list of all the keywords used by the XML documents that
> have been indexed. It seems to me, the index itself (even though I do
> store this concatenated result in it) won't really know how to parse the
> string of comma-delimited code values that comprise each 'codes' field
> value.
>
> Does that make more sense?
>
> Regards,
>
> Terry
>
> ----- Original Message -----
> From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Sunday, November 17, 2002 4:24 PM
> Subject: Re: Enumerating Concatenated Fields
>
> > If I understand what you want - open an index with IndexReader, get the
> > # of documents in it via IndexReader, loop through all documents,
> > getting each one by its ID, and for each of them get field 'codes' out of
> > it.
> >
> > Otis
> >
> > --- Terry Steichen <[EMAIL PROTECTED]> wrote:
> > > I have a collection of XML documents, each of which contains a
> > > 'codes' section, each of which contains zero or more 'code' sections.
> > > When I index the documents, I concatenate all the non-empty 'code'
> > > sections into a single 'codes' index field to facilitate boolean
> > > searching.
> > >
> > > Given my structure, is there a way that I could get a list of all the
> > > defined 'code' values in the entire set of documents? If not (as I
> > > suspect), is there a way that I could change the indexing scheme to
> > > add this functionality?
> > >
> > > Regards,
> > >
> > > Terry
