Re: Enumerating Concatenated Fields

Otis Gospodnetic Sun, 17 Nov 2002 14:19:27 -0800

Yes, you will have to parse the 'codes' document field yourself, of
course.  You concatenated all those values manually, so you'll have to
split them manually, too.
The process that I described before can still be used; you just have to
add one step: split the value of the 'codes' field on the comma
delimiter and store each piece in a Set.  Since a Set doesn't allow
duplicates, at the end of the loop that goes through all documents you
will have a unique Set of the code values used in your index.
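
A rough sketch of the split-and-collect step, in plain Java so it runs
without an index.  The class name CodeCollector and the list of strings
passed in are made up for illustration; in a real index each string
would be the stored 'codes' value read from one document (something
along the lines of IndexReader.document(i).get("codes") for each
document number below maxDoc(), but check that against your Lucene
version).  The trim() is there on the assumption that the values were
joined with ", " as in the example field value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CodeCollector {

    // Splits each comma-delimited 'codes' value and collects the distinct codes.
    // fieldValues stands in for the strings read from the index, one per
    // document (hypothetical input, not a Lucene type).
    static Set<String> collectCodes(List<String> fieldValues) {
        Set<String> codes = new TreeSet<>();       // a Set silently drops duplicates
        for (String value : fieldValues) {
            if (value == null) {
                continue;                          // document without a 'codes' field
            }
            for (String code : value.split(",")) {
                String trimmed = code.trim();      // assumes values were joined with ", "
                if (!trimmed.isEmpty()) {
                    codes.add(trimmed);            // skip empty pieces
                }
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "value_of_code1, value_of_code2, value_of_code3",
                "value_of_code2, value_of_code3",
                "");
        // Prints the distinct codes found across all three sample values.
        System.out.println(collectCodes(values));
    }
}
```

A TreeSet is used only so the result comes back sorted; a HashSet would
do just as well if order doesn't matter.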


Otis


--- Terry Steichen <[EMAIL PROTECTED]> wrote:
> Otis,
> 
> Thanks for your response, but I don't think I was particularly clear
> in my original message.  Here's an expanded description.
> 
> For each Lucene Document in the index there will be a 'codes' field
> which will contain a comma-delimited set of codes (this is the result
> of my concatenation at index-time of the individual 'code' sections
> from each of the corresponding XML documents).
> 
> In other words, assume the original XML document contains something
> like this:
> .....
> <codes>
>     <code>value_of_code1</code>
>     <code>value_of_code2</code>
>     <code>value_of_code3</code>
> </codes>
> ....
> 
> When I index each such XML document, I create a Lucene Document that
> has a field called 'codes', which has the value: "value_of_code1,
> value_of_code2, value_of_code3". (I do this so I can do boolean
> searches on this field, to see which documents may have
> value_of_code1 AND value_of_code2 AND NOT value_of_code3, for
> example.)
> 
> Consider that each 'value_of_codexx' is a keyword.  Each XML document
> may have zero or more such keywords (aka code sections).  I'm trying
> to figure out a way to get a list of all the keywords used by the XML
> documents that have been indexed.  It seems to me, the index itself
> (even though I do store this concatenated result in it) won't really
> know how to parse the string of comma-delimited code values that
> comprise each 'codes' field value.
> 
> Does that make more sense?
> 
> Regards,
> 
> Terry
> 
> --- Original Message -----
> From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Sunday, November 17, 2002 4:24 PM
> Subject: Re: Enumerating Concatenated Fields
> 
> 
> > If I understand what you want - open an index with IndexReader, get
> > the # of documents in it via IndexReader, loop through all
> > documents, getting each one by its ID, and for each of them get the
> > 'codes' field out of it.
> >
> > Otis
> >
> >
> > --- Terry Steichen <[EMAIL PROTECTED]> wrote:
> > > I have a collection of XML documents, each of which contains a
> > > 'codes' section, each of which contains zero or more 'code'
> > > sections.  When I index the documents, I concatenate all the
> > > non-empty 'code' sections into a single 'codes' index field to
> > > facilitate boolean searching.
> > >
> > > Given my structure, is there a way that I could get a list of all
> > > the defined 'code' values in the entire set of documents?  If not
> > > (as I suspect), is there a way that I could change the indexing
> > > scheme to add this functionality?
> > >
> > > Regards,
> > >
> > > Terry
> > >
> 


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
