We're also creating multiple lucene Documents (Article) from one XML
document (ArticleCollection).

Our XML documents are quite large and we have a large quantity of them so we
use a SAX parser to process the XML and create multiple lucene Documents in
the index along the way - this is very much faster than using any DOM parser
to do the same job.

The code we're using to create the index is based on one in the
contributions directory for lucene:

http://jakarta.apache.org/lucene/docs/contributions.html

Its the one called XMLDocument #1 which links to:

http://marc.theaimsgroup.com/?l=lucene-dev&m=100723333506246&w=2

Richard Taylor
New Scientist Developer

----- Original Message -----
From: "Peter Carlson" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, May 22, 2002 3:46 PM
Subject: Re: Indexing multiple equiv fieldnames (revisited)


>
> You should have any significant performance change.
>
> In order to do this, use xalan and get a NodeIterator based on a given
> xpath. I think there is an example of this on the xalan site
> (xml.apache.org). I use this same technique to create mulitple physical
> documents from and xml file that has many top level elements.
>
> --Peter
>
> On 5/21/02 8:33 PM, "Mohamed Idrees" <[EMAIL PROTECTED]>
wrote:
>
> > Can someone clarify me the following:
> >
> > Will there be any significant change in performance, if one was to
> > create individual
> > lucene documents for each of the main element (customer in this case) in
> > one single xml file?
> >
> > Wish someone comes out with contribution to avoid this.
> >
>
>
> --
> To unsubscribe, e-mail:
<mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

Reply via email to