A bit of clarification:
A Lucene index is made up of multiple "segments".
Compound format: stores each segment in a single file, so fewer files
are created/opened.
Non-compound format: stores each segment in multiple files, so more files
are created/opened.
Non-compound is likely to be faster for indexing.
Optimizing the index (whether compound or not) merges the index
segments into one segment, which would also make search faster.
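
To put rough numbers on the difference, here is a toy model in plain Java
(no Lucene calls). The class name, the method, and above all the figure of
8 files per non-compound segment are assumptions for illustration only; the
real count depends on the Lucene version and on the fields in the index.

```java
public class SegmentFileCount {
    // ASSUMPTION: files one segment occupies in non-compound format.
    // Not a Lucene constant; the real number varies per index.
    static final int ASSUMED_FILES_PER_SEGMENT = 8;

    /**
     * Estimates how many files back the segments of an index.
     * Ignores shared per-index files such as "segments".
     */
    static int segmentFiles(int segments, boolean compound) {
        // Compound: one file per segment. Non-compound: several per segment.
        return compound ? segments : segments * ASSUMED_FILES_PER_SEGMENT;
    }

    public static void main(String[] args) {
        int segments = 10;
        System.out.println("compound:               " + segmentFiles(segments, true));
        System.out.println("non-compound:           " + segmentFiles(segments, false));
        // Optimizing merges everything into a single segment.
        System.out.println("optimized compound:     " + segmentFiles(1, true));
        System.out.println("optimized non-compound: " + segmentFiles(1, false));
    }
}
```

Under that assumption, 10 segments mean 10 files in compound format versus 80
in non-compound format, and an optimized index drops to 1 versus 8.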

Also see http://lucene.apache.org/java/docs/fileformats.html
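
If you want to check what an existing index looks like on disk, a small
stand-alone sketch like the following may help. It uses only java.io (no
Lucene API) and relies on compound segment files carrying the ".cfs"
extension; the class and method names are made up for this example, and the
index directory path is whatever you pass on the command line.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class IndexDirSummary {
    /** Counts names ending in ".cfs", the extension of compound segment files. */
    static int countCompound(List<String> fileNames) {
        int n = 0;
        for (String name : fileNames) {
            if (name.endsWith(".cfs")) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Index directory comes from the command line; "." is only a fallback.
        File dir = new File(args.length > 0 ? args[0] : ".");
        String[] names = dir.list(); // null if dir is not a readable directory
        if (names == null) {
            names = new String[0];
        }
        List<String> entries = Arrays.asList(names);
        System.out.println("entries in directory:   " + entries.size());
        System.out.println("compound files (.cfs):  " + countCompound(entries));
    }
}
```

A directory with many entries and no ".cfs" files is probably a non-compound
index with several segments, which is the case where file handles run out first.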

"Simon Willnauer" <[EMAIL PROTECTED]> wrote on 10/10/2006 03:56:17:

> Hi,
>
> In Lucene there are two types of index structure: compound index and
> multi-file index. In a multi-file index, when new documents are added
> to the index, they are stored in a separate segment; this increases
> the number of files in the index. Therefore, a multi-file index
> has more files than a compound index.
> A compound index consists of three files; two of them are the
> "deletable" file, which lists the unused files in the index, and the
> "segments" file, which lists the segment names and their sizes. The
> third one contains all the indexed documents and their field values.
> In a compound index all index files are merged into one single file,
> so the number of files in the index is minimized.
> The advantage of multi-file is that indexing documents takes less
> time than with a compound file, because with a compound file the index
> files are additionally merged into one single file. Multi-file can be
> suitable when the number of documents to index is large.
> On the other hand, the advantage of the compound file appears in
> searching, because the total number of file accesses for reading data
> is at a minimum in a compound index. In contrast, with a multi-file
> index the file fetches increase because the program needs to open more
> files in order to retrieve the required documents from the index. This
> is important when search time is a consideration in an application.
>
> I prefer to use compoundFile(true) as I work on Unix platforms;
> otherwise you will end up with "too many open files" very often!
>
> best regards Simon
>
> btw: nice to see another person from Berlin on the list!
>
> -----------
>
> mailto: [EMAIL PROTECTED]
>
> On 10/10/06, Supriya Kumar Shyamal <[EMAIL PROTECTED]> wrote:
> > Hello All,
> >
> > I have a question regarding the use of the compound file for an index:
> > what are the advantages and disadvantages of enabling the compound file
> > (which is the default, I think) or disabling it?
> >
> > Thanks,
> > supriya
> >
> > --
> > Mit freundlichen Grüßen / Regards
> >
> > Supriya Kumar Shyamal
> >
> > Software Developer
> > tel +49 (30) 443 50 99 -22
> > fax +49 (30) 443 50 99 -99
> > email [EMAIL PROTECTED]
> > ___________________________
> > artnology GmbH
> > Milastr. 4
> > 10437 Berlin
> > ___________________________
> >
> > http://www.artnology.com
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >

