Every field except the content field is stored, so that may explain the
large CFS files.
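
For context, a document in my test is built roughly along these lines
(the field names here are illustrative, not my actual schema), with only
the content field unstored:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document makeDoc(String text, String title, String path) {
      Document doc = new Document();
      // Indexed but not stored: contributes only to the postings files.
      doc.add(new Field("content", text, Field.Store.NO, Field.Index.TOKENIZED));
      // Stored fields: these end up in the .fdt/.fdx stored-fields files.
      doc.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
      doc.add(new Field("path", path, Field.Store.YES, Field.Index.UN_TOKENIZED));
      return doc;
    }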

Regarding RAM-usage performance, I tried setting the RAM buffer to 128,
256 and 512 MB; all gave the same time measurements (give or take ~5%) as
the MBD run (set to 10,000). I think this needs further investigation. Was
it tested before? I mean, has someone tried setting the RAM buffer to, say,
2 GB and noticed a major performance improvement (as I'd expect)?

On Wed, Mar 19, 2008 at 10:03 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:

>
> Shai Erera wrote:
> > I think you misunderstood me - ultimately, the process reached 128 MB.
> > However, it was flushing the .fdt file before it reached that. Your
> > explanation of stored fields explains that behavior, but it did
> > consume 128 MB.
>
> Ahh, phew.
>
> > Also, the CFS files that were written were >200 MB (but less than
> > 256 MB), which does not align with the 128 MB setting. But I'm sure
> > there's a good explanation for that as well :-)
>
> Yes: the fdt/fdx files (and term vectors, if you had used them) are
> included in that CFS file.  Though, due to the inefficiency of RAM
> usage, I'd expect all the non-stored-field files in a segment to total
> maybe 64 MB (assuming 50% RAM efficiency).  This means you have really,
> really big stored fields.  Does that sound right?
>
> > As for the RAMDirectory usage, I would think that if Lucene stored a
> > true directory in memory, with segments information and all, writing
> > that to the file system would be as efficient as flushing big chunks
> > of byte[], without having to process the postings and flush them (god
> > forbid) one posting element at a time.
>
> Not necessarily.  By inserting an intermediate RAMDirectory in
> DocumentsWriter we could get better net RAM efficiency than what we
> have now, at hopefully not too much added time cost, as measured by
> "size of what's flushed to the filesystem divided by RAM buffer
> size", I think.  Really it needs testing.  DocumentsWriter is forced
> to waste some space (much less than before) in order to quickly
> update posting lists... so the tradeoff of "flush frequently & merge
> in RAM" vs "flush to the filesystem only when the RAM buffer is
> full" may be worthwhile.  (We only do the latter today.)
>
> > The reason I'm worried about the performance of RAM vs.
> > maxBufferedDocs (MBD) is that I was hoping that with Lucene 2.3, if I
> > have a machine with 4 GB of RAM available for indexing, I'll be able
> > to utilize it. But according to my small test, setting the RAM buffer
> > to 128 MB or MBD to 10,000 (which consumed around 70 MB) gave the
> > same performance. So I find myself asking whether flushing by RAM
> > usage is more useful than by MBD (as the documentation states).
>
> I think this is just because performance levels off?  Ie, if you set
> your RAM buffer size to 70 MB you should see about the same
> performance as well?
>
> Mike
>


-- 
Regards,

Shai Erera
