Everything except the content field is stored, so that may explain the large CFS files.
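For reference, here is roughly how our documents are built - only the content field is left unstored. This is just a sketch (the field names are invented), but it shows why the .fdt file, and hence the CFS file, grows so large:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document makeDoc(String title, String date, String text) {
      Document doc = new Document();
      // Stored field values end up in the segment's .fdt file, which is
      // folded into the CFS file at flush time - large stored values
      // therefore mean large CFS files.
      doc.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
      doc.add(new Field("date", date, Field.Store.YES, Field.Index.UN_TOKENIZED));
      // Only the content field is indexed without being stored:
      doc.add(new Field("content", text, Field.Store.NO, Field.Index.TOKENIZED));
      return doc;
    }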
Regarding the RAM-usage performance: I tried setting the buffer to 128, 256 and 512 MB, and all gave the same time measurements (give or take ~5%) as the MBD run (set to 10,000). I think this needs further investigation. Was it tested before? That is, has someone tried setting the RAM buffer to 2 GB, for example, and noticed a major performance improvement (as I'd expect)? For reference, I've put a sketch of how I set the two flush triggers, and of the intermediate-RAMDirectory pattern discussed below, in the postscripts at the end of this mail.

On Wed, Mar 19, 2008 at 10:03 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>
> Shai Erera wrote:
> > I think you misunderstood me - ultimately, the process reached 128MB.
> > However, it was flushing the .fdt file before it reached that. Your
> > explanation of stored fields explains that behavior, but it did
> > consume 128MB.
>
> Ahh, phew.
>
> > Also, the CFS files that were written were of size >200MB (but less
> > than 256MB) - which does not align with the 128MB setting. But I'm
> > sure there's a good explanation for that as well :-)
>
> Yes: the fdt/fdx files (and term vectors, if you had used them) are
> included in that CFS file. Though, due to inefficiency of RAM usage,
> I'd expect all non-stored-field files in a segment to be maybe 64 MB
> (assuming 50% RAM efficiency). This means you have really, really big
> stored fields. Does that sound right?
>
> > As for the RAMDirectory usage, I would think that if Lucene stored a
> > true directory in memory, with segments information and all, writing
> > that to the file system would be as efficient as flushing big chunks
> > of byte[], not having to process the postings and flush them (god
> > forbid) one posting element at a time.
>
> Not necessarily. By inserting an intermediate RAMDirectory in
> DocumentsWriter we could get better net RAM efficiency, at hopefully
> not too much added time cost, than what we have now, as measured by
> "size of what's flushed to the filesystem divided by RAM buffer
> size", I think. Really it needs testing. DocumentsWriter is forced
> to waste some space (much less than before) in order to quickly
> update posting lists... so this tradeoff of "flush frequently & merge
> in RAM" vs. "only flush to the filesystem when the RAM buffer is
> full" may be worthwhile. (We only do the latter today.)
>
> > The reason I'm worried about the performance of RAM vs.
> > maxBufferedDocs (MBD) is that I was hoping that with Lucene 2.3, if
> > I have a machine with 4 GB of RAM available for indexing, I'll be
> > able to utilize it. But according to my small test, setting RAM to
> > 128 or MBD to 10,000 (which consumed around 70 MB) gave the same
> > performance. So I find myself asking whether flushing by RAM usage
> > is more useful than flushing by MBD (as the documentation states).
>
> I think this is just because performance levels off? I.e., if you set
> your RAM buffer size to 70 MB, you should see about the same
> performance as well?
>
> Mike

--
Regards,
Shai Erera
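P.S. For concreteness, here is how I switch between the two flush triggers in my test runs - a minimal sketch against the 2.3 API; the index path and analyzer are placeholders:

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class FlushTriggerSketch {
      public static void main(String[] args) throws IOException {
        IndexWriter writer = new IndexWriter(
            FSDirectory.getDirectory("/tmp/test-index"),
            new StandardAnalyzer(), true);

        // "RAM" runs: flush whenever buffered docs use ~128 MB of RAM.
        writer.setRAMBufferSizeMB(128.0);

        // "MBD" runs: flush every 10,000 buffered documents instead:
        // writer.setMaxBufferedDocs(10000);

        // ... addDocument() calls go here ...
        writer.close();
      }
    }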
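P.P.S. On the RAMDirectory point: what I had in mind is the classic application-level version of that pattern, sketched below - build a complete index (segments information and all) in a RAMDirectory, then merge it into the on-disk index in one big sequential write. This only illustrates the idea, not what DocumentsWriter does internally, and the 64 MB budget is arbitrary:

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    public class RamStageSketch {
      public static void main(String[] args) throws IOException {
        IndexWriter fsWriter = new IndexWriter(
            FSDirectory.getDirectory("/tmp/test-index"),
            new StandardAnalyzer(), true);

        // Stage 1: build a self-contained index purely in RAM.
        RAMDirectory ramDir = new RAMDirectory();
        IndexWriter ramWriter = new IndexWriter(
            ramDir, new StandardAnalyzer(), true);
        // ... addDocument() calls go here, until ramDir.sizeInBytes()
        // approaches the RAM budget (say 64 MB) ...
        ramWriter.close();

        // Stage 2: flush to the filesystem in big chunks, as one merge.
        fsWriter.addIndexes(new Directory[] { ramDir });
        fsWriter.close();
      }
    }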