Hi Mike,

So when loading the results I want to return (say 10 documents), if not all docs fit in RAM, I could incur up to 10 individual disk seeks, which would kill my performance. Is that correct?
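
As a rough back-of-the-envelope check of the seek cost described above, here is a minimal sketch. The function name, the 10 ms per-seek figure, and the cached fraction are illustrative assumptions, not measurements from this index.

```python
def worst_case_fetch_ms(num_docs, cached_fraction, seek_ms):
    """Estimate seek time (ms) to load num_docs stored documents.

    Assumes one random seek per document that is not already in the
    OS cache, and ignores transfer time. All inputs are hypothetical.
    """
    misses = num_docs * (1.0 - cached_fraction)
    return misses * seek_ms


# 10 hits, nothing cached, an assumed 10 ms per seek on a spinning disk.
print(worst_case_fetch_ms(10, 0.0, 10.0))  # 100.0
# Same page of results with half of the documents already cached.
print(worst_case_fetch_ms(10, 0.5, 10.0))  # 50.0
```

Under these assumptions the stored-field fetch alone could add on the order of 100 ms to a query that returns 10 uncached documents.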
Considering my alternatives:
1. Create a separate, lean index that would fit in RAM.
2. Keep stored fields to a minimum, and store infrequently accessed stored fields outside of Lucene.

In this particular use case, it would have really helped if I could tell Lucene which stored fields should be read eagerly when loading a document, and which should be lazily loaded from elsewhere on disk, thereby fitting the frequently needed stored fields into memory. I guess my use case is too specific?

Gili.

On Wed, Jan 23, 2013 at 8:59 AM, Michael McCandless <luc...@mikemccandless.com> wrote:

Are the additional, rarely used 48 fields used for searching? Or for looking up stored fields?

If it's for searching then you should see good locality (efficient use of the OS's IO cache) from the postings lists: each field's postings are stored in a single chunk of the files, then the next field's postings, etc. I.e., the storage is "column stride" (if columns are fields and rows are documents).

But for stored fields, or term vectors, which are "row stride", you won't see efficient use of the OS's IO cache.

Mike McCandless
http://blog.mikemccandless.com

On Wed, Jan 23, 2013 at 7:59 AM, Gili Nachum <gil...@il.ibm.com> wrote:

Hi,

I have a search workload that focuses on two fields in my 1GB index. I get very good performance when loading the index via MMapDirectory. I attribute this performance to the operating system's file system (FS) cache, which keeps the most recently used FS blocks resident in RAM.

I would like to add 50 more fields to the index, increasing its size to ~50GB. A key factor is that these additional fields will be queried very rarely. Given this increase in index size, should I expect a lower queries/sec rate for the original search workload (which doesn't use the new fields)?
I would assume that if the values of each searchable field are stored in a different set of FS blocks, then the 50 additional fields would make no difference to the OS FS cache, as it would continue to behave as before, keeping the most used FS blocks in RAM. On the other hand, if values from different fields share the same FS blocks, then the values of the 2 hot fields will be too scattered across the FS, rendering the OS cache useless and degrading performance back to being I/O bound. Which is the case with Lucene 3.6?

Thanks.
Gili Nachum.
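
The "column stride" versus "row stride" distinction discussed in this thread can be illustrated with a toy model. This is only a sketch: it is not Lucene's actual file format, and the fixed 64-byte value size, the 4096-byte block size, and the function name are assumptions chosen for illustration. It counts how many distinct blocks the OS cache would need to hold in order to serve 2 hot fields out of 52.

```python
def blocks_touched(num_docs, num_fields, hot_fields,
                   value_size, block_size, layout):
    """Count distinct blocks that contain at least one hot-field value.

    Toy model only: every value has the same fixed size and the data
    is laid out in one contiguous file.
    """
    touched = set()
    for doc in range(num_docs):
        for field in hot_fields:
            if layout == "column":
                # All values of a field are contiguous, field after field.
                offset = (field * num_docs + doc) * value_size
            elif layout == "row":
                # All fields of a document are contiguous, doc after doc.
                offset = (doc * num_fields + field) * value_size
            else:
                raise ValueError("layout must be 'column' or 'row'")
            touched.add(offset // block_size)
    return len(touched)


# Hypothetical numbers: 10,000 docs, 52 fields, 2 of them hot.
params = dict(num_docs=10_000, num_fields=52, hot_fields=[0, 1],
              value_size=64, block_size=4096)
column = blocks_touched(layout="column", **params)
row = blocks_touched(layout="row", **params)
print(column, row)  # 313 8125
```

In this model the column-stride layout keeps the hot data in 313 blocks, while the row-stride layout spreads it over all 8125 blocks of the file, so the cache would have to hold the entire file to serve the same two fields. That matches the reasoning above: rarely used fields cost little cache for postings, but can hurt for stored fields and term vectors.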