[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238402#comment-13238402 ]
Michael McCandless commented on LUCENE-3659:
--------------------------------------------

This looks great Uwe!

I'm a little worried about the tiny file case; you're checking for SEGMENTS_* now, but many other files can be much smaller than 1/64th of the estimated segment size.

I wonder if we should "improve" IOContext to hold the [rough] estimated file size (not just the overall segment size)... the thing is that's sort of a hassle on codec impls.

Or: maybe, on closing the ROS/RAMFile, we can downsize the final buffer (yes, this means copying the bytes, but that cost is vanishingly small as the RAMDir grows). Then tiny files stay tiny, though they are still [relatively] costly to create...

I don't think RAMDir.createOutput should publish the RAMFile until the ROS is closed? I.e., you are not allowed to openInput on something still opened with createOutput in any Lucene Dir impl..? This would allow us to make RAMFile frozen (e.g., if ROS holds its own buffers and then creates the RAMFile on close), so that it requires no sync when reading?

I also don't think RAMFile should be public, i.e., the only way to make changes to a file stored in a RAMDir is via RAMOutputStream. We can do this separately...

Maybe we should pursue a growing buffer size...? I.e., where each newly added buffer is bigger than the one before (like ArrayUtil.oversize's growth function)... I realize that adds complexity (RAMInputStream.seek is more fun), but this would let tiny files use tiny RAM and huge files use few buffers. I.e., RAMDir would scale up and scale down well.

Separately: I noticed we still have IndexOutput.setLength, but nobody calls it anymore, I think? (In 3.x we call it when creating a CFS.) Maybe we should remove it...
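A minimal Java sketch of how the buffer-related suggestions could fit together: buffers that grow, a final buffer trimmed on close, and a file that is only handed out (frozen) once writing is done. All names here (GrowingBufferWriter, FrozenFile, FIRST_BUFFER_SIZE) are hypothetical and are not Lucene's RAMOutputStream/RAMFile API. The sketch also swaps ArrayUtil.oversize's gentler growth for plain doubling; that is an assumption, chosen because doubling turns the position-to-buffer lookup that a seek needs into a closed-form computation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical writer: each new buffer is twice the size of the previous one. */
final class GrowingBufferWriter {
  static final int FIRST_BUFFER_SIZE = 8; // assumption: tiny first buffer, must be > 0

  private final List<byte[]> buffers = new ArrayList<>();
  private byte[] current; // buffer currently being filled
  private int upto;       // write position inside current
  private long length;    // total bytes written
  private boolean closed;

  void writeByte(byte b) {
    if (closed) throw new IllegalStateException("already closed");
    if (current == null || upto == current.length) {
      // Sketch only: no growth cap, so this shift overflows once the file passes
      // roughly 2 GB; a real implementation would cap growth and adjust the seek math.
      current = new byte[FIRST_BUFFER_SIZE << buffers.size()];
      buffers.add(current);
      upto = 0;
    }
    current[upto++] = b;
    length++;
  }

  void writeBytes(byte[] src, int off, int len) {
    for (int i = 0; i < len; i++) writeByte(src[off + i]);
  }

  /** Trims the last buffer to its used size, then publishes an immutable file. */
  FrozenFile close() {
    closed = true;
    if (current != null && upto < current.length) {
      byte[] trimmed = Arrays.copyOf(current, upto); // the one copy per file
      buffers.set(buffers.size() - 1, trimmed);
      current = trimmed;
    }
    return new FrozenFile(buffers.toArray(new byte[0][]), length);
  }
}

/** Hypothetical frozen file: final fields and no mutators, so reads need no locking. */
final class FrozenFile {
  private final byte[][] buffers;
  private final long length;

  FrozenFile(byte[][] buffers, long length) {
    this.buffers = buffers;
    this.length = length;
  }

  long length() { return length; }

  int bufferCount() { return buffers.length; }

  long allocatedBytes() {
    long total = 0;
    for (byte[] b : buffers) total += b.length;
    return total;
  }

  /** Maps an absolute position to (buffer index, offset) in O(1), as a seek would. */
  byte readByte(long pos) {
    if (pos < 0 || pos >= length) throw new IndexOutOfBoundsException("pos=" + pos);
    // Buffer i starts at FIRST * (2^i - 1), so i = floor(log2(pos / FIRST + 1)).
    long q = pos / GrowingBufferWriter.FIRST_BUFFER_SIZE + 1;
    int index = 63 - Long.numberOfLeadingZeros(q);
    long start = (long) GrowingBufferWriter.FIRST_BUFFER_SIZE * ((1L << index) - 1);
    return buffers[index][(int) (pos - start)];
  }
}
```

With doubling, a 100-byte file uses four buffers (8 + 16 + 32 + 64 bytes), and trimming the last one on close leaves exactly 100 bytes allocated; a 3-byte file ends up with a single 3-byte buffer. The price is the single copy of the last buffer per file at close time.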
> Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-3659
> URL: https://issues.apache.org/jira/browse/LUCENE-3659
> Project: Lucene - Java
> Issue Type: Task
> Affects Versions: 3.5, 4.0
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch
>
>
> Spinoff from several dev@lao issues:
> - [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
> - issue LUCENE-3653
> The use cases for RAMDirectory are very limited and to prevent users from using it for e.g. loading a 50 Gigabyte index from a file on disk, we should improve the javadocs.