[ http://issues.apache.org/jira/browse/LUCENE-709?page=comments#action_12451433 ]

Yonik Seeley commented on LUCENE-709:
-------------------------------------
> 1. Re. Yonik's comment about my synchronization scenario. Synchronizing as
> described does resolve the issue.

In your merging documents scenario, you state "Thread 1 adds a new document, creating a new segment with new index files, leading to segment merging, that creates new larger segment index files, and then deletes all replaced segment index files."

If a different thread calls getSizeInBytes() after the merge but before the deletes, you will see both the old segments and the new segments created by the merge, and will be double counting. Synchronizing the directory-level getSizeInBytes() will not solve that... it requires higher-level synchronization.

Anyway, I think the point is moot, since I think we should handle the size incrementally.

> Counting buffer sizes rather than file length may be slightly more accurate,
> but at least for me it is not material.

It could be *much* more accurate though. All buffering of documents in IndexWriter is done with single-doc segments. That 1 byte norm file takes up 1024 bytes of buffer space!

> I think my test case should still be used

+1, I didn't do any testing :-)

BTW, some of the synchronization bugs were fixed in the recent lockless patch.

> [PATCH] Enable application-level management of IndexWriter.ramDirectory size
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-709
>                 URL: http://issues.apache.org/jira/browse/LUCENE-709
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.0.1
>         Environment: All
>            Reporter: Chuck Williams
>         Attachments: ramdir.patch, ramdir.patch, ramDirSizeManagement.patch,
>                      ramDirSizeManagement.patch, ramDirSizeManagement.patch
>
>
> IndexWriter currently only supports bounding of the in-memory index cache
> using maxBufferedDocs, which limits it to a fixed number of documents.
> When document sizes vary substantially, especially when documents cannot be
> truncated, this leads either to inefficiencies from a too-small value or
> OutOfMemoryErrors from a too-large value.
>
> This simple patch exposes IndexWriter.flushRamSegments(), and provides access
> to size information about IndexWriter.ramDirectory so that an application can
> manage this based on the total number of bytes consumed by the in-memory cache,
> thereby allowing a larger number of smaller documents or a smaller number of
> larger documents. This can lead to much better performance while eliminating
> the possibility of OutOfMemoryErrors.
>
> The actual job of managing to a size constraint, or any other constraint, is
> left up to the application.
>
> The addition of synchronized to flushRamSegments() is only for safety of an
> external call. It has no significant effect on internal calls since they all
> come from a synchronized caller.
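
To make the two points in the comment concrete (tracking the size incrementally, and counting buffer space rather than file length), here is a minimal Java sketch. It does not use Lucene and is not taken from any of the attached patches: SizeTrackingStore, bufferBytesFor and indexWithBudget are hypothetical names, the 1024-byte buffer granularity is the figure quoted in the comment, and clear() merely stands in for an application calling IndexWriter.flushRamSegments().

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store; an illustration only, not Lucene's RAMDirectory. */
class SizeTrackingStore {
    // Assumed buffer granularity, taken from the "1024 bytes" figure in the comment.
    static final int BUFFER_SIZE = 1024;

    private final Map<String, Long> fileLengths = new HashMap<>();
    private long bufferedBytes; // updated on every put/delete, never recomputed

    /** Buffer space needed for a file of the given length, rounded up to whole buffers. */
    static long bufferBytesFor(long length) {
        long buffers = (length + BUFFER_SIZE - 1) / BUFFER_SIZE;
        return buffers * BUFFER_SIZE;
    }

    synchronized void put(String name, long length) {
        Long old = fileLengths.put(name, length);
        if (old != null) {
            bufferedBytes -= bufferBytesFor(old);
        }
        bufferedBytes += bufferBytesFor(length);
    }

    synchronized void delete(String name) {
        Long old = fileLengths.remove(name);
        if (old != null) {
            bufferedBytes -= bufferBytesFor(old);
        }
    }

    /** Constant-time read of the running total; does not walk the file list. */
    synchronized long sizeInBytes() {
        return bufferedBytes;
    }

    /** The file-length view, for comparison with the buffer-based total. */
    synchronized long sumOfFileLengths() {
        long total = 0;
        for (long len : fileLengths.values()) {
            total += len;
        }
        return total;
    }

    /** Stand-in for flushing the buffered segments to disk. */
    synchronized void clear() {
        fileLengths.clear();
        bufferedBytes = 0;
    }
}

class RamBudgetDemo {
    /** Adds "documents" one by one and flushes whenever the byte budget is exceeded. */
    static int indexWithBudget(long[] docSizes, long maxBytes, SizeTrackingStore store) {
        int flushes = 0;
        for (int i = 0; i < docSizes.length; i++) {
            store.put("doc" + i, docSizes[i]);
            if (store.sizeInBytes() > maxBytes) {
                store.clear();
                flushes++;
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        SizeTrackingStore store = new SizeTrackingStore();
        store.put("norms", 1); // a 1-byte file
        System.out.println(store.sumOfFileLengths()); // prints 1
        System.out.println(store.sizeInBytes());      // prints 1024
        store.clear();

        long[] docSizes = {100, 5000, 200, 300};
        System.out.println(indexWithBudget(docSizes, 4096, store)); // prints 1
    }
}
```

The first two printed values show the gap between file length and buffer space for a 1-byte file. The running total is exact only for whatever the store currently holds, so a reading taken after a merge but before the old files are deleted would still include both sets of files.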