[ http://issues.apache.org/jira/browse/LUCENE-709?page=comments#action_12450269 ]

Yonik Seeley commented on LUCENE-709:
-------------------------------------
> the contents of the hash table may change during the sizeInBytes() iteration.

Yes, but that's OK.

> Files might be deleted and/or added to the directory concurrently, causing
> the size to be computed from an invalid intermediate state

Synchronizing at that low level doesn't make the computed size any more valid, though. You need synchronization at a higher level if you want to say anything more about what the computed size represents.

Consider two uncoordinated threads: one adding a new file to the RAMDirectory, the other calculating the size of the directory. In the unsynchronized case, you don't know whether the size will include the new file. If sizeInBytes() is synchronized, you still don't know which thread will acquire the lock first, so you still don't know whether the size will include the new file. Synchronizing sizeInBytes() does nothing but add a bottleneck.

> Synchronizing on files avoids the problem altogether without much cost as the
> loop is fast.

I disagree that the loop will be fast; simpler loops have proven to take noticeable time:

LUCENE-388: Improve indexing performance when maxBufferedDocs is large by keeping a count of buffered documents rather than counting after each document addition.

That loop only counted documents, not the files in each segment, of which there are many more. Consider a maxBufferedDocs of 1000 to 10000 with 10 or 20 indexed fields, and you end up with 17000 to 270000 files to calculate the size over.
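To make the race argument concrete, here is a minimal sketch using a hypothetical in-memory directory (UnsyncSizeSketch, createFile, deleteFile are illustrative names, not Lucene's RAMDirectory). It assumes the file table is a java.util.concurrent.ConcurrentHashMap, whose iteration is weakly consistent: the unlocked sum never throws, and a file added mid-loop may or may not be counted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** HYPOTHETICAL stand-in for an in-memory directory: file name -> length in bytes. */
public class UnsyncSizeSketch {
    private final Map<String, Long> files = new ConcurrentHashMap<>();

    void createFile(String name, long length) {
        files.put(name, length);
    }

    void deleteFile(String name) {
        files.remove(name);
    }

    /**
     * Sums file lengths without taking a lock. ConcurrentHashMap iteration is
     * weakly consistent: it never throws ConcurrentModificationException, and a
     * file added or removed while the loop runs may or may not be included.
     */
    long sizeInBytes() {
        long total = 0;
        for (long len : files.values()) {
            total += len;
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        UnsyncSizeSketch dir = new UnsyncSizeSketch();
        dir.createFile("_0.fdt", 100);

        // An uncoordinated writer thread adds a file while the reader sums.
        Thread writer = new Thread(() -> dir.createFile("_1.fdt", 500));
        writer.start();
        long seen = dir.sizeInBytes(); // either 100 or 600; both are valid answers
        writer.join();

        System.out.println("size seen by reader: " + seen);
        System.out.println("size after join: " + dir.sizeInBytes());
    }
}
```

Making sizeInBytes() and createFile() both synchronized would turn the sum into a consistent snapshot, but the reader would still see 100 or 600 depending on which thread won the lock, which is the point of the comment: the lock adds contention without adding information.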
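The 17000 to 270000 figure can be reproduced with a back-of-the-envelope count. This sketch assumes each buffered document is its own single-document segment holding 7 fixed files (.fnm, .fdx, .fdt, .tii, .tis, .frq, .prx) plus one separate norms file per indexed field; that per-segment layout is an assumption chosen because it matches the numbers above, and RamFileCount / filesToScan are hypothetical names.

```java
/**
 * Estimates how many RAM files a sizeInBytes() loop would visit.
 * ASSUMPTION: one single-document segment per buffered doc, each with
 * 7 fixed files plus one norms file per indexed field.
 */
public class RamFileCount {
    static final int FIXED_FILES_PER_SEGMENT = 7;

    static long filesToScan(long bufferedDocs, int indexedFields) {
        return bufferedDocs * (FIXED_FILES_PER_SEGMENT + indexedFields);
    }

    public static void main(String[] args) {
        System.out.println(filesToScan(1000, 10));  // prints 17000  (1000 * 17)
        System.out.println(filesToScan(10000, 20)); // prints 270000 (10000 * 27)
    }
}
```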
> [PATCH] Enable application-level management of IndexWriter.ramDirectory size
> ----------------------------------------------------------------------------
>
> Key: LUCENE-709
> URL: http://issues.apache.org/jira/browse/LUCENE-709
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.0.1
> Environment: All
> Reporter: Chuck Williams
> Attachments: ramDirSizeManagement.patch, ramDirSizeManagement.patch
>
>
> IndexWriter currently only supports bounding of the in-memory index cache
> using maxBufferedDocs, which limits it to a fixed number of documents. When
> document sizes vary substantially, especially when documents cannot be
> truncated, this leads either to inefficiencies from a too-small value or
> OutOfMemoryErrors from a too-large value.
> This simple patch exposes IndexWriter.flushRamSegments(), and provides access
> to size information about IndexWriter.ramDirectory so that an application can
> manage this based on the total number of bytes consumed by the in-memory cache,
> thereby allowing a larger number of smaller documents or a smaller number of
> larger documents. This can lead to much better performance while eliminating
> the possibility of OutOfMemoryErrors.
> The actual job of managing to a size constraint, or any other constraint, is
> left up to the application.
> The addition of synchronized to flushRamSegments() is only for safety of an
> external call. It has no significant effect on internal calls since they all
> come from a synchronized caller.
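A minimal sketch of the application-side size management the description leaves to the application. It assumes a hypothetical RamBufferedWriter interface exposing flushRamSegments() (named in the patch description) and ramSizeInBytes() (an illustrative name for the "size information" accessor); the interface, RamBudgetPolicy, and ToyWriter are not Lucene's API, and the toy writer charges one byte per character as a stand-in for real memory use.

```java
/** HYPOTHETICAL hooks modeled on what the patch describes; not Lucene's actual API. */
interface RamBufferedWriter {
    void addDocument(String doc);
    long ramSizeInBytes();
    void flushRamSegments();
}

/** Application-level policy: flush whenever the RAM cache exceeds a byte budget. */
final class RamBudgetPolicy {
    private final RamBufferedWriter writer;
    private final long maxRamBytes;
    private int flushes = 0;

    RamBudgetPolicy(RamBufferedWriter writer, long maxRamBytes) {
        this.writer = writer;
        this.maxRamBytes = maxRamBytes;
    }

    /** Adds one document, then flushes if the cache has grown past the budget. */
    void add(String doc) {
        writer.addDocument(doc);
        if (writer.ramSizeInBytes() > maxRamBytes) {
            writer.flushRamSegments();
            flushes++;
        }
    }

    int flushes() {
        return flushes;
    }
}

/** Toy writer: each buffered document "costs" one byte per char until flushed. */
final class ToyWriter implements RamBufferedWriter {
    private long ramBytes = 0;
    private long flushedBytes = 0;

    public void addDocument(String doc) { ramBytes += doc.length(); }
    public long ramSizeInBytes() { return ramBytes; }
    public void flushRamSegments() { flushedBytes += ramBytes; ramBytes = 0; }
    long flushedBytes() { return flushedBytes; }
}

public class RamBudgetDemo {
    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        RamBudgetPolicy policy = new RamBudgetPolicy(writer, 100);
        for (int i = 0; i < 10; i++) {
            policy.add("abcdefghijklmnopqrstuvwxyz0123"); // 30 chars each
        }
        // prints "flushes=2 ram=60 flushed=240"
        System.out.println("flushes=" + policy.flushes()
                + " ram=" + writer.ramSizeInBytes()
                + " flushed=" + writer.flushedBytes());
    }
}
```

Because the check runs after each add, the cache can overshoot the budget by up to one document; this policy trades that slack for never having to predict a document's size in advance. The toy writer keeps a running total, so ramSizeInBytes() is O(1); whether a real directory can report its size that cheaply is the cost question raised in the comment above.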