[ http://issues.apache.org/jira/browse/LUCENE-709?page=comments#action_12450260 ]
            
Chuck Williams commented on LUCENE-709:
---------------------------------------

Not synchronizing on the Hashtable, even when using an Enumerator, creates 
problems because the contents of the hash table may change during the 
sizeInBytes() iteration.  Files might be deleted from and/or added to the 
directory concurrently, causing the size to be computed from an invalid 
intermediate state.  Using an Enumerator would cause the invalid value to be 
returned without an exception, while using an Iterator instead generates a 
ConcurrentModificationException.  Synchronizing on files avoids the problem 
altogether at little cost, as the loop is fast.
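
A minimal sketch of the synchronized loop described above.  The class names 
SizeSketch and FileEntry are hypothetical stand-ins (FileEntry plays the role 
of RAMFile) and are not taken from the patch; only the locking pattern is the 
point.

```java
import java.util.Hashtable;

class SizeSketch {
    // Hypothetical stand-in for RAMFile; only the length matters here.
    static final class FileEntry {
        final long length;
        FileEntry(long length) { this.length = length; }
    }

    private final Hashtable<String, FileEntry> files = new Hashtable<>();

    void put(String name, long length) { files.put(name, new FileEntry(length)); }

    void delete(String name) { files.remove(name); }

    long sizeInBytes() {
        // Hashtable's own methods lock the table instance, so holding the
        // same lock here keeps put() and remove() out for the whole loop.
        synchronized (files) {
            long size = 0;
            for (FileEntry f : files.values()) {
                size += f.length;
            }
            return size;
        }
    }
}
```

Because every Hashtable method synchronizes on the table itself, no extra lock 
object is needed; adds and deletes simply wait until the sum is complete.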

Hashtable uses a single class, Hashtable.Enumerator, for both its iterator and 
its enumerator.  There are a couple of minor differences in the respective 
methods, such as the fail-fast check described above, but not much else.
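
The difference in behavior can be observed even in a single thread.  The 
following is a small demonstration, assuming a hypothetical class name 
FailFastDemo and sample keys of my own choosing; it modifies the table in the 
middle of a traversal and reports whether the traversal noticed.

```java
import java.util.ConcurrentModificationException;
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Iterator;

class FailFastDemo {
    static Hashtable<String, Long> sample() {
        Hashtable<String, Long> t = new Hashtable<>();
        t.put("a", 1L);
        t.put("b", 2L);
        t.put("c", 3L);
        return t;
    }

    // True if the Iterator notices a structural change made mid-traversal.
    static boolean iteratorDetectsChange() {
        Hashtable<String, Long> t = sample();
        Iterator<Long> it = t.values().iterator();
        it.next();
        t.put("d", 4L); // structural modification behind the iterator's back
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // True if the Enumeration notices the same kind of change.
    static boolean enumerationDetectsChange() {
        Hashtable<String, Long> t = sample();
        Enumeration<Long> en = t.elements();
        en.nextElement();
        t.put("d", 4L);
        try {
            while (en.hasMoreElements()) {
                en.nextElement();
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Iterator detected change: " + iteratorDetectsChange());
        System.out.println("Enumeration detected change: " + enumerationDetectsChange());
    }
}
```

The Iterator path fails fast, while the Enumeration path finishes quietly with 
whatever elements it happened to see.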

RAMFile.length being a long is indeed a problem, but this bug already exists 
in Lucene without sizeInBytes().  See RAMDirectory.fileLength(), which has the 
same problem now.

I'll submit another version of the patch that encapsulates RAMFile.length into 
a synchronized getter and setter.  It's only used in a few places 
(RAMDirectory, RAMInputStream and RAMOutputStream).
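
A rough sketch of what such an encapsulation could look like.  The class name 
RamFileSketch and the accessor names getLength/setLength are assumptions for 
illustration and may differ from the actual patch.  The Java Language 
Specification allows a write to a non-volatile long to be performed as two 
separate 32-bit writes, so routing every access through the same lock ensures 
a reader never sees a half-updated value.

```java
class RamFileSketch {
    // Guarded by "this"; never read or written outside the accessors.
    private long length;

    synchronized long getLength() {
        return length;
    }

    synchronized void setLength(long length) {
        this.length = length;
    }
}
```

Callers would then use file.getLength() wherever they currently read the field 
directly.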


> [PATCH] Enable application-level management of IndexWriter.ramDirectory size
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-709
>                 URL: http://issues.apache.org/jira/browse/LUCENE-709
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.0.1
>         Environment: All
>            Reporter: Chuck Williams
>         Attachments: ramDirSizeManagement.patch, ramDirSizeManagement.patch
>
>
> IndexWriter currently only supports bounding of the in-memory index cache 
> using maxBufferedDocs, which limits it to a fixed number of documents.  When 
> document sizes vary substantially, especially when documents cannot be 
> truncated, this leads either to inefficiencies from a too-small value or 
> OutOfMemoryErrors from a too-large value.
> This simple patch exposes IndexWriter.flushRamSegments(), and provides access 
> to size information about IndexWriter.ramDirectory so that an application can 
> manage this based on total number of bytes consumed by the in-memory cache, 
> thereby allowing a larger number of smaller documents or a smaller number of 
> larger documents.  This can lead to much better performance while eliminating 
> the possibility of OutOfMemoryErrors.
> The actual job of managing to a size constraint, or any other constraint, is 
> left up to the application.
> The addition of synchronized to flushRamSegments() is only for safety of an 
> external call.  It has no significant effect on internal calls since they all 
> come from a synchronized caller.
