[jira] Commented: (LUCENE-709) [PATCH] Enable application-level management of IndexWriter.ramDirectory size

Yonik Seeley (JIRA) Fri, 10 Nov 2006 06:49:00 -0800

    [ http://issues.apache.org/jira/browse/LUCENE-709?page=comments#action_12448758 ]
Yonik Seeley commented on LUCENE-709:
-------------------------------------


> That code too was without the thread-safety measure Yonik suggests so I don't 
> know what overhead that will add.

Switching to an Enumeration should be slightly faster, since Hashtable's 
iterator is implemented as its Enumerator plus extra concurrent-modification 
checks.  That alone might not be sufficient for full thread safety, though.
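To make the Enumeration/Iterator distinction concrete, here is a minimal 
sketch. It assumes a hypothetical RAMFile stand-in with a long length field and 
a Hashtable<String, RAMFile> of files; these are illustrations, not Lucene's 
actual classes. Per the Hashtable Javadoc, the Enumeration returned by 
elements() is not fail-fast, while the iterator over values() is. Neither loop 
holds the table's lock.

```java
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Iterator;

public class RamSizeSum {
    // Hypothetical stand-in for Lucene's RAMFile; only the length matters here.
    static class RAMFile {
        long length;
        RAMFile(long length) { this.length = length; }
    }

    // Sums lengths via Hashtable.elements(): no modCount checks, no lock held.
    static long sumWithEnumeration(Hashtable<String, RAMFile> files) {
        long total = 0;
        Enumeration<RAMFile> e = files.elements();
        while (e.hasMoreElements()) {
            total += e.nextElement().length;
        }
        return total;
    }

    // Same sum via the values() iterator, which is fail-fast: a detected
    // concurrent structural modification throws ConcurrentModificationException.
    static long sumWithIterator(Hashtable<String, RAMFile> files) {
        long total = 0;
        Iterator<RAMFile> it = files.values().iterator();
        while (it.hasNext()) {
            total += it.next().length;
        }
        return total;
    }
}
```

Both return the same total on a quiescent table; the difference is only the 
per-step modification check the iterator performs.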

Enumerating through the Hashtable while not synchronized means you can 
encounter an object that was just added by another thread.  The other thread 
synchronized while adding the new object, but the enumerating thread didn't 
execute a read barrier.  The new memory model (JSR-133, Java 5) provides 
"out-of-thin-air safety" and "initialization safety" guarantees, so we are 
guaranteed to see a complete instance of RAMFile (just not necessarily the 
current one).  In this specific use case, I think it boils down to whether 
updates to the long length field are atomic, which we can't guarantee on all 
platforms.  Your count could be off by 4GB if you "see" the bottom 32 bits 
before the top.

In this IndexWriter use case, though, we should never see a length that uses 
both 32-bit words (i.e. 4GB or more), because we are only dealing with single 
segments.
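The 4GB figure and the single-segment argument can be checked with plain 
arithmetic. The sketch below models a torn read as the high 32-bit word of one 
value combined with the low 32-bit word of another, which JLS 17.7 permits for 
non-volatile long fields. The class and method names are illustrative only.

```java
public class TornLongRead {
    // Combines the high 32-bit word of one value with the low 32-bit word
    // of another: what a reader may observe if a non-volatile long write
    // is performed as two separate 32-bit writes.
    static long tornRead(long highSource, long lowSource) {
        long high = highSource & 0xFFFFFFFF00000000L;
        long low = lowSource & 0x00000000FFFFFFFFL;
        return high | low;
    }

    // True if every possible mix of the two values' words equals one of the
    // two values, i.e. tearing cannot produce a bogus length.
    static boolean tearingIsHarmless(long before, long after) {
        long a = tornRead(before, after);
        long b = tornRead(after, before);
        return (a == before || a == after) && (b == before || b == after);
    }

    public static void main(String[] args) {
        long before = 0xFFFFFFFFL;   // 4 GiB - 1 bytes
        long after = 0x100000000L;   // one byte later: exactly 4 GiB
        // New low word (0) seen before the new high word (1): reads as 0.
        long seen = tornRead(before, after);
        System.out.println(after - seen);                      // 4294967296
        System.out.println(tearingIsHarmless(1000L, 5000L));   // true
        System.out.println(tearingIsHarmless(before, after));  // false
    }
}
```

When both lengths stay below 2^32, the high word is zero in both, so any mix of 
words is one of the two real values; that is why single segments are safe here.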

Bottom line (I think): if you want getSizeBytes to work correctly 100% of the 
time in *all* instances and platforms, you need to synchronize it (and hence 
block any gets/puts during that time... blech).
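A minimal sketch of the "synchronize it" option, assuming a hypothetical 
SizedRamDirectory that keeps its files in a Hashtable and a RAMFile stand-in 
with a long length field (neither is Lucene's real code; only the name 
getSizeBytes comes from the discussion, its signature is assumed). 
getSizeBytes holds the table's own monitor, the same lock Hashtable's get/put 
use, so they block while it sums. The sketch also assumes length updates take 
that lock: synchronizing only the reader would not prevent a torn read of an 
unsynchronized 64-bit write.

```java
import java.util.Hashtable;

public class SizedRamDirectory {
    // Hypothetical stand-in for RAMFile.
    static final class RAMFile {
        long length;
    }

    private final Hashtable<String, RAMFile> files = new Hashtable<>();

    RAMFile createFile(String name) {
        RAMFile f = new RAMFile();
        files.put(name, f); // Hashtable.put locks 'files' internally
        return f;
    }

    // Assumption of this sketch: writers update length under the table lock.
    void setLength(RAMFile f, long length) {
        synchronized (files) {
            f.length = length;
        }
    }

    // Holding the Hashtable's monitor blocks concurrent get/put for the
    // whole sum, so the total reflects one consistent state of the table.
    long getSizeBytes() {
        synchronized (files) {
            long total = 0;
            for (RAMFile f : files.values()) {
                total += f.length;
            }
            return total;
        }
    }
}
```

The cost is exactly the one noted above: every get/put stalls while the sum runs.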



> [PATCH] Enable application-level management of IndexWriter.ramDirectory size
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-709
>                 URL: http://issues.apache.org/jira/browse/LUCENE-709
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.0.1
>         Environment: All
>            Reporter: Chuck Williams
>         Attachments: ramDirSizeManagement.patch
>
>
> IndexWriter currently only supports bounding of the in-memory index cache 
> using maxBufferedDocs, which limits it to a fixed number of documents.  When 
> document sizes vary substantially, especially when documents cannot be 
> truncated, this leads either to inefficiencies from a too-small value or 
> OutOfMemoryErrors from a too-large value.
> This simple patch exposes IndexWriter.flushRamSegments(), and provides access 
> to size information about IndexWriter.ramDirectory so that an application can 
> manage this based on the total number of bytes consumed by the in-memory 
> cache, thereby allowing a larger number of smaller documents or a smaller 
> number of larger documents.  This can lead to much better performance while 
> eliminating the possibility of OutOfMemoryErrors.
> The actual job of managing to a size constraint, or any other constraint, is 
> left up to the application.
> The addition of synchronized to flushRamSegments() is only for safety of an 
> external call.  It has no significant effect on internal calls since they all 
> come from a synchronized caller.
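As an illustration of the application-side policy the description leaves open, 
here is a minimal sketch. RamBufferedWriter is a hypothetical interface standing 
in for what the patch exposes (a size accessor plus flushRamSegments()); the 
patch's real method names and signatures may differ. FakeWriter is a toy that 
counts characters as bytes, for demonstration only.

```java
public class SizeBoundedIndexing {
    // Hypothetical view of the exposed API; not the patch's actual signatures.
    interface RamBufferedWriter {
        void addDocument(String doc);
        long ramSizeBytes();      // bytes currently held in the in-memory cache
        void flushRamSegments();  // move buffered segments out of RAM
    }

    // Application policy: flush whenever the buffer exceeds a byte budget,
    // regardless of how many documents that corresponds to.
    static final class ByteBudget {
        private final RamBufferedWriter writer;
        private final long maxRamBytes;
        int flushes = 0;

        ByteBudget(RamBufferedWriter writer, long maxRamBytes) {
            this.writer = writer;
            this.maxRamBytes = maxRamBytes;
        }

        void add(String doc) {
            writer.addDocument(doc);
            if (writer.ramSizeBytes() > maxRamBytes) {
                writer.flushRamSegments();
                flushes++;
            }
        }
    }

    // Toy writer: treats each character as one buffered byte.
    static final class FakeWriter implements RamBufferedWriter {
        long buffered = 0;
        long flushed = 0;
        public void addDocument(String doc) { buffered += doc.length(); }
        public long ramSizeBytes() { return buffered; }
        public void flushRamSegments() { flushed += buffered; buffered = 0; }
    }
}
```

With a 10-byte budget, three 4-byte documents trigger one flush after the third, 
whereas a maxBufferedDocs-style count would ignore document size entirely.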
