[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3659:
----------------------------------

    Attachment: LUCENE-3659.patch

I started to work on this; here is just a first step (trunk). This patch 
removes the BUFFER_SIZE constant and moves it up to RAMDirectory (but for now 
only as a default, see below!). For now, RAMDirectory passes its default buffer 
size on to its RAMFile children (newRAMFile() method), but this is likely to 
change (see below).
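
To illustrate the idea of a per-file buffer size that defaults to a 
directory-wide value, here is a minimal sketch. SketchRamDirectory and 
SketchRamFile are hypothetical stand-ins written for this example; they are 
not the classes touched by the patch and do not reproduce Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a RAM-backed file: every file remembers its own buffer size.
final class SketchRamFile {
    private final int bufferSize;
    private final List<byte[]> buffers = new ArrayList<>();

    SketchRamFile(int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive: " + bufferSize);
        }
        this.bufferSize = bufferSize;
    }

    int bufferSize() {
        return bufferSize;
    }

    // Appends one more buffer (slice) of this file's own size and returns it.
    byte[] addBuffer() {
        byte[] buffer = new byte[bufferSize];
        buffers.add(buffer);
        return buffer;
    }

    int numBuffers() {
        return buffers.size();
    }
}

// Hypothetical stand-in for the directory: it only supplies the default buffer size.
final class SketchRamDirectory {
    static final int DEFAULT_BUFFER_SIZE = 8192; // value taken from the message

    private final int defaultBufferSize;

    SketchRamDirectory() {
        this(DEFAULT_BUFFER_SIZE);
    }

    SketchRamDirectory(int defaultBufferSize) {
        this.defaultBufferSize = defaultBufferSize;
    }

    // New files inherit the directory default ...
    SketchRamFile newRamFile() {
        return new SketchRamFile(defaultBufferSize);
    }

    // ... but a caller with better information may pick a different size per file.
    SketchRamFile newRamFile(int bufferSize) {
        return new SketchRamFile(bufferSize);
    }
}
```

With this shape, the directory default is only a fallback, which leaves room 
for choosing a size per file later.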

As every RAMFile has its own buffer size, optimizations are possible:
- when you open an IndexOutput, in trunk we get the IOContext, which may 
contain a Merge/Flush desc containing the complete segment size (unfortunately 
the *complete* segment size, not the size of the individual file). But this 
number can be used as an order of magnitude for specifying the buffer size.

The patch does not yet implement that, but one idea would be to allocate 
1/32 of the segment size as the buffer size. That way the buffer size does not 
get too big, while the number of slices has an upper limit (approx. 32 slices 
per merged segment). Currently, with 1024-byte buffers, a merged segment with a 
size of, say, 32 gigabytes would have 32 million byte[] arrays; after the 
change, only 32 byte[] arrays with a size of 1 gigabyte each. This should make 
GC happy.
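
The arithmetic behind the 1/32 idea can be checked with a small sketch. The 
class and method names are made up for this example, and the lower bound 
(the 8192 default) and upper bound (1 GiB, chosen because a Java byte[] 
cannot hold 2^31 or more elements) are assumptions of the sketch, not part 
of the patch. A gigabyte is taken as 2^30 bytes.

```java
// Hypothetical helper illustrating the "1/32 of the segment size" heuristic.
final class BufferSizeSketch {
    static final int DEFAULT_BUFFER_SIZE = 8192;  // default named in the message
    static final int MAX_BUFFER_SIZE = 1 << 30;   // assumed cap: 1 GiB per byte[]
    static final int TARGET_BUFFERS = 32;         // roughly 32 slices per segment

    // Picks a buffer size from an estimated segment size, clamped to [default, max].
    static int chooseBufferSize(long estimatedSegmentBytes) {
        if (estimatedSegmentBytes <= 0) {
            return DEFAULT_BUFFER_SIZE; // no estimate available
        }
        long candidate = estimatedSegmentBytes / TARGET_BUFFERS;
        return (int) Math.max(DEFAULT_BUFFER_SIZE, Math.min(candidate, MAX_BUFFER_SIZE));
    }

    // Number of byte[] arrays needed to hold totalBytes (rounded up).
    static long numBuffers(long totalBytes, int bufferSize) {
        return (totalBytes + bufferSize - 1) / bufferSize;
    }

    public static void main(String[] args) {
        long segment = 32L << 30; // 32 GiB
        System.out.println(numBuffers(segment, 1024));                      // 33554432
        System.out.println(chooseBufferSize(segment));                      // 1073741824
        System.out.println(numBuffers(segment, chooseBufferSize(segment))); // 32
    }
}
```

Note that with the cap in place, segments larger than 32 GiB would again need 
more than 32 arrays; whether that matters depends on how large a RAM-resident 
segment can realistically get.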

When backporting to 3.x, the IOContext is not yet available, so RAMDirectory 
always uses the default buffer size (maybe randomize it in tests). Raising the 
buffer size should bring improvements here.

We should still add some warnings to the Javadocs that for *large* indexes 
it is often preferable to use MMapDir, especially when the index is stored on 
disk. We should also tell people that new RAMDirectory(OtherDirectory) may be 
a bad idea...

The default buffer size was raised from 1024 to 8192 bytes.
                
> Improve Javadocs of RAMDirectory to document its limitations
> ------------------------------------------------------------
>
>                 Key: LUCENE-3659
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3659
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.5, 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3659.patch
>
>
> Spinoff from several dev@lao issues:
> - [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
> - issue LUCENE-3653
> The use cases for RAMDirectory are very limited and to prevent users from 
> using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
> improve the javadocs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
