Robert,

OK, I didn't realize that Lucene45 was mostly a disk-based format. Thanks
for the clear explanation and links.

Joel

Joel Bernstein
Search Engineer at Heliosearch


On Sat, Feb 1, 2014 at 8:25 PM, Robert Muir <[email protected]> wrote:

>
> It is just as I described it. Lucene45 is not an in-memory format. It only
> holds 2 data structures in RAM (which apply only to certain kinds of
> docvalues instances: sorted, sortedset, and variable-length binary). You
> can see this if you look at the producer's source code (the javadoc of
> *Format also describes in great detail what these data structures are and
> how they are used):
>
>   // memory-resident structures
>   private final Map<Integer,MonotonicBlockPackedReader> addressInstances =
>       new HashMap<Integer,MonotonicBlockPackedReader>();
>   private final Map<Integer,MonotonicBlockPackedReader> ordIndexInstances =
>       new HashMap<Integer,MonotonicBlockPackedReader>();
>
>
> http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java
>
> The "disk" codec simply overrides these 2 data structures so that they are
> not in RAM either. This is extreme, as they should be very small. You
> really should not use this unless you have a 486 or something like that.
>
>
> http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/codecs/src/java/org/apache/lucene/codecs/diskdv/DiskDocValuesProducer.java
>
> All-in-memory is in the memory/ package:
>
>
> http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/codecs/src/java/org/apache/lucene/codecs/memory/MemoryDocValuesFormat.java
>
> Direct is like Memory, except that it applies no compression at all:
>
>
> http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/codecs/src/java/org/apache/lucene/codecs/memory/DirectDocValuesFormat.java
>
> On Sat, Feb 1, 2014 at 5:10 PM, Joel Bernstein <[email protected]> wrote:
>
>> Robert,
>>
>> Unless I'm missing something, the default docvalues format appears to
>> be Lucene45 (in Solr 4.6). Is this the "Memory" format you mention, or is
>> there another "Memory" docvalues format? I'm confused because I thought the
>> Disk format kept certain things on disk and certain things in memory, but
>> this does not appear to be the default format.
>>
>> Thanks,
>> Joel
>>
>>  Joel Bernstein
>> Search Engineer at Heliosearch
>>
>>
>> On Sat, Feb 1, 2014 at 12:19 PM, Tom Burton-West <[email protected]> wrote:
>>
>>> Thanks Shawn, Joel, and Robert,
>>>
>>> Shawn, thanks for mentioning the caveat of having to re-index when
>>> upgrading Solr.  We almost always re-index when we upgrade Solr.
>>>
>>>
>>> >> There is a ton of misinformation in this thread.
>>> I think this might be because the DocValues implementation is a moving
>>> target, and the documentation has not kept up.
>>>
>>> >> As of lucene 4.5, the default docvalues are disk-based (mostly, some
>>> >> small stuff in ram).
>>> >> You probably don't need to change anything from the defaults, unless:
>>>
>>> >> if you want everything in RAM, use Memory.
>>> >> If you want to waste RAM, use Direct.
>>> >> If you have no RAM, use Disk.
>>>
>>> Should I try to edit the Solr wiki (which talks about 4.2 and says the
>>> default is to put everything in memory), or is the idea that the cwiki is
>>> where people should look for current documentation?
>>> One of the things that confused me was that the cwiki pointed to the
>>> outdated Solr wiki entry on DocValues.
>>>
>>> I think I understand the use cases where someone would want everything
>>> in RAM or everything on Disk.  I'm assuming that the default (4.5) makes
>>> some trade-off by putting some important data structures in RAM.
>>>
>>> Where should I look (maybe a JIRA issue?) to understand the use case for
>>> Direct?   Maybe adding a sentence to the JavaDoc for Direct explaining why
>>> someone would want to use it would be useful.
>>>
>>> P.S. Robert, I saw your edits on the cwiki, and I really appreciate that,
>>> with all the time you spend working on code, you take the time to help
>>> with the docs.
>>>
>>>
>>> Tom
>>>
>>>
>>>
>>>
>>>
>>
>
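
For readers skimming the thread, the rule of thumb quoted above (stay on the default unless you want everything in RAM, want it uncompressed in RAM, or have no RAM to spare) can be written down as a small lookup. This is only a sketch: the `RamBudget` enum and `formatNameFor` method are hypothetical names invented for illustration and are not part of Lucene or Solr. Only the returned strings ("Lucene45", "Memory", "Direct", "Disk") are the format names discussed in the thread, and it is an assumption here that you would then hand the chosen name to whatever mechanism your setup uses to select a docvalues format.

```java
public class DocValuesFormatChooser {

    /** Hypothetical description of how much RAM a field's docvalues may use. */
    enum RamBudget {
        NONE,                // "If you have no RAM, use Disk."
        DEFAULT,             // mostly disk-based, a few small structures in RAM
        EVERYTHING_IN_RAM,   // "if you want everything in RAM, use Memory."
        UNCOMPRESSED_IN_RAM  // "If you want to waste RAM, use Direct."
    }

    /** Maps a RAM budget to the docvalues format name discussed in the thread. */
    static String formatNameFor(RamBudget budget) {
        switch (budget) {
            case EVERYTHING_IN_RAM:
                return "Memory";
            case UNCOMPRESSED_IN_RAM:
                return "Direct";
            case NONE:
                return "Disk";
            default:
                // The 4.5+ default: no change needed in most cases.
                return "Lucene45";
        }
    }

    public static void main(String[] args) {
        // Print the mapping for every budget.
        for (RamBudget budget : RamBudget.values()) {
            System.out.println(budget + " -> " + formatNameFor(budget));
        }
    }
}
```

The default branch reflects the advice in the thread that most users should not need to change anything.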
