Fair enough, let's resolve it in a way that makes everyone happy. Not being
able to use DV on-disk "officially" seems like a drawback to me.

I still prefer that we support both in-memory and on-disk and even default
to in-memory, because Lucene should have great performance out of the box,
and these days RAM is not so much of an issue. Perhaps we should explore
writing a DirectDVFormat that keeps everything in memory but outside the
heap (that would be a separate issue).

I get what you're saying about supporting formats, but I think that if
on-disk vs in-memory (for any format) means "same representation on disk,
different behavior at runtime", we should be able to support both variants?
That way, in-memory is just an optimized implementation detail, and is not
strictly a "Format".
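
To illustrate "same representation on disk, different behavior at runtime",
here is a minimal Java sketch. It is not Lucene's API: NumericValues,
OnDiskValues, openInMemory and the fixed-width 8-byte layout are all
hypothetical names and choices made up for this example. One writer produces
the file, and two readers consume that same file, one seeking on every lookup
and one loading everything into a long[] up front.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DocValuesSketch {

    /** Hypothetical per-document numeric lookup; NOT a Lucene interface. */
    public interface NumericValues {
        long get(int docID) throws IOException;
    }

    /** The single on-disk representation: one big-endian 8-byte long per document. */
    public static void write(Path file, long[] values) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        Files.write(file, buf.array());
    }

    /** Disk-backed reader: does a positional read per lookup and caches nothing. */
    public static final class OnDiskValues implements NumericValues, AutoCloseable {
        private final FileChannel channel;

        public OnDiskValues(Path file) throws IOException {
            this.channel = FileChannel.open(file, StandardOpenOption.READ);
        }

        @Override
        public long get(int docID) throws IOException {
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
            long start = (long) docID * Long.BYTES;
            // A positional read may return fewer bytes than requested, so loop.
            while (buf.hasRemaining()) {
                if (channel.read(buf, start + buf.position()) < 0) {
                    throw new EOFException("docID out of range: " + docID);
                }
            }
            buf.flip();
            return buf.getLong();
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }

    /** In-memory reader: decodes the very same file into a long[] once. */
    public static NumericValues openInMemory(Path file) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(file));
        long[] values = new long[buf.remaining() / Long.BYTES];
        buf.asLongBuffer().get(values);
        return docID -> values[docID];
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("dv-sketch", ".bin");
        try {
            write(file, new long[] {7L, 42L, -1L});
            NumericValues inMemory = openInMemory(file);
            try (OnDiskValues onDisk = new OnDiskValues(file)) {
                for (int doc = 0; doc < 3; doc++) {
                    System.out.println(doc + ": " + onDisk.get(doc) + " / " + inMemory.get(doc));
                }
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

In this sketch only the file layout would need a back-compat guarantee; which
reader opens it is a runtime choice that never touches the index.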

Shai


On Fri, Jul 19, 2013 at 4:17 PM, Robert Muir <[email protected]> wrote:

>
> On Fri, Jul 19, 2013 at 9:05 AM, Shai Erera <[email protected]> wrote:
>
>> Hi
>>
>> Following the discussion on LUCENE-5121 I want to propose that we make
>> DiskDVFormat not experimental. The reason is that the way things are now,
>> we're basically telling everyone that if you're using DV, you have to have
>> all of them in-memory. If you can't, you're on your own -- either use the
>> experimental DiskDV which we don't guarantee backwards support for, or
>> write your own DVFormat, which is uber expert.
>>
>> I'm not sure if it's good that we force high-memory consumption when
>> using Lucene. We don't enforce that in other places (e.g. users can tweak
>> IW.ramBuffer, termIndexInterval etc.), and DV should be no exception,
>> especially as it will likely be big.
>>
>> I don't advocate for making DiskDV the default, just to allow a supported
>> disk-based one. Some apps may not be able to load DV entirely into memory,
>> and the alternatives aren't great IMO. I guess I see in-memory DVFormat as
>> an optimization of DiskDV, and not another way to encode DVs (as opposed to
>> custom PostingsFormats).
>>
>> What would it take to make it not experimental? Is it just the removal of
>> @lucene.experimental or do we need to name it otherwise? Fix outstanding
>> issues?
>>
>
> I don't really agree with the rationale. I would like to take something
> like DiskDV and make it the default implementation, and then name the
> current one "Memory".
>
> But I am still not happy to do this when we have algorithms that do things
> like tons of ordinal-term lookups (really not using the data structure
> correctly). I guess these are slow with the in-memory one too, since they do
> lots of FST binary searches.
>
> Bottom line, I don't think we should provide index back compat for a
> variety of formats. I think we should have one official format which we
> generate the backwards indexes for and so on. It's hard enough to support
> even one format for the very long period of time (e.g. through 5.9) that we
> support.
>
> So I'd rather see an issue to "not make DV use so much RAM by default".
> And we could do this for the 4.5 format.
>
>
