Re: Iterating BinaryDocValues

Robert Muir Fri, 10 Jan 2014 12:15:31 -0800

I made an issue: https://issues.apache.org/jira/browse/LUCENE-5393



On Fri, Jan 10, 2014 at 2:54 PM, Robert Muir <[email protected]> wrote:

> Yeah, i dont think its from newer docvalues-using code like yours shai.
>
> instead the problems i had doing this are historical, because e.g.
> fieldcache pointed to large arrays and consumers were lazy about it,
> knowing that there reference pointed to bytes that would remain valid
> across invocations.
>
> we just have to remove these assumptions. I don't apologize for not doing
> this, as you show, its some small % improvement (which we should go and get
> back!), but i went with safety first initially rather than bugs.
>
>
>
> On Fri, Jan 10, 2014 at 2:50 PM, Shai Erera <[email protected]> wrote:
>
>> I agree with Robert. We should leave cloning BytesRefs to whoever needs
>> that, and not penalize everyone else who don't need it. I must say I didn't
>> know I can "own" those BytesRefs and I clone them whenever I need to. I
>> think I was bitten by one of the other APIs, so I assumed returned
>> BytesRefs are not "mine" across all the APIs.
>>
>> Shai
>>
>>
>> On Fri, Jan 10, 2014 at 9:46 PM, Robert Muir <[email protected]> wrote:
>>
>>> the problem is really simpler to solve actually.
>>>
>>> Look at the comments in the code, it tells you why it is this way:
>>>
>>>           // NOTE: we could have one buffer, but various consumers (e.g.
>>> FieldComparatorSource)
>>>           // assume "they" own the bytes after calling this!
>>>
>>> That is what we should fix. There is no need to make bulk APIs or even
>>> change the public api in any way (other than javadocs).
>>>
>>> We just move the clone'ing out of the codec, and require the consumer to
>>> do it, same as termsenum or other apis. The codec part is extremely simple
>>> here, its even the way i had it initially.
>>>
>>> But at the time (and even still now) this comes with some risk of bugs.
>>> So initially I removed the reuse and went with a more conservative approach
>>> to start with.
>>>
>>>
>>> On Fri, Jan 10, 2014 at 2:36 PM, Mikhail Khludnev <
>>> [email protected]> wrote:
>>>
>>>> Adrian,
>>>>
>>>> Please find bulkGet() scratch. It's ugly copy-paste, just reuses
>>>> ByteRef that provides 10% gain.
>>>> ...
>>>> bulkGet took:101630 ms
>>>> ...
>>>> get took:114422 ms
>>>>
>>>>
>>>>
>>>> On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <[email protected]>wrote:
>>>>
>>>>> I don't think we should add such a method. Doc values are commonly
>>>>> read from collectors, so why do we need a method that works on top of
>>>>> a DocIdSetIterator? I'm also curious how specialized implementations
>>>>> could make this method faster than the default implementation?
>>>>>
>>>>> --
>>>>> Adrien
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>> Principal Engineer,
>>>> Grid Dynamics
>>>>
>>>> <http://www.griddynamics.com>
>>>>  <[email protected]>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
>>>
>>>
>>
>

Re: Iterating BinaryDocValues

Reply via email to