I made an issue: https://issues.apache.org/jira/browse/LUCENE-5393
On Fri, Jan 10, 2014 at 2:54 PM, Robert Muir <[email protected]> wrote: > Yeah, i dont think its from newer docvalues-using code like yours shai. > > instead the problems i had doing this are historical, because e.g. > fieldcache pointed to large arrays and consumers were lazy about it, > knowing that there reference pointed to bytes that would remain valid > across invocations. > > we just have to remove these assumptions. I don't apologize for not doing > this, as you show, its some small % improvement (which we should go and get > back!), but i went with safety first initially rather than bugs. > > > > On Fri, Jan 10, 2014 at 2:50 PM, Shai Erera <[email protected]> wrote: > >> I agree with Robert. We should leave cloning BytesRefs to whoever needs >> that, and not penalize everyone else who don't need it. I must say I didn't >> know I can "own" those BytesRefs and I clone them whenever I need to. I >> think I was bitten by one of the other APIs, so I assumed returned >> BytesRefs are not "mine" across all the APIs. >> >> Shai >> >> >> On Fri, Jan 10, 2014 at 9:46 PM, Robert Muir <[email protected]> wrote: >> >>> the problem is really simpler to solve actually. >>> >>> Look at the comments in the code, it tells you why it is this way: >>> >>> // NOTE: we could have one buffer, but various consumers (e.g. >>> FieldComparatorSource) >>> // assume "they" own the bytes after calling this! >>> >>> That is what we should fix. There is no need to make bulk APIs or even >>> change the public api in any way (other than javadocs). >>> >>> We just move the clone'ing out of the codec, and require the consumer to >>> do it, same as termsenum or other apis. The codec part is extremely simple >>> here, its even the way i had it initially. >>> >>> But at the time (and even still now) this comes with some risk of bugs. >>> So initially I removed the reuse and went with a more conservative approach >>> to start with. >>> >>> >>> On Fri, Jan 10, 2014 at 2:36 PM, Mikhail Khludnev < >>> [email protected]> wrote: >>> >>>> Adrian, >>>> >>>> Please find bulkGet() scratch. It's ugly copy-paste, just reuses >>>> ByteRef that provides 10% gain. >>>> ... >>>> bulkGet took:101630 ms >>>> ... >>>> get took:114422 ms >>>> >>>> >>>> >>>> On Fri, Jan 10, 2014 at 8:58 PM, Adrien Grand <[email protected]>wrote: >>>> >>>>> I don't think we should add such a method. Doc values are commonly >>>>> read from collectors, so why do we need a method that works on top of >>>>> a DocIdSetIterator? I'm also curious how specialized implementations >>>>> could make this method faster than the default implementation? >>>>> >>>>> -- >>>>> Adrien >>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: [email protected] >>>>> For additional commands, e-mail: [email protected] >>>>> >>>>> >>>> >>>> >>>> -- >>>> Sincerely yours >>>> Mikhail Khludnev >>>> Principal Engineer, >>>> Grid Dynamics >>>> >>>> <http://www.griddynamics.com> >>>> <[email protected]> >>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: [email protected] >>>> For additional commands, e-mail: [email protected] >>>> >>> >>> >> >
