Re: Determining NumericType for a field

Michael McCandless Mon, 15 Dec 2014 02:36:04 -0800

On Mon, Dec 15, 2014 at 4:53 AM, Toke Eskildsen <[email protected]> wrote:


>> In the meantime, maybe you could model your tool after
>> UninvertingReader?  It faces the same issue (lack of schema) and lets
>> the user specify the type.
>
> Yes, that is what we're doing. Unfortunately we cannot use the
> UninvertingReader directly due to its restrictions on facet structure
> size: We have too many references in our shards so it hits an internal
> 16M(?) limit.

Hmm that's probably the DocTermOrds 16 MB internal addressing limit?

> Unfortunately our current mapping code from stored multi value String to
> DocValues seems to be very slow: It took nearly 2 days to convert a
> single-segment 900GB index, where a standard optimize is only 8 hours.

That's awful.  Profile it?  But, how long did it take to index in the
first place?

>> Also, see (the confusingly named) TestDemoParallelLeafReader?  It lets
>> you partially reindex, e.g. derive new indexed fields or DV fields,
>> etc., from existing stored/DV fields, in an NRT manner.
>
> Thanks for the pointer. As far as I can see, the demo is very explicit
> about the type of DocValues being long, so no auto-guessing there. It's
> a very interesting idea though, with seamless DV-enabling.

The DVs can be arbitrary (not just long); it's only that the test
case focuses on long.

Have a look @ the LUCENE-6005 branch: I broke this test out as a
separate ReindexingReader + test.  I think we could do a better
integration between that and the schema...

I also added a simpler "testSwitchToDocValues" test case.  It still
uses only long DVs but you can easily see how you could do other types
too ... I'll add an example of SortedSet.
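
As a rough illustration of the "let the user specify the type" approach
discussed at the top of this thread, here is a minimal sketch in plain
Java with no Lucene dependency.  FieldTypeRegistry, DvType and
encodeNumeric are hypothetical names made up for this example; they are
not Lucene APIs, and the double encoding shown is just one possible
choice, not necessarily what any Lucene class does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: since the index carries no schema, the caller
 * declares up front which doc-values type each field should get.
 */
class FieldTypeRegistry {

    /** Illustrative type names only; this is not Lucene's own enum. */
    enum DvType { LONG, DOUBLE, SORTED, SORTED_SET }

    private final Map<String, DvType> types = new LinkedHashMap<>();

    /** Declare the type for a field; returns this so calls can be chained. */
    FieldTypeRegistry register(String field, DvType type) {
        types.put(field, type);
        return this;
    }

    /** Look up the declared type; fail loudly instead of guessing. */
    DvType typeFor(String field) {
        DvType type = types.get(field);
        if (type == null) {
            throw new IllegalArgumentException("No type declared for field: " + field);
        }
        return type;
    }

    /** Turn a stored string value into a long for a numeric field. */
    long encodeNumeric(String field, String storedValue) {
        switch (typeFor(field)) {
            case LONG:
                return Long.parseLong(storedValue);
            case DOUBLE:
                // Assumption: keep the IEEE 754 bit pattern of the double.
                return Double.doubleToLongBits(Double.parseDouble(storedValue));
            default:
                throw new IllegalArgumentException("Field is not numeric: " + field);
        }
    }
}
```

A conversion tool could consult such a registry once per field while
walking the stored values, and refuse to run on any field the user has
not declared rather than trying to auto-guess its type.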

Mike McCandless

http://blog.mikemccandless.com

