On Mon, Dec 15, 2014 at 4:53 AM, Toke Eskildsen <[email protected]> wrote:
>> In the meantime, maybe you could model your tool after
>> UninvertingReader? It faces the same issue (lack of schema) and lets
>> the user specify the type.
>
> Yes, that is what we're doing. Unfortunately we cannot use the
> UninvertingReader directly due to its restrictions on facet structure
> size: We have too many references in our shards so it hits an internal
> 16M(?) limit.

Hmm, that's probably the DocTermOrds 16 MB internal addressing limit?

> Unfortunately our current mapping code from stored multi-valued String to
> DocValues seems to be very slow: It took nearly 2 days to convert a
> single-segment 900GB index, where a standard optimize takes only 8 hours.

That's awful. Profile it?

But how long did it take to index in the first place?

>> Also, see (the confusingly named) TestDemoParallelLeafReader? It lets
>> you partially reindex, e.g. derive new indexed fields or DV fields,
>> etc., from existing stored/DV fields, in an NRT manner.
>
> Thanks for the pointer. As far as I can see, the demo is very explicit
> about the type of DocValues being long, so no auto-guessing there. It's
> a very interesting idea though, with seamless DV-enabling.

The DVs can be arbitrary (not just long); it's only that the test cases
focus on long.

Have a look at the LUCENE-6005 branch: I broke this test out as a
separate ReindexingReader + test. I think we could do a better
integration between that and the schema...

I also added a simpler "testSwitchToDocValues" test case. It still uses
only long DVs, but you can easily see how you could do other types
too... I'll add an example of SortedSet.

Mike McCandless

http://blog.mikemccandless.com
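For readers following the thread: converting a stored multi-valued String field to SortedSet DocValues means producing a sorted, de-duplicated dictionary of all values plus, per document, the set of ordinals pointing into that dictionary. The plain-Java sketch below only models that shape. It does not use Lucene's API or reproduce its on-disk format, and the class `SortedSetSketch` and its methods are hypothetical names chosen for illustration. In this sketch, ordinals can only be assigned after every distinct value has been seen, hence the two passes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only: models the *shape* of SORTED_SET doc values
// (sorted unique term dictionary + per-document ordinal sets) with plain
// collections. It is not Lucene's API or storage format.
public class SortedSetSketch {

    /** Sorted, de-duplicated terms; a term's ordinal is its index here. */
    final List<String> terms;
    /** For each document, the sorted, de-duplicated ordinals of its values. */
    final int[][] docOrds;

    private SortedSetSketch(List<String> terms, int[][] docOrds) {
        this.terms = terms;
        this.docOrds = docOrds;
    }

    /** Builds the structure from each document's stored (multi-valued) strings. */
    static SortedSetSketch build(List<List<String>> storedValuesPerDoc) {
        // Pass 1: collect every distinct value, kept in sorted order.
        TreeSet<String> dict = new TreeSet<>();
        for (List<String> values : storedValuesPerDoc) {
            dict.addAll(values);
        }
        List<String> terms = new ArrayList<>(dict);

        // Pass 2: map each document's values to ordinals via binary search;
        // the TreeSet drops duplicate values within a document and sorts ords.
        int[][] docOrds = new int[storedValuesPerDoc.size()][];
        for (int doc = 0; doc < docOrds.length; doc++) {
            TreeSet<Integer> ords = new TreeSet<>();
            for (String v : storedValuesPerDoc.get(doc)) {
                ords.add(Collections.binarySearch(terms, v));
            }
            docOrds[doc] = ords.stream().mapToInt(Integer::intValue).toArray();
        }
        return new SortedSetSketch(terms, docOrds);
    }

    /** Resolves a document's ordinals back to their string values. */
    List<String> valuesOf(int doc) {
        List<String> out = new ArrayList<>();
        for (int ord : docOrds[doc]) {
            out.add(terms.get(ord));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> stored = Arrays.asList(
                Arrays.asList("lucene", "solr"),
                Arrays.asList("solr"),
                Arrays.asList("elasticsearch", "lucene", "lucene"));
        SortedSetSketch dv = build(stored);
        System.out.println(dv.terms);                        // [elasticsearch, lucene, solr]
        System.out.println(Arrays.toString(dv.docOrds[2]));  // [0, 1]
        System.out.println(dv.valuesOf(0));                  // [lucene, solr]
    }
}
```

Note how the duplicate "lucene" in the third document collapses to a single ordinal, matching the set semantics of SortedSet DocValues.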
