[
https://issues.apache.org/jira/browse/LUCENE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501569#comment-13501569
]
Robert Muir commented on LUCENE-4547:
-------------------------------------
{quote}
a single boolean is too complicated?
{quote}
I think it is. It really confuses the API and makes writing codecs harder.
I think it would be better if the codec impl determined this, just like
MemoryPostings and so on.
So I'd rather have a per-field DV wrapper that configures this.
For example, someone would use a different implementation for their Solr
__version field than they would use for a scoring factor, and maybe a
different implementation for a sort field than for a faceting one.
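To make the per-field idea concrete, here is a minimal sketch of what such a wrapper could look like. It is plain Java, not Lucene code: `DocValuesImpl`, `PerFieldWrapper`, the implementation names and the field names are all hypothetical and only illustrate choosing one implementation per field with a fallback default.

```java
import java.util.HashMap;
import java.util.Map;

public class PerFieldDocValuesSketch {
    /** Hypothetical stand-in for a doc values implementation; not a Lucene type. */
    interface DocValuesImpl {
        String name();
    }

    /** Hypothetical wrapper: exactly one implementation per field, plus a default. */
    static final class PerFieldWrapper {
        private final Map<String, DocValuesImpl> byField = new HashMap<>();
        private final DocValuesImpl defaultImpl;

        PerFieldWrapper(DocValuesImpl defaultImpl) {
            this.defaultImpl = defaultImpl;
        }

        /** Registers the implementation to use for one field. */
        PerFieldWrapper register(String field, DocValuesImpl impl) {
            byField.put(field, impl);
            return this;
        }

        /** Returns the registered implementation, or the default if none was registered. */
        DocValuesImpl implFor(String field) {
            return byField.getOrDefault(field, defaultImpl);
        }
    }

    /** Builds a named placeholder implementation. */
    static DocValuesImpl named(String name) {
        return () -> name;
    }

    /** Example configuration; field and implementation names are made up. */
    static PerFieldWrapper example() {
        return new PerFieldWrapper(named("InMemory"))
            .register("version", named("OnDisk"))
            .register("boost", named("InMemory"))
            .register("price_sort", named("SortOptimized"))
            .register("category_facet", named("FacetOptimized"));
    }

    public static void main(String[] args) {
        PerFieldWrapper wrapper = example();
        for (String field : new String[] {"version", "boost", "price_sort", "other"}) {
            System.out.println(field + " -> " + wrapper.implFor(field).name());
        }
    }
}
```

The point of the sketch is that the RAM-versus-disk decision lives in which implementation a field is mapped to, so no individual implementation has to support both modes.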
I don't think there is a use case for accessing a single field's values both
from RAM and on disk, or for making the codec deal with that. It currently
makes things very complicated.
{quote}
We had this in 4.0 and I think we should make this work in 4.1 too.
{quote}
I don't think that's necessarily true. In 4.0 the one DV impl we had could do
a lot, but the codec API is very difficult. I actually contributed to a lot of
the codec APIs in Lucene, and as a committer I was unable to figure out how to
write a working DV impl against this API. I think this says a lot.
I'd rather have a simpler codec API that enables innovation, so that we can see
cool shit in the future, like implementations geared at sorting and faceting
that use less RAM, and so on.
If someone really needs more fine-grained control than a per-field codec API,
then there are other ways to achieve that: FileSwitchDirectory, adding such
APIs to their own codec, etc. But I'm not sure it's mainstream, or that it
should be required of all codecs.
> DocValues field broken on large indexes
> ---------------------------------------
>
> Key: LUCENE-4547
> URL: https://issues.apache.org/jira/browse/LUCENE-4547
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Priority: Blocker
> Fix For: 4.1
>
> Attachments: test.patch
>
>
> I tried to write a test to sanity check LUCENE-4536 (first running against
> svn revision 1406416, before the change).
> But I found DocValues is already broken here for large indexes that have a
> PackedLongDocValues field:
> {code}
> final int numDocs = 500000000;
> for (int i = 0; i < numDocs; ++i) {
>   if (i == 0) {
>     field.setLongValue(0L);
>   } else {
>     // force > 32bit deltas; the left operand must be a long,
>     // otherwise the shift is done in int arithmetic
>     field.setLongValue(1L << 33);
>   }
>   w.addDocument(doc);
> }
> w.forceMerge(1);
> w.close();
> dir.close(); // checkindex
> {code}
> {noformat}
> [junit4:junit4] 2> WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread #0,6,TGRP-Test2GBDocValues]
> [junit4:junit4] 2> org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException: -65536
> [junit4:junit4] 2> at __randomizedtesting.SeedInfo.seed([5DC54DB14FA5979]:0)
> [junit4:junit4] 2> at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:535)
> [junit4:junit4] 2> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:508)
> [junit4:junit4] 2> Caused by: java.lang.ArrayIndexOutOfBoundsException: -65536
> [junit4:junit4] 2> at org.apache.lucene.util.ByteBlockPool.deref(ByteBlockPool.java:305)
> [junit4:junit4] 2> at org.apache.lucene.codecs.lucene40.values.FixedStraightBytesImpl$FixedBytesWriterBase.set(FixedStraightBytesImpl.java:115)
> [junit4:junit4] 2> at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.writePackedInts(PackedIntValues.java:109)
> [junit4:junit4] 2> at org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.finish(PackedIntValues.java:80)
> [junit4:junit4] 2> at org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:130)
> [junit4:junit4] 2> at org.apache.lucene.codecs.PerDocConsumer.merge(PerDocConsumer.java:65)
> {noformat}
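A side note on forcing deltas wider than 32 bits in a test like the one quoted above: in Java the type of a shift expression comes from the left operand, so the value only exceeds 32 bits when that operand is a `long`. The small standalone sketch below (class and method names are hypothetical, not part of the Lucene test) shows the difference.

```java
public class ShiftWidthDemo {
    /** Left operand is an int, so only the low 5 bits of the distance are used: 33 & 31 == 1. */
    static long intShift() {
        return 1 << 33L;
    }

    /** Left operand is a long, so the shift is done in 64-bit arithmetic. */
    static long longShift() {
        return 1L << 33;
    }

    public static void main(String[] args) {
        System.out.println("1 << 33L = " + intShift());   // 2
        System.out.println("1L << 33 = " + longShift());  // 8589934592
    }
}
```

Only the second form produces a value that cannot be represented in 32 bits.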