[
https://issues.apache.org/jira/browse/LUCENE-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572163#comment-15572163
]
Michael McCandless commented on LUCENE-7489:
--------------------------------------------
+1, this patch looks wonderful!
It looks like it uses the same compression techniques for the values as the 6.x
codec, but then for "which docIDs have a value" it has three different
approaches, for the very sparse, mostly dense, and 100% dense cases.
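To make the three cases concrete, here is a minimal Java sketch of how an encoder might pick a representation for the set of docIDs that have a value. It is not the code from the patch: the class name, the enum, and the size estimates (4 bytes per docID for the sparse case, 1 bit per document for the dense case) are hypothetical and chosen only for illustration.

```java
// Illustrative only: NOT the Lucene70DocValuesFormat implementation.
final class DocsWithValueSketch {

    // Hypothetical names for the three cases described in the comment.
    enum Strategy { SPARSE, DENSE, ALL }

    // Assumed cost of listing docIDs explicitly: one 4-byte int per document with a value.
    static long sparseBytes(int numDocsWithValue) {
        return 4L * numDocsWithValue;
    }

    // Assumed cost of a bit set: one bit per document in the segment, rounded up to bytes.
    static long denseBytes(int maxDoc) {
        return (maxDoc + 7L) / 8;
    }

    // Pick the cheapest representation under the assumed costs above.
    static Strategy choose(int numDocsWithValue, int maxDoc) {
        if (numDocsWithValue == maxDoc) {
            return Strategy.ALL; // every doc has a value, nothing needs to be recorded
        }
        if (sparseBytes(numDocsWithValue) < denseBytes(maxDoc)) {
            return Strategy.SPARSE; // few docs: an explicit docID list is smaller
        }
        return Strategy.DENSE; // many docs: a bit set is smaller
    }

    public static void main(String[] args) {
        System.out.println(choose(10, 1000));   // few docs have a value
        System.out.println(choose(900, 1000));  // most docs have a value
        System.out.println(choose(1000, 1000)); // all docs have a value
    }
}
```

Under these assumed costs the crossover between the sparse and dense cases falls at roughly one document in 32 having a value; a real codec may use different thresholds or make the choice per block rather than per segment.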
I hit this test failure, but it doesn't reproduce on trunk (though it could still be a
pre-existing issue, e.g. if this patch shifted seeds):
{noformat}
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestBlockJoinSorting -Dtests.method=testNestedSorting -Dtests.seed=A0B8F022A1A8B661 -Dtests.locale=en-CA -Dtests.timezone=Etc/GMT+4 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 0.20s | TestBlockJoinSorting.testNestedSorting <<<
[junit4] > Throwable #1: org.junit.ComparisonFailure: expected:<[e]> but was:<[f]>
[junit4] > at __randomizedtesting.SeedInfo.seed([A0B8F022A1A8B661:A8511D63E101BB0F]:0)
[junit4] > at org.apache.lucene.search.join.TestBlockJoinSorting.testNestedSorting(TestBlockJoinSorting.java:233)
[junit4] > at java.lang.Thread.run(Thread.java:745)
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene70): {field1=FST50, __type=Lucene50(blocksize=128), filter_1=Lucene50(blocksize=128), field2=Lucene50(blocksize=128)}, docValues:{field2=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=972, maxMBSortInHeap=5.645435808865713, sim=RandomSimilarity(queryNorm=false): {}, locale=en-CA, timezone=Etc/GMT+4
[junit4] 2> NOTE: Linux 4.4.0-38-generic amd64/Oracle Corporation 1.8.0_101 (64-bit)/cpus=8,threads=1,free=420118024,total=514850816
[junit4] 2> NOTE: All tests run in this JVM: [TestBlockJoinSorting]
[junit4] Completed [1/1 (1!)] in 0.37s, 1 test, 1 failure <<< FAILURES!
{noformat}
> Improve sparsity support of Lucene70DocValuesFormat
> ---------------------------------------------------
>
> Key: LUCENE-7489
> URL: https://issues.apache.org/jira/browse/LUCENE-7489
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7489.patch, LUCENE-7489.patch
>
>
> Like Lucene70NormsFormat, it should be able to only encode actual values.