[jira] [Commented] (LUCENE-4198) Allow codecs to index term impacts

Steve Rowe (JIRA) Wed, 31 Jan 2018 07:32:37 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347016#comment-16347016 ]


Steve Rowe commented on LUCENE-4198:
------------------------------------

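My Jenkins found reproducing failures in two Solr test suites (three test 
methods in total) that {{git bisect}} blames on the {{f410df8}} commit on this 
issue. Each one throws {{UnsupportedOperationException}} from 
{{MultiTermsEnum.impacts()}}, reached via {{CheckIndex.checkFields()}}: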

{noformat}
   [junit4] Suite: org.apache.solr.uninverting.TestUninvertingReader
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.seed=120215CB40DFC75D -Dtests.slow=true -Dtests.locale=de-LU -Dtests.timezone=America/North_Dakota/New_Salem -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] ERROR   0.20s J7  | TestUninvertingReader.testSortedSetIntegerManyValues <<<
   [junit4]    > Throwable #1: java.lang.UnsupportedOperationException
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([120215CB40DFC75D:F3DEA4F6D55E4A1B]:0)
   [junit4]    >        at org.apache.lucene.index.MultiTermsEnum.impacts(MultiTermsEnum.java:373)
   [junit4]    >        at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1615)
   [junit4]    >        at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1873)
   [junit4]    >        at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:337)
   [junit4]    >        at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:319)
   [junit4]    >        at org.apache.solr.uninverting.TestUninvertingReader.testSortedSetIntegerManyValues(TestUninvertingReader.java:300)
   [junit4]    >        at java.lang.Thread.run(Thread.java:748)
   [junit4]   2> NOTE: test params are: codec=Lucene70, sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@2818cbd7), locale=de-LU, timezone=America/North_Dakota/New_Salem
   [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_151 (64-bit)/cpus=16,threads=1,free=54883016,total=531103744
{noformat}

{noformat}
   [junit4] Suite: org.apache.solr.uninverting.TestDocTermOrds
   [junit4] IGNOR/A 0.00s J2  | TestDocTermOrds.testTriggerUnInvertLimit
   [junit4]    > Assumption #1: 'nightly' test group is disabled (@Nightly())
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestDocTermOrds -Dtests.method=testRandomWithPrefix -Dtests.seed=120215CB40DFC75D -Dtests.slow=true -Dtests.locale=th -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] ERROR   0.03s J2  | TestDocTermOrds.testRandomWithPrefix <<<
   [junit4]    > Throwable #1: java.lang.UnsupportedOperationException
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([120215CB40DFC75D:4E81207B2E152EC5]:0)
   [junit4]    >        at org.apache.lucene.index.MultiTermsEnum.impacts(MultiTermsEnum.java:373)
   [junit4]    >        at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1615)
   [junit4]    >        at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1873)
   [junit4]    >        at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:337)
   [junit4]    >        at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:319)
   [junit4]    >        at org.apache.solr.uninverting.TestDocTermOrds.testRandomWithPrefix(TestDocTermOrds.java:357)
   [junit4]    >        at java.lang.Thread.run(Thread.java:748)
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestDocTermOrds -Dtests.method=testRandom -Dtests.seed=120215CB40DFC75D -Dtests.slow=true -Dtests.locale=th -Dtests.timezone=SystemV/MST7MDT -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] ERROR   0.02s J2  | TestDocTermOrds.testRandom <<<
   [junit4]    > Throwable #1: java.lang.UnsupportedOperationException
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([120215CB40DFC75D:604E30C4F1BF712E]:0)
   [junit4]    >        at org.apache.lucene.index.MultiTermsEnum.impacts(MultiTermsEnum.java:373)
   [junit4]    >        at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:1615)
   [junit4]    >        at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1873)
   [junit4]    >        at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:337)
   [junit4]    >        at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:319)
   [junit4]    >        at org.apache.solr.uninverting.TestDocTermOrds.testRandom(TestDocTermOrds.java:271)
   [junit4]    >        at java.lang.Thread.run(Thread.java:748)
   [junit4]   2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/jobs/Lucene-Solr-tests-master/workspace/solr/build/solr-core/test/J2/temp/solr.uninverting.TestDocTermOrds_120215CB40DFC75D-001
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene70): {field=BlockTreeOrds(blocksize=128), foo=BlockTreeOrds(blocksize=128), id=PostingsFormat(name=LuceneFixedGap)}, docValues:{}, maxPointsInLeafNode=1391, maxMBSortInHeap=6.814533240370871, sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@177c82a3), locale=th, timezone=SystemV/MST7MDT
{noformat}
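
The traces above share one shape: a checker asks every terms enum for impacts, 
and the multi-segment view answers with {{UnsupportedOperationException}}. Below 
is a minimal, self-contained Java sketch of that shape. It is only an 
illustration: {{TermCursor}}, {{LeafCursor}}, {{MergedCursor}} and {{Checker}} 
are hypothetical stand-ins, not Lucene classes, and {{checkPerLeaf}} is just one 
possible way a checker could avoid the exception, not the fix chosen for this 
issue.

```java
// Hypothetical stand-in types for illustration only; these are NOT Lucene's classes.
interface TermCursor {
    // Toy "impact": the largest term frequency recorded for the current term.
    int maxImpact();
}

// A per-segment cursor that can answer the impacts question.
final class LeafCursor implements TermCursor {
    private final int maxFreq;

    LeafCursor(int maxFreq) {
        this.maxFreq = maxFreq;
    }

    @Override
    public int maxImpact() {
        return maxFreq;
    }
}

// A merged view over several leaves that does not support impacts,
// mirroring the UnsupportedOperationException seen in the traces.
final class MergedCursor implements TermCursor {
    final TermCursor[] leaves;

    MergedCursor(TermCursor... leaves) {
        this.leaves = leaves;
    }

    @Override
    public int maxImpact() {
        throw new UnsupportedOperationException();
    }
}

final class Checker {
    // Calls maxImpact() unconditionally, so it fails on a merged view.
    static int checkStrict(TermCursor cursor) {
        return cursor.maxImpact();
    }

    // Assumed workaround: descend into the leaves of a merged view
    // and check each one separately, returning the largest impact found.
    static int checkPerLeaf(TermCursor cursor) {
        if (cursor instanceof MergedCursor) {
            int max = 0;
            for (TermCursor leaf : ((MergedCursor) cursor).leaves) {
                max = Math.max(max, checkPerLeaf(leaf));
            }
            return max;
        }
        return cursor.maxImpact();
    }
}
```

With a merged view over leaves reporting 3 and 7, {{Checker.checkStrict}} throws 
while {{Checker.checkPerLeaf}} returns 7.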

> Allow codecs to index term impacts
> ----------------------------------
>
>                 Key: LUCENE-4198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4198
>             Project: Lucene - Core
>          Issue Type: Sub-task
>          Components: core/index
>            Reporter: Robert Muir
>            Priority: Major
>             Fix For: master (8.0)
>
>         Attachments: LUCENE-4198-BMW.patch, LUCENE-4198.patch, 
> LUCENE-4198.patch, LUCENE-4198.patch, LUCENE-4198.patch, LUCENE-4198.patch, 
> LUCENE-4198_flush.patch
>
>
> Subtask of LUCENE-4100.
> LUCENE-4100 is an example of something similar to impact indexing (Stefan's 
> implementation currently stores a max for the entire term, but the problem is 
> the same).
> We can imagine other similar algorithms too: I think the codec API should be 
> able to support these.
> Currently it really doesn't: Stefan worked around the problem by providing a 
> tool to 'rewrite' your index, to which he passes the IndexReader and Similarity. 
> But it would be better if we fixed the codec API.
> One problem is that the Postings writer needs to have access to the 
> Similarity. Another problem is that it needs access to the term and 
> collection statistics up front, rather than after the fact.
> This might have some cost (hopefully minimal), so I'm thinking to experiment 
> in a branch with these changes and see if we can make it work well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
