[jira] [Commented] (LUCENE-5938) New DocIdSet implementation with random write access

Michael McCandless (JIRA) Mon, 29 Sep 2014 11:41:57 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152040#comment-14152040
 ]


Michael McCandless commented on LUCENE-5938:
--------------------------------------------

This is a nice change; I like simplifying MTQ's rewrite methods, to
push "sparse/dense" handling "lower".  It's hacky now how the auto
method tries to switch from Query to FixedBitSet backed filter
depending on term/doc count...

Maybe fix "word" to be "long" instead?  (In javadocs, variable names,
etc.).  "word" is kind of low-level, platform dependent term ... I
found SparseFixedBitSet somewhat hard to understand :)  Maybe rename
wordCount to nonZeroLongCount or something?

approximateCardinality / linear counting algorithm is cool ...  do we
need to guard against zeroWords being 0?  I guess this is allowed with
doubles in Java?  At least add a comment explaining this corner case
"works", and I think add an explicit test case that sets a bit in
every long?

Spelling: in TestSparseBitSet.copyOf, change sensible -> sensitive

Maybe add some more comments around tricky parts of SparseFixedBitSet.
E.g., the different branches inside set?  And, it looks strange doing
1L << i, but in fact the JVM/processor make that 1L << (i % 64).  And
Iterator.currentOrNextDoc is scary looking... do we have enough tests
here?

I hit this test failure, which reproduces with the patch, but not on
trunk ... not sure if it's somehow related ... but the test case seems
buggy (it doesn't try to unwrap an ExecutionException to get the ACE
root cause ... yet I can't get it to fail on trunk w/ beasting):

{noformat}
NOTE: reproduce with: ant test  -Dtestcase=TestReaderClosed 
-Dtests.method=testReaderChaining -Dtests.seed=89DF4A597D3C8CB1 
-Dtests.slow=true 
-Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt 
-Dtests.locale=sk -Dtests.timezone=America/Scoresbysund 
-Dtests.file.encoding=UTF-8
NOTE: test params are: 
codec=HighCompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=HIGH_COMPRESSION,
 chunkSize=248), 
termVectorsFormat=CompressingTermVectorsFormat(compressionMode=HIGH_COMPRESSION,
 chunkSize=248)), sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, 
locale=sk, timezone=America/Scoresbysund
NOTE: Linux 3.13.0-32-generic amd64/Oracle Corporation 1.7.0_55 
(64-bit)/cpus=8,threads=1,free=453126896,total=518979584
NOTE: All tests run in this JVM: [TestReaderClosed]

Time: 0.485
There was 1 failure:
1) testReaderChaining(org.apache.lucene.index.TestReaderClosed)
java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
org.apache.lucene.store.AlreadyClosedException: this IndexReader cannot be used 
anymore as one of its child readers was closed
        at 
__randomizedtesting.SeedInfo.seed([89DF4A597D3C8CB1:1EE91FD6CAA6CE7C]:0)
        at 
org.apache.lucene.search.IndexSearcher$ExecutionHelper.next(IndexSearcher.java:836)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:452)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:273)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:261)
        at 
org.apache.lucene.index.TestReaderClosed.testReaderChaining(TestReaderClosed.java:83)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
        at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
        at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
        at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
        at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
        at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
        at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
        at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
        at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
        at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: 
org.apache.lucene.store.AlreadyClosedException: this IndexReader cannot be used 
anymore as one of its child readers was closed
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:188)
        at 
org.apache.lucene.search.IndexSearcher$ExecutionHelper.next(IndexSearcher.java:832)
        ... 41 more
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexReader 
cannot be used anymore as one of its child readers was closed
        at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:279)
        at 
org.apache.lucene.index.ParallelLeafReader.getLiveDocs(ParallelLeafReader.java:204)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:611)
        at 
org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:94)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:483)
        at 
org.apache.lucene.search.IndexSearcher$SearcherCallableNoSort.call(IndexSearcher.java:722)
        at 
org.apache.lucene.search.IndexSearcher$SearcherCallableNoSort.call(IndexSearcher.java:699)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        ... 1 more
{noformat}


> New DocIdSet implementation with random write access
> ----------------------------------------------------
>
>                 Key: LUCENE-5938
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5938
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-5938.patch, LUCENE-5938.patch, LUCENE-5938.patch, 
> LUCENE-5938.patch, low_freq.tasks
>
>
> We have a great cost API that is supposed to help make decisions about how to 
> best execute queries. However, due to the fact that several of our filter 
> implementations (eg. TermsFilter and BooleanFilter) return FixedBitSets, 
> either we use the cost API and make bad decisions, or need to fall back to 
> heuristics which are not as good such as 
> RandomAccessFilterStrategy.useRandomAccess which decides that random access 
> should be used if the first doc in the set is less than 100.
> On the other hand, we also have some nice compressed and cacheable DocIdSet 
> implementation but we cannot make use of them because TermsFilter requires a 
> DocIdSet that has random write access, and FixedBitSet is the only DocIdSet 
> that we have that supports random access.
> I think it would be nice to replace FixedBitSet in those filters with another 
> DocIdSet that would also support random write access but would have a better 
> cost?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5938) New DocIdSet implementation with random write access

Reply via email to