[ https://issues.apache.org/jira/browse/LUCENE-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152040#comment-14152040 ]
Michael McCandless commented on LUCENE-5938:
--------------------------------------------

This is a nice change; I like simplifying MTQ's rewrite methods, to push "sparse/dense" handling "lower". It's hacky now how the auto method tries to switch from Query to FixedBitSet backed filter depending on term/doc count...

Maybe fix "word" to be "long" instead? (In javadocs, variable names, etc.). "word" is kind of low-level, platform dependent term ... I found SparseFixedBitSet somewhat hard to understand :) Maybe rename wordCount to nonZeroLongCount or something?

approximateCardinality / linear counting algorithm is cool ... do we need to guard against zeroWords being 0? I guess this is allowed with doubles in Java? At least add a comment explaining this corner case "works", and I think add an explicit test case that sets a bit in every long?

Spelling: in TestSparseBitSet.copyOf, change sensible -> sensitive

Maybe add some more comments around tricky parts of SparseFixedBitSet. E.g., the different branches inside set? And, it looks strange doing 1L << i, but in fact the JVM/processor make that 1L << (i % 64).

And Iterator.currentOrNextDoc is scary looking... do we have enough tests here?

I hit this test failure, which reproduces with the patch, but not on trunk ... not sure if it's somehow related ... but the test case seems buggy (it doesn't try to unwrap an ExecutionException to get the ACE root cause ...
yet I can't get it to fail on trunk w/ beasting):

{noformat}
NOTE: reproduce with: ant test -Dtestcase=TestReaderClosed -Dtests.method=testReaderChaining -Dtests.seed=89DF4A597D3C8CB1 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt -Dtests.locale=sk -Dtests.timezone=America/Scoresbysund -Dtests.file.encoding=UTF-8
NOTE: test params are: codec=HighCompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=HIGH_COMPRESSION, chunkSize=248), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=HIGH_COMPRESSION, chunkSize=248)), sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=sk, timezone=America/Scoresbysund
NOTE: Linux 3.13.0-32-generic amd64/Oracle Corporation 1.7.0_55 (64-bit)/cpus=8,threads=1,free=453126896,total=518979584
NOTE: All tests run in this JVM: [TestReaderClosed]

Time: 0.485
There was 1 failure:
1) testReaderChaining(org.apache.lucene.index.TestReaderClosed)
java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.lucene.store.AlreadyClosedException: this IndexReader cannot be used anymore as one of its child readers was closed
	at __randomizedtesting.SeedInfo.seed([89DF4A597D3C8CB1:1EE91FD6CAA6CE7C]:0)
	at org.apache.lucene.search.IndexSearcher$ExecutionHelper.next(IndexSearcher.java:836)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:452)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:273)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:261)
	at org.apache.lucene.index.TestReaderClosed.testReaderChaining(TestReaderClosed.java:83)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.lucene.store.AlreadyClosedException: this IndexReader cannot be used anymore as one of its child readers was closed
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.apache.lucene.search.IndexSearcher$ExecutionHelper.next(IndexSearcher.java:832)
	... 41 more
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexReader cannot be used anymore as one of its child readers was closed
	at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:279)
	at org.apache.lucene.index.ParallelLeafReader.getLiveDocs(ParallelLeafReader.java:204)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:611)
	at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:94)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:483)
	at org.apache.lucene.search.IndexSearcher$SearcherCallableNoSort.call(IndexSearcher.java:722)
	at org.apache.lucene.search.IndexSearcher$SearcherCallableNoSort.call(IndexSearcher.java:699)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	... 1 more
{noformat}

> New DocIdSet implementation with random write access
> ----------------------------------------------------
>
>                 Key: LUCENE-5938
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5938
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-5938.patch, LUCENE-5938.patch, LUCENE-5938.patch,
>                      LUCENE-5938.patch, low_freq.tasks
>
>
> We have a great cost API that is supposed to help make decisions about how to
> best execute queries. However, due to the fact that several of our filter
> implementations (e.g. TermsFilter and BooleanFilter) return FixedBitSets,
> either we use the cost API and make bad decisions, or need to fall back to
> heuristics which are not as good such as
> RandomAccessFilterStrategy.useRandomAccess which decides that random access
> should be used if the first doc in the set is less than 100.
> On the other hand, we also have some nice compressed and cacheable DocIdSet
> implementations but we cannot make use of them because TermsFilter requires a
> DocIdSet that has random write access, and FixedBitSet is the only DocIdSet
> that we have that supports random access.
> I think it would be nice to replace FixedBitSet in those filters with another
> DocIdSet that would also support random write access but would have a better
> cost?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
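
Two of the arithmetic corner cases raised in the review comment above (the 1L << i shift inside set, and zeroWords being 0 in approximateCardinality) can be checked in isolation. The sketch below is not code from the LUCENE-5938 patch: the class BitSetCornerCases and the methods mask and estimateCardinality are hypothetical names, and the estimate uses the textbook linear counting formula m * ln(m / z) over 64-bit longs, which is only assumed to match the idea in the patch.

```java
// Hypothetical illustration; names and formula are assumptions, not the patch's code.
class BitSetCornerCases {

    /** Bit mask for bit i within its 64-bit long. */
    static long mask(int i) {
        // For long shifts Java uses only the low 6 bits of the shift count,
        // so no explicit "% 64" is needed.
        return 1L << i;
    }

    /**
     * Linear counting style estimate of the number of set bits, given the
     * number of bits in the set and the number of non-zero longs.
     */
    static long estimateCardinality(int length, int nonZeroLongs) {
        int totalLongs = (length + 63) >>> 6; // longs needed to cover `length` bits
        int zeroLongs = totalLongs - nonZeroLongs;
        // Floating-point division by zero does not throw: it yields +Infinity here.
        double ratio = (double) totalLongs / zeroLongs;
        // log(+Infinity) is +Infinity and Math.round(+Infinity) is Long.MAX_VALUE,
        // so the min() below clamps the estimate to `length`.
        long estimate = Math.round(totalLongs * Math.log(ratio));
        return Math.min(length, estimate);
    }

    public static void main(String[] args) {
        System.out.println(mask(70) == (1L << (70 % 64))); // true
        System.out.println(estimateCardinality(128, 2));   // every long non-zero: 128
        System.out.println(estimateCardinality(1024, 0));  // empty set: 0
    }
}
```

For a negative shift count the effective shift is i & 63 rather than i % 64, but doc ids are non-negative, so the two agree in this setting.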