[ https://issues.apache.org/jira/browse/LUCENE-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14976476#comment-14976476 ]
Steve Rowe commented on LUCENE-6825:
------------------------------------
My Jenkins found a seed that reproduces for me: the {{TestDimensionalValues}} tests
({{testMultiValued()}} every time, {{testMerge()}} sometimes) trigger an NPE in the
{{DimensionalReader.intersect()}} implementation inside
{{DimensionalWriter.merge()}}:
{noformat}
[junit4] Suite: org.apache.lucene.index.TestDimensionalValues
[junit4] 2> NOTE: download the large Jenkins line-docs file by running 'ant get-jenkins-line-docs' in the lucene directory.
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestDimensionalValues -Dtests.method=testMultiValued -Dtests.seed=367B5FB4E6C5CEFF -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt -Dtests.locale=ga_IE -Dtests.timezone=Mexico/BajaSur -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
[junit4] ERROR 0.57s J0 | TestDimensionalValues.testMultiValued <<<
[junit4] > Throwable #1: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
[junit4] > at __randomizedtesting.SeedInfo.seed([367B5FB4E6C5CEFF:E25B3B8628078EB7]:0)
[junit4] > at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:713)
[junit4] > at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:727)
[junit4] > at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1457)
[junit4] > at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1240)
[junit4] > at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:173)
[junit4] > at org.apache.lucene.index.TestDimensionalValues.verify(TestDimensionalValues.java:829)
[junit4] > at org.apache.lucene.index.TestDimensionalValues.verify(TestDimensionalValues.java:791)
[junit4] > at org.apache.lucene.index.TestDimensionalValues.testMultiValued(TestDimensionalValues.java:212)
[junit4] > at java.lang.Thread.run(Thread.java:745)
[junit4] > Caused by: java.lang.NullPointerException
[junit4] > at org.apache.lucene.codecs.DimensionalWriter$1.intersect(DimensionalWriter.java:56)
[junit4] > at org.apache.lucene.codecs.simpletext.SimpleTextDimensionalWriter.writeField(SimpleTextDimensionalWriter.java:139)
[junit4] > at org.apache.lucene.codecs.DimensionalWriter.merge(DimensionalWriter.java:45)
[junit4] > at org.apache.lucene.index.SegmentMerger.mergeDimensionalValues(SegmentMerger.java:168)
[junit4] > at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:117)
[junit4] > at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4055)
[junit4] > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3635)
[junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
[junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
[junit4] 2> DFómh 27, 2015 6:47:10 A.M. com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
[junit4] 2> WARNING: Uncaught exception in thread: Thread[Lucene Merge Thread #1,5,TGRP-TestDimensionalValues]
[junit4] 2> org.apache.lucene.index.MergePolicy$MergeException: java.lang.NullPointerException
[junit4] 2> at __randomizedtesting.SeedInfo.seed([367B5FB4E6C5CEFF]:0)
[junit4] 2> at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:668)
[junit4] 2> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:648)
[junit4] 2> Caused by: java.lang.NullPointerException
[junit4] 2> at org.apache.lucene.codecs.DimensionalWriter$1.intersect(DimensionalWriter.java:56)
[junit4] 2> at org.apache.lucene.codecs.simpletext.SimpleTextDimensionalWriter.writeField(SimpleTextDimensionalWriter.java:139)
[junit4] 2> at org.apache.lucene.codecs.DimensionalWriter.merge(DimensionalWriter.java:45)
[junit4] 2> at org.apache.lucene.index.SegmentMerger.mergeDimensionalValues(SegmentMerger.java:168)
[junit4] 2> at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:117)
[junit4] 2> at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4055)
[junit4] 2> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3635)
[junit4] 2> at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
[junit4] 2> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
[junit4] 2>
[junit4] 2> NOTE: download the large Jenkins line-docs file by running 'ant get-jenkins-line-docs' in the lucene directory.
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestDimensionalValues -Dtests.method=testMerge -Dtests.seed=367B5FB4E6C5CEFF -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt -Dtests.locale=ga_IE -Dtests.timezone=Mexico/BajaSur -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
[junit4] ERROR 0.14s J0 | TestDimensionalValues.testMerge <<<
[junit4] > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=222, name=Lucene Merge Thread #1, state=RUNNABLE, group=TGRP-TestDimensionalValues]
[junit4] > at __randomizedtesting.SeedInfo.seed([367B5FB4E6C5CEFF:85DA8150E91E786A]:0)
[junit4] > Caused by: org.apache.lucene.index.MergePolicy$MergeException: java.lang.NullPointerException
[junit4] > at __randomizedtesting.SeedInfo.seed([367B5FB4E6C5CEFF]:0)
[junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:668)
[junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:648)
[junit4] > Caused by: java.lang.NullPointerException
[junit4] > at org.apache.lucene.codecs.DimensionalWriter$1.intersect(DimensionalWriter.java:56)
[junit4] > at org.apache.lucene.codecs.simpletext.SimpleTextDimensionalWriter.writeField(SimpleTextDimensionalWriter.java:139)
[junit4] > at org.apache.lucene.codecs.DimensionalWriter.merge(DimensionalWriter.java:45)
[junit4] > at org.apache.lucene.index.SegmentMerger.mergeDimensionalValues(SegmentMerger.java:168)
[junit4] > at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:117)
[junit4] > at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4055)
[junit4] > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3635)
[junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
[junit4] > at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
[junit4] 2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/jobs/Lucene-Solr-tests-trunk/workspace/lucene/build/core/test/J0/temp/lucene.index.TestDimensionalValues_367B5FB4E6C5CEFF-001
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene53): {}, docValues:{}, sim=RandomSimilarityProvider(queryNorm=false,coord=yes): {}, locale=ga_IE, timezone=Mexico/BajaSur
[junit4] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=263920904,total=390594560
[junit4] 2> NOTE: All tests run in this JVM: [TestTermScorer, Test2BPostings, TestLucene50CompoundFormat, TestFilterLeafReader, TestStressNRT, TestBagOfPostings, TestLazyProxSkipping, Test2BBinaryDocValues, TestTimSorter, TestPagedBytes, TestTerms, TestSizeBoundedForceMerge, TestMultiTermQueryRewrites, TestFileSwitchDirectory, TestSpanSearchEquivalence, TestDeletionPolicy, TestVersion, TestNumericDocValuesUpdates, TestSpanCollection, TestSpanBoostQuery, TestExternalCodecs, TestDocsAndPositions, TestTermVectors, TestSimpleFSLockFactory, TestMergedIterator, Test2BPagedBytes, TestTopDocsMerge, TestIsCurrent, TestNot, TestWeakIdentityMap, TestBytesRefHash, TestConstantScoreQuery, TestBufferedChecksum, TestRollingBuffer, TestSimilarity2, TestLSBRadixSorter, TestRollingUpdates, TestParallelTermEnum, TestDimensionalValues]
[junit4] Completed [83/400] on J0 in 27.01s, 25 tests, 2 errors <<< FAILURES!
{noformat}
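The trace points at the anonymous {{DimensionalReader}} that {{DimensionalWriter.merge()}} builds ({{DimensionalWriter$1.intersect}}, line 56), but it does not say which reference was null. One plausible cause, and it is only an assumption here, is that a segment being merged has no dimensional values at all, so its per-segment reader slot is null. A minimal sketch of a merge-time adapter that guards against that case, using hypothetical stand-in types ({{SegmentPoints}}, {{PointVisitor}}, {{MergedPoints}}) rather than Lucene's classes, and ignoring deleted documents, which a real merge would have to remap:

```java
/** Hypothetical stand-in for one segment's dimensional values (not a Lucene class). */
interface SegmentPoints {
  /** Visits every (docID, packedValue) pair this segment stores for the field. */
  void visitAll(String field, PointVisitor visitor);
}

/** Hypothetical callback receiving one point at a time. */
interface PointVisitor {
  void visit(int docID, byte[] packedValue);
}

/** Hypothetical merge-time view over all incoming segments, exposed as a single reader. */
final class MergedPoints {
  private final SegmentPoints[] readers; // one slot per incoming segment; a slot may be null
  private final int[] docBases;          // first docID of each segment in the merged segment

  MergedPoints(SegmentPoints[] readers, int[] docBases) {
    this.readers = readers;
    this.docBases = docBases;
  }

  /** Forwards every segment's points, shifting docIDs into the merged docID space. */
  void visitAll(String field, PointVisitor visitor) {
    for (int i = 0; i < readers.length; i++) {
      SegmentPoints reader = readers[i];
      if (reader == null) {
        // This segment indexed no dimensional values: skip it rather than dereference null.
        continue;
      }
      final int docBase = docBases[i];
      reader.visitAll(field, (docID, value) -> visitor.visit(docBase + docID, value));
    }
  }
}
```

With segments {{[seg0, null, seg2]}} and doc bases {{[0, 2, 5]}}, the adapter reports seg0's docs 0 and 1 unchanged, skips the empty middle segment, and reports seg2's doc 0 as merged doc 5.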
> Add multidimensional byte[] indexing support to Lucene
> ------------------------------------------------------
>
> Key: LUCENE-6825
> URL: https://issues.apache.org/jira/browse/LUCENE-6825
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: Trunk
>
> Attachments: LUCENE-6825.patch, LUCENE-6825.patch
>
>
> I think we should graduate the low-level block KD-tree data structure
> from sandbox into Lucene's core?
> This can be used for very fast 1D range filtering for numerics,
> removing the 8-byte (long/double) limit we have today, so e.g. we
> could efficiently support BigInteger, BigDecimal, IPv6 addresses, etc.
> It can also be used for > 1D use cases, like 2D (lat/lon) and 3D
> (x/y/z with geo3d) geo shape intersection searches.
> The idea here is to add a new part of the Codec API (DimensionalFormat
> maybe?) that can do low-level N-dim point indexing and at runtime
> exposes only an "intersect" method.
> It should give sizable performance gains (smaller index, faster
> searching) over what we have today, and even over what auto-prefix
> with efficient numeric terms would do.
> There are many steps here ... and I think adding this is analogous to
> how we added FSTs, where we first added low-level data structure
> support and then gradually cut over the places that benefit from an
> FST.
> So for the first step, I'd like to just add the low-level block
> KD-tree impl into oal.util.bkd, but make a couple improvements over
> what we have now in sandbox:
> * Use byte[] as the value not int (@rjernst's good idea!)
> * Generalize it to arbitrary dimensions vs. specialized/forked 1D,
> 2D, 3D cases we have now
> This is already hard enough :) After that we can build the
> DimensionalFormat on top, then cut over existing specialized block
> KD-trees. We also need to fix OfflineSorter to use the Directory API so
> we don't fill up /tmp when building a block KD-tree.
> A block KD-tree is at heart an inverted data structure, like postings,
> but is also similar to auto-prefix in that it "picks" proper
> N-dimensional "terms" (leaf blocks) to index based on how the specific
> data being indexed is distributed. I think this is a big part of why
> it's so fast, i.e. in contrast to today where we statically slice up
> the space into the same terms regardless of the data (trie shifting,
> morton codes, geohash, hilbert curves, etc.)
> I'm marking this as trunk only for now... as we iterate we can see if
> it could maybe go back to 5.x...
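The first bullet of the description quoted above (byte[] values instead of int) is what lifts the 8-byte limit: any fixed-width value can be encoded so that unsigned lexicographic byte order equals its natural order, and the tree then only ever compares bytes. A minimal sketch under that assumption, with hypothetical helper names (not a Lucene API): big-endian encoding with the sign bit flipped, for a signed {{long}} and for a {{BigInteger}} padded to a fixed width. An IPv6 address from {{InetAddress.getAddress()}} is already 16 big-endian bytes, so it needs no transformation.

```java
import java.math.BigInteger;

/** Hypothetical helpers: encode values so that unsigned byte order matches numeric order. */
final class SortableBytes {
  private SortableBytes() {}

  /** Unsigned lexicographic comparison, the only ordering a byte[]-valued tree needs. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int len = Math.min(a.length, b.length);
    for (int i = 0; i < len; i++) {
      int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  /** Encodes a signed long as 8 big-endian bytes with the sign bit flipped. */
  static byte[] encodeLong(long v) {
    long flipped = v ^ Long.MIN_VALUE; // maps MIN_VALUE..MAX_VALUE onto 0..2^64-1 (unsigned)
    byte[] out = new byte[8];
    for (int i = 7; i >= 0; i--) {
      out[i] = (byte) flipped; // lowest remaining byte goes last (big-endian)
      flipped >>>= 8;
    }
    return out;
  }

  /** Encodes a BigInteger into exactly numBytes bytes: sign-extended two's complement, sign bit flipped. */
  static byte[] encodeBigInteger(BigInteger v, int numBytes) {
    byte[] raw = v.toByteArray(); // minimal big-endian two's complement
    if (raw.length > numBytes) {
      throw new IllegalArgumentException("value does not fit in " + numBytes + " bytes");
    }
    byte[] out = new byte[numBytes];
    byte pad = (byte) (v.signum() < 0 ? 0xFF : 0x00); // sign extension
    int offset = numBytes - raw.length;
    for (int i = 0; i < offset; i++) {
      out[i] = pad;
    }
    System.arraycopy(raw, 0, out, offset, raw.length);
    out[0] ^= (byte) 0x80; // flip the sign bit so negatives sort before positives
    return out;
  }
}
```

Flipping the sign bit is the standard trick for making two's complement values sort correctly as unsigned bytes; a BigDecimal would additionally need its scale normalized first, which this sketch does not attempt.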
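The description quoted above also says the new codec part would, at runtime, expose only an "intersect" method. One way such a contract could look, sketched with hypothetical names ({{CellVisitor}}, {{CellRelation}}, {{Leaf}}, {{BoxQuery}}) and assuming each point is a packed byte[] of numDims * bytesPerDim unsigned bytes: the reader hands each leaf block's bounding box to the caller's visitor, skips blocks entirely outside the query, reports every document of blocks entirely inside it without looking at values, and checks individual values only for blocks that cross the query boundary. This illustrates the idea; it is not necessarily the API Lucene adopted.

```java
import java.util.ArrayList;
import java.util.List;

/** How a cell's bounding box relates to the query (hypothetical). */
enum CellRelation { OUTSIDE, INSIDE, CROSSES }

/** Hypothetical callback the reader drives; callers never see the index any other way. */
interface CellVisitor {
  CellRelation compare(byte[] minPacked, byte[] maxPacked);
  void visit(int docID);                     // cell is fully inside: no per-value check needed
  void visit(int docID, byte[] packedValue); // cell crosses the query: check this value
}

final class Cells {
  /** Unsigned comparison of dimension dim of two packed values. */
  static int compareDim(byte[] a, byte[] b, int dim, int bytesPerDim) {
    for (int i = dim * bytesPerDim; i < (dim + 1) * bytesPerDim; i++) {
      int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (cmp != 0) return cmp;
    }
    return 0;
  }
}

/** One leaf block: its documents, their packed values, and the per-dimension bounding box. */
final class Leaf {
  final int[] docIDs;
  final byte[][] values;
  final byte[] min;
  final byte[] max;

  Leaf(int[] docIDs, byte[][] values, int numDims, int bytesPerDim) {
    this.docIDs = docIDs;
    this.values = values;
    this.min = values[0].clone();
    this.max = values[0].clone();
    for (byte[] v : values) {
      for (int d = 0; d < numDims; d++) {
        int off = d * bytesPerDim;
        if (Cells.compareDim(v, min, d, bytesPerDim) < 0) System.arraycopy(v, off, min, off, bytesPerDim);
        if (Cells.compareDim(v, max, d, bytesPerDim) > 0) System.arraycopy(v, off, max, off, bytesPerDim);
      }
    }
  }

  /** The only read path: classify this block, then report its docs accordingly. */
  void intersect(CellVisitor visitor) {
    switch (visitor.compare(min, max)) {
      case OUTSIDE:
        return; // skip the block without reading its values
      case INSIDE:
        for (int doc : docIDs) visitor.visit(doc);
        return;
      default: // CROSSES: only these blocks pay for per-value checks
        for (int i = 0; i < docIDs.length; i++) visitor.visit(docIDs[i], values[i]);
    }
  }
}

/** Example visitor: an inclusive N-dimensional box query that collects matching docIDs. */
final class BoxQuery implements CellVisitor {
  private final byte[] lower;
  private final byte[] upper;
  private final int numDims;
  private final int bytesPerDim;
  final List<Integer> hits = new ArrayList<>();

  BoxQuery(byte[] lower, byte[] upper, int numDims, int bytesPerDim) {
    this.lower = lower;
    this.upper = upper;
    this.numDims = numDims;
    this.bytesPerDim = bytesPerDim;
  }

  @Override
  public CellRelation compare(byte[] minPacked, byte[] maxPacked) {
    boolean inside = true;
    for (int d = 0; d < numDims; d++) {
      if (Cells.compareDim(maxPacked, lower, d, bytesPerDim) < 0
          || Cells.compareDim(minPacked, upper, d, bytesPerDim) > 0) {
        return CellRelation.OUTSIDE; // disjoint in this dimension
      }
      if (Cells.compareDim(minPacked, lower, d, bytesPerDim) < 0
          || Cells.compareDim(maxPacked, upper, d, bytesPerDim) > 0) {
        inside = false; // overlaps the query edge in this dimension
      }
    }
    return inside ? CellRelation.INSIDE : CellRelation.CROSSES;
  }

  @Override
  public void visit(int docID) {
    hits.add(docID);
  }

  @Override
  public void visit(int docID, byte[] packedValue) {
    for (int d = 0; d < numDims; d++) {
      if (Cells.compareDim(packedValue, lower, d, bytesPerDim) < 0
          || Cells.compareDim(packedValue, upper, d, bytesPerDim) > 0) {
        return;
      }
    }
    hits.add(docID);
  }
}
```

The INSIDE case is what lets a block that lies wholly within the query be accepted without any per-value comparisons, while OUTSIDE blocks are never read at all.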
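The last technical paragraph of the description quoted above says a block KD-tree "picks" its leaf blocks, its N-dimensional "terms", from how the indexed data is distributed, instead of slicing the space statically. A minimal in-memory sketch of that idea, assuming each point is a packed byte[] of numDims * bytesPerDim unsigned big-endian bytes: recursively sort on the dimension with the widest spread and split at the median until a block holds at most maxPointsInLeaf points. The class and helper names are hypothetical; a real implementation would also record the split values as an index and, as the description notes, sort offline rather than on the heap.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory sketch of data-dependent leaf-block selection. */
final class KdLeafBuilder {
  private final int numDims;
  private final int bytesPerDim;
  private final int maxPointsInLeaf;

  KdLeafBuilder(int numDims, int bytesPerDim, int maxPointsInLeaf) {
    this.numDims = numDims;
    this.bytesPerDim = bytesPerDim;
    this.maxPointsInLeaf = maxPointsInLeaf;
  }

  /** Illustration helper: packs non-negative ints as 4 big-endian bytes per dimension. */
  static byte[] packInts(int... values) {
    byte[] out = new byte[4 * values.length];
    for (int d = 0; d < values.length; d++) {
      for (int i = 0; i < 4; i++) {
        out[4 * d + i] = (byte) (values[d] >>> (24 - 8 * i));
      }
    }
    return out;
  }

  /** Splits the points into leaf blocks of at most maxPointsInLeaf points, in tree order. */
  List<List<byte[]>> build(List<byte[]> points) {
    List<List<byte[]>> leaves = new ArrayList<>();
    split(new ArrayList<>(points), leaves);
    return leaves;
  }

  private void split(List<byte[]> points, List<List<byte[]>> leaves) {
    if (points.size() <= maxPointsInLeaf) {
      leaves.add(points); // this block becomes one indexed "term"
      return;
    }
    final int dim = widestDim(points); // the data, not a fixed grid, decides where to cut
    points.sort((a, b) -> compareDim(a, b, dim));
    int mid = points.size() / 2;
    split(new ArrayList<>(points.subList(0, mid)), leaves);
    split(new ArrayList<>(points.subList(mid, points.size())), leaves);
  }

  /** Returns the dimension whose values span the widest range (as unsigned numbers). */
  private int widestDim(List<byte[]> points) {
    int best = 0;
    BigInteger bestSpread = BigInteger.valueOf(-1);
    for (int dim = 0; dim < numDims; dim++) {
      byte[] min = points.get(0), max = points.get(0);
      for (byte[] p : points) {
        if (compareDim(p, min, dim) < 0) min = p;
        if (compareDim(p, max, dim) > 0) max = p;
      }
      BigInteger spread = slice(max, dim).subtract(slice(min, dim));
      if (spread.compareTo(bestSpread) > 0) {
        best = dim;
        bestSpread = spread;
      }
    }
    return best;
  }

  /** One dimension's bytes read as an unsigned big-endian number. */
  private BigInteger slice(byte[] p, int dim) {
    return new BigInteger(1, Arrays.copyOfRange(p, dim * bytesPerDim, (dim + 1) * bytesPerDim));
  }

  /** Unsigned comparison of one dimension of two packed points. */
  private int compareDim(byte[] a, byte[] b, int dim) {
    int start = dim * bytesPerDim;
    for (int i = start; i < start + bytesPerDim; i++) {
      int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (cmp != 0) return cmp;
    }
    return 0;
  }
}
```

Because every cut is at the median of the actual values, dense regions end up in many small cells and sparse regions in a few large ones; a trie, geohash, or Morton-code scheme cuts at the same places regardless of the data.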
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)