[
https://issues.apache.org/jira/browse/LUCENE-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man reopened LUCENE-8585:
------------------------------
Toke, Jenkins has found some reproducible failures from your new tests...
This one is from branch_8_0...
{noformat}
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestLucene70NormsFormat -Dtests.method=testFewValues -Dtests.seed=C3613DC62817C401 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=es-CL -Dtests.timezone=Asia/Anadyr -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
[junit4] ERROR 7.04s | TestLucene70NormsFormat.testFewValues <<<
[junit4] > Throwable #1: java.nio.file.FileSystemException: /home/hossman/lucene/dev/lucene/build/backward-codecs/test/J0/temp/lucene.codecs.lucene70.TestLucene70NormsFormat_C3613DC62817C401-001/index-NIOFSDirectory-001/_1k.fdx: Too many open files
[junit4] > at __randomizedtesting.SeedInfo.seed([C3613DC62817C401:E1CC12DA3AA6A4FA]:0)
[junit4] > at org.apache.lucene.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:48)
[junit4] > at org.apache.lucene.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:81)
[junit4] > at org.apache.lucene.mockfile.HandleTrackingFS.newFileChannel(HandleTrackingFS.java:197)
[junit4] > at org.apache.lucene.mockfile.FilterFileSystemProvider.newFileChannel(FilterFileSystemProvider.java:202)
[junit4] > at java.nio.channels.FileChannel.open(FileChannel.java:287)
[junit4] > at java.nio.channels.FileChannel.open(FileChannel.java:335)
[junit4] > at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
[junit4] > at org.apache.lucene.util.LuceneTestCase.slowFileExists(LuceneTestCase.java:2801)
[junit4] > at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:747)
[junit4] > at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
[junit4] > at org.apache.lucene.store.MockDirectoryWrapper.openChecksumInput(MockDirectoryWrapper.java:1069)
[junit4] > at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:128)
[junit4] > at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:121)
[junit4] > at org.apache.lucene.codecs.lucene50.Lucene50StoredFieldsFormat.fieldsReader(Lucene50StoredFieldsFormat.java:173)
[junit4] > at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:126)
[junit4] > at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:83)
[junit4] > at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:66)
[junit4] > at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
[junit4] > at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:680)
[junit4] > at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:81)
[junit4] > at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
[junit4] > at org.apache.lucene.index.BaseNormsFormatTestCase.doTestNormsVersusDocValues(BaseNormsFormatTestCase.java:494)
[junit4] > at org.apache.lucene.index.BaseNormsFormatTestCase.testFewValues(BaseNormsFormatTestCase.java:181)
[junit4] > at java.lang.Thread.run(Thread.java:748)
[junit4] 2> NOTE: leaving temporary files on disk at: /home/hossman/lucene/dev/lucene/build/backward-codecs/test/J0/temp/lucene.codecs.lucene70.TestLucene70NormsFormat_C3613DC62817C401-001
[junit4] 2> NOTE: test params are: codec=Asserting(Lucene80): {}, docValues:{}, maxPointsInLeafNode=322, maxMBSortInHeap=7.12719968491226, sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@4d8a3cb), locale=es-CL, timezone=Asia/Anadyr
[junit4] 2> NOTE: Linux 3.19.0-84-generic amd64/Oracle Corporation 1.8.0_144 (64-bit)/cpus=4,threads=1,free=292853816,total=321912832
[junit4] 2> NOTE: All tests run in this JVM: [TestLucene70NormsFormat]
[junit4] Completed [1/1 (1!)] in 7.48s, 1 test, 1 error <<< FAILURES!
{noformat}
(when running all methods of that test w/that seed, many of them fail _after_
this method, with identical exceptions, but those same methods pass in
isolation -- suggesting perhaps leaked open files? or maybe only leaked on
failure?)
Also: this test is called {{TestLucene70NormsFormat}} but it has {{new
Lucene80Codec()}} hardcoded in it ... which seems like a pretty big WTF?
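To make the "only leaked on failure" guess concrete, here is a minimal, self-contained sketch of that pattern. It does not use Lucene at all: {{HandleLeakSketch}}, {{Handle}}, and the limit of 4 are hypothetical stand-ins invented for illustration, loosely imitating what a handle-limiting mock file system does. It only shows how a handle that is closed on the success path but not on the exception path can pile up until later, unrelated opens start failing with the same message, and how try-with-resources would avoid that. Whether this is what actually happens in the new tests is unverified.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a handle-limited file system; this is NOT Lucene's HandleLimitFS. */
class HandleLeakSketch {
    static final int LIMIT = 4;                          // assumed tiny limit, for illustration only
    static final AtomicInteger open = new AtomicInteger();

    /** A fake file handle that counts against LIMIT until it is closed. */
    static final class Handle implements Closeable {
        Handle() throws IOException {
            if (open.incrementAndGet() > LIMIT) {
                open.decrementAndGet();
                throw new IOException("Too many open files");
            }
        }
        @Override public void close() { open.decrementAndGet(); }
    }

    /** Leaky: close() is only reached when the body does not throw. */
    static void leakyOnFailure(boolean fail) throws IOException {
        Handle h = new Handle();
        if (fail) throw new IOException("simulated test failure");
        h.close();
    }

    /** Safe: try-with-resources closes the handle on every exit path. */
    static void safeOnFailure(boolean fail) throws IOException {
        try (Handle h = new Handle()) {
            if (fail) throw new IOException("simulated test failure");
        }
    }

    /** Runs a failing body n times and returns how many handles are still open afterwards. */
    static int openAfter(int n, boolean leaky) {
        open.set(0);
        for (int i = 0; i < n; i++) {
            try {
                if (leaky) leakyOnFailure(true); else safeOnFailure(true);
            } catch (IOException expected) {
                // swallowed on purpose: only the remaining handle count matters here
            }
        }
        return open.get();
    }

    public static void main(String[] args) {
        System.out.println("handles left open, leaky variant: " + openAfter(10, true));
        System.out.println("handles left open, safe variant:  " + openAfter(10, false));
    }
}
```

In the leaky variant the counter stays pinned at the limit once enough failures have occurred, so every later open fails regardless of which method triggers it. That would be consistent with methods failing after {{testFewValues}} in a full run but passing in isolation.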
> Create jump-tables for DocValues at index-time
> ----------------------------------------------
>
> Key: LUCENE-8585
> URL: https://issues.apache.org/jira/browse/LUCENE-8585
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Affects Versions: 8.0
> Reporter: Toke Eskildsen
> Priority: Minor
> Labels: performance
> Fix For: 8.0
>
> Attachments: LUCENE-8585.patch, LUCENE-8585.patch,
> make_patch_lucene8585.sh
>
> Time Spent: 10.5h
> Remaining Estimate: 0h
>
> As noted in LUCENE-7589, lookup of DocValues should use jump-tables to avoid
> long iterative walks. This is implemented in LUCENE-8374 at search-time
> (first request for DocValues from a field in a segment), with the benefit of
> working without changes to existing Lucene 7 indexes and the downside of
> introducing a startup time penalty and a memory overhead.
> As discussed in LUCENE-8374, the codec should be updated to create these
> jump-tables at index time. This eliminates the segment-open time & memory
> penalties, with the potential downside of increasing index-time for DocValues.
> The three elements of LUCENE-8374 should be transferable to index-time
> without much alteration of the core structures:
> * {{IndexedDISI}} block offset and index skips: A {{long}} (64 bits) for
> every 65536 documents, containing the offset of the block in 33 bits and the
> index (number of set bits) up to the block in 31 bits.
> It can be built sequentially and should be stored as a simple sequence of
> consecutive longs for caching of lookups.
> As it is fairly small, relative to document count, it might be better to
> simply memory cache it?
> * {{IndexedDISI}} DENSE (> 4095, < 65536 set bits) blocks: A {{short}} (16
> bits) for every 8 {{longs}} (512 bits) for a total of 256 bytes/DENSE_block.
> Each {{short}} represents the number of set bits up to right before the
> corresponding sub-block of 512 docIDs.
> The {{shorts}} can be computed sequentially or when the DENSE block is
> flushed (probably the easiest). They should be stored as a simple sequence of
> consecutive shorts for caching of lookups, one logically independent sequence
> for each DENSE block. The logical position would be one sequence at the start
> of every DENSE block.
> Whether it is best to read all the 16 {{shorts}} up front when a DENSE block
> is accessed or whether it is best to only read any individual {{short}} when
> needed is not clear at this point.
> * Variable Bits Per Value: A {{long}} (64 bits) for every 16384 numeric
> values. Each {{long}} holds the offset to the corresponding block of values.
> The offsets can be computed sequentially and should be stored as a simple
> sequence of consecutive {{longs}} for caching of lookups.
> The vBPV-offsets have the largest space overhead of the 3 jump-tables and a
> lot of the 64 bits in each long are not used for most indexes. They could be
> represented as a simple {{PackedInts}} sequence or {{MonotonicLongValues}},
> with the downsides of a potential lookup-time overhead and the need for doing
> the compression after all offsets have been determined.
> I have no experience with the codec-parts responsible for creating
> index-structures. I'm quite willing to take a stab at this, although I
> probably won't do much about it before January 2019. Should anyone else wish
> to adopt this JIRA-issue or co-work on it, I'll be happy to share.
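The three jump-table layouts described in the quoted issue can be sketched in plain Java as follows. This is an illustration under assumptions, not the codec implementation: the class {{JumpTableSketch}} and all method names are hypothetical, and the bit order inside the 64-bit block entry (offset in the upper 33 bits, index in the lower 31 bits) is an assumption, since the description only gives the field widths. The DENSE part assumes a block of 65536 bits held as 1024 longs, which gives one 16-bit rank per 8 longs.

```java
/** Hypothetical sketch of the jump-table layouts from the issue description; not Lucene code. */
class JumpTableSketch {
    /** Block entry: offset in the upper 33 bits, index in the lower 31 bits (assumed bit order). */
    static long pack(long offset, int index) {
        if (offset < 0 || offset >= (1L << 33) || index < 0) {
            throw new IllegalArgumentException("offset needs 33 bits, index needs 31 bits");
        }
        return (offset << 31) | index;
    }

    static long offsetOf(long entry) { return entry >>> 31; }

    static int indexOf(long entry) { return (int) (entry & 0x7FFFFFFFL); }

    /** DENSE rank: one short per 8 longs (512 docIDs), holding the set bits before that sub-block. */
    static short[] denseRank(long[] bits) {
        short[] rank = new short[bits.length / 8];
        int setBits = 0;
        for (int i = 0; i < bits.length; i++) {
            if ((i & 7) == 0) rank[i >> 3] = (short) setBits;   // stored as unsigned 16 bits
            setBits += Long.bitCount(bits[i]);
        }
        return rank;
    }

    /** Number of set bits before docInBlock: start at the rank, then count at most 8 longs. */
    static int indexInBlock(long[] bits, short[] rank, int docInBlock) {
        int sub = docInBlock >>> 9;                 // 512 docIDs per sub-block
        int word = docInBlock >>> 6;                // 64 docIDs per long
        int count = rank[sub] & 0xFFFF;             // read the short as unsigned
        for (int i = sub << 3; i < word; i++) count += Long.bitCount(bits[i]);
        long below = (1L << (docInBlock & 63)) - 1; // mask of the bits below the doc in its long
        return count + Long.bitCount(bits[word] & below);
    }

    /** vBPV: one offset per 16384 values, so the block is found with a shift instead of a walk. */
    static long blockOffset(long[] offsets, long valueIndex) {
        return offsets[(int) (valueIndex >>> 14)];
    }

    public static void main(String[] args) {
        long entry = pack(5L, 7);
        System.out.println(offsetOf(entry) + " " + indexOf(entry));
    }
}
```

The point of all three tables is the same: a lookup becomes one array read plus a bounded amount of local work (at most 8 {{Long.bitCount}} calls for the DENSE case) instead of an iterative walk from the start of the data.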
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)