richardstartin commented on a change in pull request #6320:
URL: https://github.com/apache/incubator-pinot/pull/6320#discussion_r537132715
##########
File path:
pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/inv/OffHeapBitmapInvertedIndexCreator.java
##########
@@ -181,33 +188,51 @@ public void seal()
}
// Create bitmaps from inverted index buffers and serialize them to file
- try (DataOutputStream offsetDataStream = new DataOutputStream(
- new BufferedOutputStream(new FileOutputStream(_invertedIndexFile)));
- FileOutputStream bitmapFileStream = new FileOutputStream(_invertedIndexFile);
- DataOutputStream bitmapDataStream = new DataOutputStream(new BufferedOutputStream(bitmapFileStream))) {
- int bitmapOffset = (_cardinality + 1) * Integer.BYTES;
- offsetDataStream.writeInt(bitmapOffset);
- bitmapFileStream.getChannel().position(bitmapOffset);
-
+ ByteBuffer offsetBuffer = null;
+ ByteBuffer bitmapBuffer = null;
+ try (FileChannel channel = new RandomAccessFile(_invertedIndexFile, "rw").getChannel()) {
+ // map the offsets buffer
+ final int startOfBitmaps = (_cardinality + 1) * Integer.BYTES;
+ int bitmapOffset = startOfBitmaps;
+ offsetBuffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, bitmapOffset).order(LITTLE_ENDIAN);
+ offsetBuffer.putInt(reverseBytes(bitmapOffset));
+ RoaringBitmap[] bitmaps = new RoaringBitmap[_cardinality];
+ RoaringBitmapWriter<RoaringBitmap> writer = RoaringBitmapWriter.writer()
+ .initialCapacity(((_nextDocId - 1) >>> 16) / _cardinality).get();
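The new code writes `reverseBytes(bitmapOffset)` into a buffer ordered `LITTLE_ENDIAN`, which puts the bytes on disk in big-endian order, the same as the `DataOutputStream.writeInt` call it replaces. A minimal sketch checking that equivalence while laying out the `(_cardinality + 1)`-int offset header. It uses a heap `ByteBuffer` instead of `channel.map(...)` (a `MappedByteBuffer` is a `ByteBuffer`, so `order`/`putInt` behave the same), and the helper names and the assumption that each later offset is the previous one plus that bitmap's serialized size are illustrative, not taken from the PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class OffsetHeaderSketch {

  // Hypothetical helper: builds the (cardinality + 1)-int offset header, assuming the
  // bitmaps are stored back to back after it, so offset i + 1 = offset i + size of bitmap i.
  static ByteBuffer writeOffsetHeader(int[] serializedSizes) {
    int startOfBitmaps = (serializedSizes.length + 1) * Integer.BYTES;
    ByteBuffer header = ByteBuffer.allocate(startOfBitmaps).order(ByteOrder.LITTLE_ENDIAN);
    int offset = startOfBitmaps;
    // Reversing the bytes before a little-endian put leaves them big-endian in the buffer.
    header.putInt(Integer.reverseBytes(offset));
    for (int size : serializedSizes) {
      offset += size;
      header.putInt(Integer.reverseBytes(offset));
    }
    header.flip(); // position 0, limit = header length, ready to read back
    return header;
  }

  // The same header written the way the removed code did: DataOutputStream is always big-endian.
  static byte[] writeWithDataOutputStream(int[] serializedSizes) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      int offset = (serializedSizes.length + 1) * Integer.BYTES;
      out.writeInt(offset);
      for (int size : serializedSizes) {
        offset += size;
        out.writeInt(offset);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    int[] sizes = {10, 250, 7};
    ByteBuffer header = writeOffsetHeader(sizes);
    byte[] viaBuffer = new byte[header.remaining()];
    header.get(viaBuffer);
    System.out.println(Arrays.equals(viaBuffer, writeWithDataOutputStream(sizes))); // true
  }
}
```

Writing big-endian this way keeps files produced by the new code readable by anything that decoded the old `DataOutputStream` layout.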
Review comment:
I felt this heuristic was distracting, so I removed it.
In case this comes up in the future: expecting the range to be
`[0, _nextDocId)` would allocate far too much memory for datasets sorted by the
indexed dimension, but would be the right choice if the association between row
id and dictionary id were uniformly random. Some per-dictionary-id metadata
(number of documents, min, max, density?) could probably be kept in memory so
that the writer's sizing methods (such as `initialCapacity`) could be used more
effectively, and not in a way that overfits to a benchmark.
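The trade-off can be made concrete by counting Roaring containers: each container covers one high-16-bit key, i.e. 65,536 consecutive doc IDs, so `docId >>> 16` is the container key. A minimal sketch with hypothetical method names comparing the removed estimate, the full-range estimate, and the kind of bound that per-dictionary-id metadata (document count, min and max doc ID) would allow. It assumes doc IDs are dense in `[0, nextDocId)` and is not code from the PR.

```java
public class ContainerEstimates {

  // The removed heuristic: spread the high-16-bit keys evenly across dictionary ids,
  // which suits data sorted by the indexed column.
  static int sortedEstimate(int nextDocId, int cardinality) {
    return ((nextDocId - 1) >>> 16) / cardinality;
  }

  // Full-range assumption: every bitmap may touch every key in [0, nextDocId),
  // which suits a uniformly random row-to-dictionary-id association.
  static int fullRangeEstimate(int nextDocId) {
    return ((nextDocId - 1) >>> 16) + 1;
  }

  // Hypothetical metadata-based bound: a bitmap needs at most one container per document
  // and at most one per key between its min and max doc IDs.
  static int boundFromMetadata(int numDocs, int minDocId, int maxDocId) {
    int keysSpanned = (maxDocId >>> 16) - (minDocId >>> 16) + 1;
    return Math.min(numDocs, keysSpanned);
  }

  public static void main(String[] args) {
    int nextDocId = 10_000_000; // 10M rows
    int cardinality = 100;
    System.out.println(sortedEstimate(nextDocId, cardinality)); // 1
    System.out.println(fullRangeEstimate(nextDocId)); // 153
    // A value occupying the first 100,000 rows of a sorted segment.
    System.out.println(boundFromMetadata(100_000, 0, 99_999)); // 2
  }
}
```

With 100 distinct values, assuming the full range would reserve 153 containers for each bitmap, roughly 15,300 in total, while a sorted segment needs only about 2 per value; the metadata bound tracks the sorted case without fixing the answer to it.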