[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #4791: Support STRING and BYTES for no dictionary columns in realtime consuming segments

GitBox Thu, 07 Nov 2019 10:16:57 -0800

mcvsubbu commented on a change in pull request #4791: Support STRING and BYTES 
for no dictionary columns in realtime consuming segments
URL: https://github.com/apache/incubator-pinot/pull/4791#discussion_r343790643


 ##########
 File path: pinot-core/src/main/java/org/apache/pinot/core/indexsegment/mutable/MutableSegmentImpl.java
 ##########
 @@ -212,19 +217,35 @@ public long getLatestIngestionTimestamp() {
       }
 
       DataFileReader indexReaderWriter;
-      if (fieldSpec.isSingleValueField()) {
-        String allocationContext =
-            buildAllocationContext(_segmentName, column, V1Constants.Indexes.UNSORTED_SV_FORWARD_INDEX_FILE_EXTENSION);
-        indexReaderWriter = new FixedByteSingleColumnSingleValueReaderWriter(_capacity, indexColumnSize, _memoryManager,
-            allocationContext);
+
+      if (forwardIndexColumnSize > 0) {
+        // two possible cases can lead here:
+        // (1) dictionary encoded forward index
+        // (2) raw forward index for fixed width types -- INT, LONG, FLOAT, DOUBLE
+        if (fieldSpec.isSingleValueField()) {
+          String allocationContext =
+              buildAllocationContext(_segmentName, column, V1Constants.Indexes.UNSORTED_SV_FORWARD_INDEX_FILE_EXTENSION);
+          indexReaderWriter = new FixedByteSingleColumnSingleValueReaderWriter(_capacity, forwardIndexColumnSize, _memoryManager,
+              allocationContext);
+        } else {
+          // TODO: Start with a smaller capacity on FixedByteSingleColumnMultiValueReaderWriter and let it expand
+          String allocationContext =
+              buildAllocationContext(_segmentName, column, V1Constants.Indexes.UNSORTED_MV_FORWARD_INDEX_FILE_EXTENSION);
+          indexReaderWriter =
+              new FixedByteSingleColumnMultiValueReaderWriter(MAX_MULTI_VALUES_PER_ROW, avgNumMultiValues, _capacity,
+                  forwardIndexColumnSize, _memoryManager, allocationContext);
+        }
       } else {
-        // TODO: Start with a smaller capacity on FixedByteSingleColumnMultiValueReaderWriter and let it expand
-        String allocationContext =
-            buildAllocationContext(_segmentName, column, V1Constants.Indexes.UNSORTED_MV_FORWARD_INDEX_FILE_EXTENSION);
-        indexReaderWriter =
-            new FixedByteSingleColumnMultiValueReaderWriter(MAX_MULTI_VALUES_PER_ROW, avgNumMultiValues, _capacity,
-                indexColumnSize, _memoryManager, allocationContext);
+        // for STRING/BYTES SV column, we support raw index in consuming segments
+        // RealtimeSegmentStatsHistory does not have the stats for no-dictionary columns
+        // from previous consuming segments
+        // TODO: come up with better estimated values
 
 Review comment:
   Cardinality should not be a factor here: this is a raw index, so the actual
values are stored. All you need is an estimate of the average string length,
which we can get from StatsHistory (as long as we update it correctly, of
course). The call that constructs VarByteSingleColumnSVRW should take _capacity
as the number of strings to add, along with the average length obtained from
stats history.
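
To illustrate the sizing argument above, here is a rough sketch of how the
initial allocation for a raw variable-length column could be derived from the
row capacity and an average value length alone, with no cardinality input. All
names here (RawIndexSizing, estimateBytes, averageLength, the fallback value)
are hypothetical and are not Pinot APIs; how the average is actually recorded
in and read from stats history is an assumption left out of the sketch.

```java
// Hypothetical helper, not part of Pinot: sizes a raw (no-dictionary)
// variable-length forward index from row capacity and average value length.
public final class RawIndexSizing {
  private RawIndexSizing() {
  }

  /**
   * Bytes to reserve up front: one value per row, each of roughly
   * avgLengthBytes. Cardinality is deliberately not an input, because a raw
   * index stores every value as-is.
   */
  static long estimateBytes(int capacity, int avgLengthBytes) {
    if (capacity < 0 || avgLengthBytes < 0) {
      throw new IllegalArgumentException("capacity and avgLengthBytes must be non-negative");
    }
    // Widen before multiplying to avoid int overflow on large segments.
    return (long) capacity * avgLengthBytes;
  }

  /**
   * Average value length seen in a completed segment, rounded up.
   * Returns fallbackLength when no rows were recorded (assumed default).
   */
  static int averageLength(long totalBytes, int numRows, int fallbackLength) {
    if (numRows <= 0) {
      return fallbackLength;
    }
    return (int) ((totalBytes + numRows - 1) / numRows);
  }

  public static void main(String[] args) {
    // Example: the previous segment stored 1001 bytes across 10 rows.
    int avgLen = averageLength(1001L, 10, 100);
    long bytes = estimateBytes(100_000, avgLen);
    System.out.println("avgLen=" + avgLen + ", initial bytes=" + bytes);
  }
}
```

Rounding the average up slightly over-allocates rather than forcing an early
buffer expansion; whether that trade-off suits the memory manager in use is a
judgment call.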

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

