[GitHub] [pinot] somandal commented on a diff in pull request #8953: Add multi-value noDict APIs and their implementation for querying immutable and mutable fixed-width segment types

GitBox Thu, 23 Jun 2022 17:00:54 -0700


somandal commented on code in PR #8953:
URL: https://github.com/apache/pinot/pull/8953#discussion_r905600098



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/reader/ForwardIndexReader.java:
##########
@@ -410,9 +421,303 @@ default byte[] getBytes(int docId, T context) {
 
   /**
    * MULTI-VALUE COLUMN RAW INDEX APIs
-   * TODO: Not supported yet
    */
 
+  /**
+   * Fills the values
+   * @param docIds Array containing the document ids to read
+   * @param length Number of values to read
+   * @param values Values to fill
+   * @param context Reader context
+   */
+  default void readValuesMV(int[] docIds, int length, int[][] values, T 
context) {
+    switch (getValueType()) {
+      case INT:
+        for (int i = 0; i < length; i++) {
+          values[i] = getIntMV(docIds[i], context);
+        }
+        break;
+      case LONG:
+        for (int i = 0; i < length; i++) {
+          long[] longValueBuffer = getLongMV(docIds[i], context);

Review Comment:
   I am not sure I fully understand what you mean by reuse the buffer here. Can 
you elaborate?
   
   This is my understanding. In this function, we need to create a buffer based 
on the `valueType` available. i.e. if the valueType is double, we need to 
create a double[] buffer and if it is string, we need a String[] buffer. We 
cannot know beforehand which buffer type to create as it depends on 
`getValueType()`.
   
   Even if I pass `maxNumValuesPerMVEntry`, I'll have to allocate a typed 
buffer here. How is this different from me creating the buffer as part of 
`getLongMV()` or one of the other APIs? I will just create it here, pass it to 
the `getLongMV()` function, typecast the values in the filled in buffer to the 
desired values[][] type and store it in values[][].
   
   Or are you suggesting that we create a buffer depending on the `valueType` 
of the caller (e.g. `DataFetcher` knows the `valueType`) within the caller and 
pass that pre-allocated buffer of `maxNumValuesPerMVEntry` size down to the 
`ForwardIndexReader`? I have 2 questions regarding this, a) will the 
`valueType` in the callers like `DataFetcher` match the `valueType` of the 
ForwardIndexReader for all purposes? and b) the number of APIs we need to add 
to the `ForwardIndexReader` will explode as we'll need one for each buffer type 
+ values[][] type combination. Is this acceptable?
   
   I may be misunderstanding your comment, so do let me know what you mean. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [pinot] somandal commented on a diff in pull request #8953: Add multi-value noDict APIs and their implementation for querying immutable and mutable fixed-width segment types

Reply via email to