Re: [PR] perf: improve decimal read performance in CometVector [datafusion-comet]

via GitHub Wed, 07 Aug 2024 17:41:37 -0700


parthchandra commented on code in PR #756:
URL: https://github.com/apache/datafusion-comet/pull/756#discussion_r1708221958



##########
common/src/main/java/org/apache/comet/vector/CometVector.java:
##########
@@ -88,7 +90,11 @@ public Decimal getDecimal(int i, int precision, int scale) {
     if (!useDecimal128 && precision <= Decimal.MAX_INT_DIGITS() && type 
instanceof IntegerType) {
       return createDecimal(getInt(i), precision, scale);
     } else if (precision <= Decimal.MAX_LONG_DIGITS()) {
-      return createDecimal(useDecimal128 ? getLongDecimal(i) : getLong(i), 
precision, scale);
+      if (useDecimal128) {
+        return createDecimal(getLongFromDecimalBytes(getBinaryDecimal(i)), 
precision, scale);

Review Comment:
   `CometDictionaryVector` overrides `getBinaryDecimal` so we get the correct 
value in the buffer.



##########
common/src/main/java/org/apache/comet/vector/CometVector.java:
##########
@@ -115,24 +121,35 @@ private Decimal createDecimal(BigDecimal value, int 
precision, int scale) {
     return dec;
   }
 
+  // bytes.length must be 16
+  public long getLongFromDecimalBytes(byte[] bytes) {
+    assert (bytes.length == 16);
+    // get Long value from the last eight bytes
+    // Use ByteBuffer's fast conversion to long
+    long val = ByteBuffer.wrap(bytes).getLong(8);
+    return val;
+  }
+
   /**
    * Reads a 16-byte byte array which are encoded big-endian for decimal128 
into internal byte
    * array.
    */
   byte[] getBinaryDecimal(int i) {
+    // TODO: consider implementing a zero-copy version
     return copyBinaryDecimal(i, DECIMAL_BYTES);
   }
 
   /** Reads a 16-byte byte array which are encoded big-endian for decimal128. 
*/
   public byte[] copyBinaryDecimal(int i, byte[] dest) {
-    long valueBufferAddress = getValueVector().getDataBuffer().memoryAddress();
-    Platform.copyMemory(
-        null,
-        valueBufferAddress + (long) i * DECIMAL_BYTE_WIDTH,
-        dest,
-        Platform.BYTE_ARRAY_OFFSET,
-        DECIMAL_BYTE_WIDTH);
+    ValueVector vector = getValueVector();
+    // If the index is zero and DECIMAL_BYTES_ALL already has data, we have a 
new
+    // batch of data in the vector's backing buffer. So read it again.

Review Comment:
   This check is now unnecessary so I've removed it. For reference, 
`ColumnReader.loadVector` specified the rules under which the vector may be 
reused for a new batch of data. When a new batch is read into the same vector, 
the underlying memory buffer is reused and there is no change in the buffer 
memory address of the value vector. The check for `index==0` banked on the fact 
that the iterator reading the values will start from the beginning every time a 
new batch is read. There was a gotcha though. If the first N values were null 
the iterator would skip this method and call it for the first time with an 
index of N which could be non-zero.
   The only safe way is to reset the buffer when a new batch is read (which is 
what this PR does now).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Re: [PR] perf: improve decimal read performance in CometVector [datafusion-comet]

Reply via email to