parthchandra commented on code in PR #756: URL: https://github.com/apache/datafusion-comet/pull/756#discussion_r1708221958
########## common/src/main/java/org/apache/comet/vector/CometVector.java: ########## @@ -88,7 +90,11 @@ public Decimal getDecimal(int i, int precision, int scale) { if (!useDecimal128 && precision <= Decimal.MAX_INT_DIGITS() && type instanceof IntegerType) { return createDecimal(getInt(i), precision, scale); } else if (precision <= Decimal.MAX_LONG_DIGITS()) { - return createDecimal(useDecimal128 ? getLongDecimal(i) : getLong(i), precision, scale); + if (useDecimal128) { + return createDecimal(getLongFromDecimalBytes(getBinaryDecimal(i)), precision, scale); Review Comment: `CometDictionaryVector` overrides `getBinaryDecimal` so we get the correct value in the buffer. ########## common/src/main/java/org/apache/comet/vector/CometVector.java: ########## @@ -115,24 +121,35 @@ private Decimal createDecimal(BigDecimal value, int precision, int scale) { return dec; } + // bytes.length must be 16 + public long getLongFromDecimalBytes(byte[] bytes) { + assert (bytes.length == 16); + // get Long value from the last eight bytes + // Use ByteBuffer's fast conversion to long + long val = ByteBuffer.wrap(bytes).getLong(8); + return val; + } + /** * Reads a 16-byte byte array which are encoded big-endian for decimal128 into internal byte * array. */ byte[] getBinaryDecimal(int i) { + // TODO: consider implementing a zero-copy version return copyBinaryDecimal(i, DECIMAL_BYTES); } /** Reads a 16-byte byte array which are encoded big-endian for decimal128. */ public byte[] copyBinaryDecimal(int i, byte[] dest) { - long valueBufferAddress = getValueVector().getDataBuffer().memoryAddress(); - Platform.copyMemory( - null, - valueBufferAddress + (long) i * DECIMAL_BYTE_WIDTH, - dest, - Platform.BYTE_ARRAY_OFFSET, - DECIMAL_BYTE_WIDTH); + ValueVector vector = getValueVector(); + // If the index is zero and DECIMAL_BYTES_ALL already has data, we have a new + // batch of data in the vector's backing buffer. So read it again. Review Comment: This check is now unnecessary so I've removed it. For reference, `ColumnReader.loadVector` specified the rules under which the vector may be reused for a new batch of data. When a new batch is read into the same vector, the underlying memory buffer is reused and there is no change in the buffer memory address of the value vector. The check for `index==0` banked on the fact that the iterator reading the values will start from the beginning every time a new batch is read. There was a gotcha though. If the first N values were null the iterator would skip this method and call it for the first time with an index of N which could be non-zero. The only safe way is to reset the buffer when a new batch is read (which is what this PR does now). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org