lidavidm commented on code in PR #777:
URL: https://github.com/apache/arrow-java/pull/777#discussion_r2119855812


##########
vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthVector.java:
##########
@@ -856,70 +855,17 @@ private void splitAndTransferOffsetBuffer(
     target.valueBuffer = transferBuffer(slicedBuffer, target.allocator);
   }
 
-  /*
-   * Transfer the validity.
-   */
-  private void splitAndTransferValidityBuffer(
-      int startIndex, int length, BaseVariableWidthVector target) {
-    if (length <= 0) {
-      return;
-    }
-
+  @Override
+  protected void sliceAndTransferValidityBuffer(
+      int startIndex, int length, BaseValueVector target) {
     final int firstByteSource = BitVectorHelper.byteIndex(startIndex);
-    final int lastByteSource = BitVectorHelper.byteIndex(valueCount - 1);
     final int byteSizeTarget = getValidityBufferSizeFromCount(length);
-    final int offset = startIndex % 8;
 
-    if (offset == 0) {
-      // slice
-      if (target.validityBuffer != null) {
-        target.validityBuffer.getReferenceManager().release();
-      }
-      final ArrowBuf slicedValidityBuffer = 
validityBuffer.slice(firstByteSource, byteSizeTarget);
-      target.validityBuffer = transferBuffer(slicedValidityBuffer, 
target.allocator);
-      return;
-    }
-
-    /* Copy data
-     * When the first bit starts from the middle of a byte (offset != 0),
-     * copy data from src BitVector.
-     * Each byte in the target is composed by a part in i-th byte,
-     * another part in (i+1)-th byte.
-     */
-    target.allocateValidityBuffer(byteSizeTarget);
-
-    for (int i = 0; i < byteSizeTarget - 1; i++) {
-      byte b1 =
-          BitVectorHelper.getBitsFromCurrentByte(this.validityBuffer, 
firstByteSource + i, offset);
-      byte b2 =
-          BitVectorHelper.getBitsFromNextByte(this.validityBuffer, 
firstByteSource + i + 1, offset);
-
-      target.validityBuffer.setByte(i, (b1 + b2));
-    }
-    /* Copying the last piece is done in the following manner:
-     * if the source vector has 1 or more bytes remaining, we copy
-     * the last piece as a byte formed by shifting data
-     * from the current byte and the next byte.
-     *
-     * if the source vector has no more bytes remaining
-     * (we are at the last byte), we copy the last piece as a byte
-     * by shifting data from the current byte.
-     */
-    if ((firstByteSource + byteSizeTarget - 1) < lastByteSource) {
-      byte b1 =
-          BitVectorHelper.getBitsFromCurrentByte(
-              this.validityBuffer, firstByteSource + byteSizeTarget - 1, 
offset);
-      byte b2 =
-          BitVectorHelper.getBitsFromNextByte(
-              this.validityBuffer, firstByteSource + byteSizeTarget, offset);
-
-      target.validityBuffer.setByte(byteSizeTarget - 1, b1 + b2);
-    } else {
-      byte b1 =
-          BitVectorHelper.getBitsFromCurrentByte(
-              this.validityBuffer, firstByteSource + byteSizeTarget - 1, 
offset);
-      target.validityBuffer.setByte(byteSizeTarget - 1, b1);
+    if (target.validityBuffer != null) {
+      target.validityBuffer.getReferenceManager().release();
     }
+    final ArrowBuf slicedValidityBuffer = 
validityBuffer.slice(firstByteSource, byteSizeTarget);
+    target.validityBuffer = transferBuffer(slicedValidityBuffer, 
target.allocator);

Review Comment:
   Hmm, this only slightly differs from the 'base' implementation. However I 
think this one is more correct...I didn't dig too deep but it's possible the 
'simple' one that just retains the buffer was a later/incorrect 
reimplementation or because they were all copy-pasted at some point only some 
types were adjusted to use transferBuffer



##########
vector/src/main/java/org/apache/arrow/vector/BaseValueVector.java:
##########
@@ -248,4 +252,114 @@ public void copyFrom(int fromIndex, int thisIndex, 
ValueVector from) {
   public void copyFromSafe(int fromIndex, int thisIndex, ValueVector from) {
     throw new UnsupportedOperationException();
   }
+
+  /**
+   * Transfer the validity buffer from `validityBuffer` to the target vector's 
`validityBuffer`.
+   * Start at `startIndex` and copy `length` number of elements. If the 
starting index is 8 byte
+   * aligned, then the buffer is sliced from that index and ownership is 
transferred. If not,
+   * individual bytes are copied.
+   *
+   * @param startIndex starting index
+   * @param length number of elements to be copied
+   * @param target target vector
+   */
+  protected void splitAndTransferValidityBuffer(
+      int startIndex, int length, BaseValueVector target) {
+    int offset = startIndex % 8;
+
+    if (length <= 0) {
+      return;
+    }
+    if (offset == 0) {
+      sliceAndTransferValidityBuffer(startIndex, length, target);
+    } else {
+      copyValidityBuffer(startIndex, length, target);
+    }
+  }
+
+  /**
+   * If the start index is 8 byte aligned, slice `validityBuffer` and transfer 
ownership to
+   * `target`'s `validityBuffer`.
+   *
+   * @param startIndex starting index
+   * @param length number of elements to be copied
+   * @param target target vector
+   */
+  protected void sliceAndTransferValidityBuffer(
+      int startIndex, int length, BaseValueVector target) {
+    final int firstByteSource = BitVectorHelper.byteIndex(startIndex);
+    final int byteSizeTarget = getValidityBufferSizeFromCount(length);
+
+    if (target.validityBuffer != null) {
+      target.validityBuffer.getReferenceManager().release();
+    }
+    target.validityBuffer = validityBuffer.slice(firstByteSource, 
byteSizeTarget);
+    target.validityBuffer.getReferenceManager().retain(1);
+  }
+
+  /**
+   * Allocate new validity buffer for `target` and copy bytes from 
`validityBuffer`. Precise details
+   * in the comments below.
+   *
+   * @param startIndex starting index
+   * @param length number of elements to be copied
+   * @param target target vector
+   */
+  protected void copyValidityBuffer(int startIndex, int length, 
BaseValueVector target) {
+    final int firstByteSource = BitVectorHelper.byteIndex(startIndex);
+    final int lastByteSource = BitVectorHelper.byteIndex(valueCount - 1);
+    final int byteSizeTarget = getValidityBufferSizeFromCount(length);
+    final int offset = startIndex % 8;
+
+    /* Copy data
+     * When the first bit starts from the middle of a byte (offset != 0),
+     * copy data from src BitVector.
+     * Each byte in the target is composed by a part in i-th byte,
+     * another part in (i+1)-th byte.
+     */
+    target.allocateValidityBuffer(byteSizeTarget);
+
+    for (int i = 0; i < byteSizeTarget - 1; i++) {
+      byte b1 =
+          BitVectorHelper.getBitsFromCurrentByte(this.validityBuffer, 
firstByteSource + i, offset);
+      byte b2 =
+          BitVectorHelper.getBitsFromNextByte(this.validityBuffer, 
firstByteSource + i + 1, offset);
+
+      target.validityBuffer.setByte(i, (b1 + b2));
+    }
+
+    /* Copying the last piece is done in the following manner:
+     * if the source vector has 1 or more bytes remaining, we copy
+     * the last piece as a byte formed by shifting data
+     * from the current byte and the next byte.
+     *
+     * if the source vector has no more bytes remaining
+     * (we are at the last byte), we copy the last piece as a byte
+     * by shifting data from the current byte.
+     */
+    if ((firstByteSource + byteSizeTarget - 1) < lastByteSource) {
+      byte b1 =
+          BitVectorHelper.getBitsFromCurrentByte(
+              this.validityBuffer, firstByteSource + byteSizeTarget - 1, 
offset);
+      byte b2 =
+          BitVectorHelper.getBitsFromNextByte(
+              this.validityBuffer, firstByteSource + byteSizeTarget, offset);
+
+      target.validityBuffer.setByte(byteSizeTarget - 1, b1 + b2);
+    } else {
+      byte b1 =
+          BitVectorHelper.getBitsFromCurrentByte(
+              this.validityBuffer, firstByteSource + byteSizeTarget - 1, 
offset);
+      target.validityBuffer.setByte(byteSizeTarget - 1, b1);
+    }
+  }
+
+  /**
+   * Allocate new validity buffer for when the bytes need to be copied over
+   *
+   * @param byteSizeTarget desired size of the buffer
+   */
+  protected void allocateValidityBuffer(final long byteSizeTarget) {

Review Comment:
   This is an abstract class - mark this abstract?



##########
vector/src/main/java/org/apache/arrow/vector/BaseValueVector.java:
##########
@@ -248,4 +252,114 @@ public void copyFrom(int fromIndex, int thisIndex, 
ValueVector from) {
   public void copyFromSafe(int fromIndex, int thisIndex, ValueVector from) {
     throw new UnsupportedOperationException();
   }
+
+  /**
+   * Transfer the validity buffer from `validityBuffer` to the target vector's 
`validityBuffer`.
+   * Start at `startIndex` and copy `length` number of elements. If the 
starting index is 8 byte
+   * aligned, then the buffer is sliced from that index and ownership is 
transferred. If not,
+   * individual bytes are copied.
+   *
+   * @param startIndex starting index
+   * @param length number of elements to be copied
+   * @param target target vector
+   */
+  protected void splitAndTransferValidityBuffer(
+      int startIndex, int length, BaseValueVector target) {

Review Comment:
   Checked that UnionVector doesn't extend it either (which is good - as it 
doesn't have a validity buffer)



##########
vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java:
##########
@@ -342,7 +340,8 @@ private void allocateBytes(int valueCount) {
    * slice the source buffer so we have to explicitly allocate the 
validityBuffer of the target
    * vector. This is unlike the databuffer which we can always slice for the 
target vector.
    */
-  private void allocateValidityBuffer(final int validityBufferSize) {
+  @Override
+  protected void allocateValidityBuffer(final long validityBufferSize) {

Review Comment:
   Given many of these are very similar maybe they should delegate to a base 
implementation with `super.allocateValidityBuffer` then call any 
vector-specific logic at the end?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to