lidavidm commented on code in PR #41967:
URL: https://github.com/apache/arrow/pull/41967#discussion_r1632441720


##########
java/c/src/test/python/integration_tests.py:
##########


Review Comment:
   CC @jorisvandenbossche perhaps?



##########
java/c/src/main/java/org/apache/arrow/c/BufferImportTypeVisitor.java:
##########
@@ -210,9 +210,38 @@ public List<ArrowBuf> visit(ArrowType.Utf8 type) {
     }
   }
 
+  private List<ArrowBuf> visitVariableWidthView(ArrowType type) {
+    final int viewBufferIndex = 1;
+    try (ArrowBuf view = importFixedBytes(type, viewBufferIndex, 
BaseVariableWidthViewVector.ELEMENT_SIZE)) {

Review Comment:
   Why is view being treated specially here? Shouldn't it be the size buffer?



##########
java/c/src/main/java/org/apache/arrow/c/ArrayExporter.java:
##########
@@ -96,8 +97,34 @@ void export(ArrowArray array, FieldVector vector, 
DictionaryProvider dictionaryP
       }
 
       if (buffers != null) {
+        final long bufferSize = buffers.size();
+        /*
+        * For Variadic types, an additional buffer is kept to store
+        * the size of each variadic buffer since that information
+        * cannot be retrieved in the import component.
+        * Here, the dataBufferReqCount is calculated to determine
+        * the additional number of buffers required.
+        * Also note that if the bufferSize is greater than 2, it means
+        * there is one or more data buffers.
+        * Thus, the dataBufferReqCount is set to 1 to get additional buffer
+        * for to store variadic size buffer.
+        * If it is not the case, the dataBuffer is not present.
+        * According to the spec and C Data interface in C++, there must be
+        * at least 3 data buffers present at the import component.
+        * Thus, the dataBufferReqCount is set to 2 to get additional buffer
+        * for empty dataBuffer and the variadic size buffer.
+        */
+        int dataBufferReqCount = 0;
+        if (vector instanceof BaseVariableWidthViewVector) {

Review Comment:
   Carving out special cases all over the place means that the implementation 
is probably wrong. In this case, it looks like getFieldBuffers is too 
specialized to IPC/the previous assumption that IPC and C Data used the same 
buffers is now wrong.
   
   I think for C Data, it would be better to have one new method that simply 
gives the buffer count for ArrayExporter to allocate the buffers pointers, and 
do any type specific logic during the export.



##########
java/c/src/test/python/integration_tests.py:
##########
@@ -183,6 +190,38 @@ def round_trip_reader(self, schema, batches):
     def test_string_array(self):
         self.round_trip_array(lambda: pa.array([None, "a", "bb", "ccc"]))
 
+    def test_stringview_array(self):

Review Comment:
   I'd like to see empty arrays, empty strings, nulls in different positions, 
and actual binary values tested



##########
java/c/src/main/java/org/apache/arrow/c/BufferImportTypeVisitor.java:
##########
@@ -210,9 +210,38 @@ public List<ArrowBuf> visit(ArrowType.Utf8 type) {
     }
   }
 
+  private List<ArrowBuf> visitVariableWidthView(ArrowType type) {
+    final int viewBufferIndex = 1;
+    try (ArrowBuf view = importFixedBytes(type, viewBufferIndex, 
BaseVariableWidthViewVector.ELEMENT_SIZE)) {
+      List<ArrowBuf> buffers = new ArrayList<>();
+      view.getReferenceManager().retain();
+      ArrowBuf maybeValidityBuffer = maybeImportBitmap(type);
+      buffers.add(maybeValidityBuffer);
+      buffers.add(view);
+
+      final int variadicSizeBufferIndex = this.buffers.length - 1;
+      final long numOfVariadicBuffers = this.buffers.length - 3;
+      final long variadicSizeBufferCapacity = numOfVariadicBuffers * 
Long.BYTES;
+      // 0th buffer is validity buffer
+      // 1st buffer is view buffer
+      // 2nd buffer onwards are variadic buffer
+      // N-1 (this.buffers.length - 1) buffer is variadic size buffer
+      final int variadicBufferReadOffset = 2;
+      try (ArrowBuf variadicSizeBufferPrime = importBuffer(type, 
variadicSizeBufferIndex,
+          variadicSizeBufferCapacity)) {
+        variadicSizeBufferPrime.getReferenceManager().retain();

Review Comment:
   And shouldn't we discard the variadic size buffer at the end, since we don't 
need it anymore?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to