lidavidm commented on code in PR #41732:
URL: https://github.com/apache/arrow/pull/41732#discussion_r1609124069
##########
java/vector/src/main/java/org/apache/arrow/vector/VectorUnloader.java:
##########
@@ -80,19 +80,29 @@ public VectorUnloader(
public ArrowRecordBatch getRecordBatch() {
List<ArrowFieldNode> nodes = new ArrayList<>();
List<ArrowBuf> buffers = new ArrayList<>();
+ List<Long> variadicBufferCounts = new ArrayList<>();
for (FieldVector vector : root.getFieldVectors()) {
- appendNodes(vector, nodes, buffers);
+ appendNodes(vector, nodes, buffers, variadicBufferCounts);
}
// Do NOT retain buffers in ArrowRecordBatch constructor since we have already retained them.
return new ArrowRecordBatch(
- root.getRowCount(), nodes, buffers, CompressionUtil.createBodyCompression(codec), alignBuffers,
- /*retainBuffers*/ false);
+ root.getRowCount(), nodes, buffers, CompressionUtil.createBodyCompression(codec),
+ variadicBufferCounts, alignBuffers, /*retainBuffers*/ false);
}
- private void appendNodes(FieldVector vector, List<ArrowFieldNode> nodes, List<ArrowBuf> buffers) {
+ private long getVariadicBufferCount(FieldVector vector) {
+ if (vector instanceof BaseVariableWidthViewVector) {
+ return ((BaseVariableWidthViewVector) vector).getDataBuffers().size();
+ }
+ return 0L;
+ }
+
+ private void appendNodes(FieldVector vector, List<ArrowFieldNode> nodes, List<ArrowBuf> buffers,
+ List<Long> variadicBufferCounts) {
nodes.add(new ArrowFieldNode(vector.getValueCount(), includeNullCount ? vector.getNullCount() : -1));
List<ArrowBuf> fieldBuffers = vector.getFieldBuffers();
- int expectedBufferCount = TypeLayout.getTypeBufferCount(vector.getField().getType());
+ int expectedBufferCount = TypeLayout.getTypeBufferCount(vector.getField().getType(), vector);
Review Comment:
Right, so: `getTypeBufferCount` always returns the fixed buffer count (so 2
for Utf8View: validity and views). If you need the full count, you have to do
something context-dependent. When loading data into a vector, we have to get the
variadic buffer count from the RecordBatch. When unloading data from a vector, we
have this new `getVariadicBufferCount`. (BTW, I would consider making that a base
interface method with a default implementation that returns 0.)
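A minimal Java sketch of the suggestion above, written against hypothetical stand-in types (`SketchVector`, `SketchIntVector`, `SketchViewVector`, `VariadicCounts`) rather than the real Arrow classes. Here `fixedBufferCount` plays the role of `TypeLayout.getTypeBufferCount`, and `isVariadic` is an assumed flag: a view vector whose strings are all inlined can hold zero data buffers yet still needs an entry in the batch's counts. The sketch covers only a flat list of vectors (no nested children) and shows both directions: unloading asks the vector, loading consumes counts taken from the batch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for FieldVector; the real Arrow interface is much larger.
interface SketchVector {
  // Fixed buffer count for the type, e.g. 2 (validity + views) for Utf8View.
  int fixedBufferCount();

  // Assumed flag: true for view types that carry variadic data buffers.
  default boolean isVariadic() {
    return false;
  }

  // The reviewer's suggestion: a base method whose default returns 0,
  // so only view vectors need to override it.
  default long getVariadicBufferCount() {
    return 0L;
  }
}

// Hypothetical fixed-layout vector (validity + data), e.g. an int vector.
class SketchIntVector implements SketchVector {
  public int fixedBufferCount() {
    return 2;
  }
}

// Hypothetical view vector: validity + views, plus N variadic data buffers.
class SketchViewVector implements SketchVector {
  private final List<byte[]> dataBuffers = new ArrayList<>();

  void addDataBuffer(byte[] buf) {
    dataBuffers.add(buf);
  }

  public int fixedBufferCount() {
    return 2;
  }

  @Override
  public boolean isVariadic() {
    return true;
  }

  @Override
  public long getVariadicBufferCount() {
    return dataBuffers.size();
  }
}

final class VariadicCounts {
  private VariadicCounts() {}

  // Unload side: the vector already holds its data, so it can report its own count.
  static long totalBuffersForUnload(SketchVector v) {
    return v.fixedBufferCount() + v.getVariadicBufferCount();
  }

  // Builds the per-batch list of counts: one entry per variadic field, in field order.
  static List<Long> collect(List<SketchVector> vectors) {
    List<Long> counts = new ArrayList<>();
    for (SketchVector v : vectors) {
      if (v.isVariadic()) {
        counts.add(v.getVariadicBufferCount());
      }
    }
    return counts;
  }

  // Load side: the target vector is still empty, so the variadic count must come
  // from the record batch; entries are consumed in the same field order.
  static long totalBuffersForLoad(SketchVector v, Iterator<Long> batchCounts) {
    long variadic = v.isVariadic() ? batchCounts.next() : 0L;
    return v.fixedBufferCount() + variadic;
  }
}
```

With the default in place, non-view vectors need no special casing when unloading; only the view vector overrides the method, which is what moving the helper onto the interface would buy.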