Renat Valiullin created ORC-458:
-----------------------------------
Summary: [C++] Redesign of ColumnVectorBatch/ColumnWriter
Key: ORC-458
URL: https://issues.apache.org/jira/browse/ORC-458
Project: ORC
Issue Type: Improvement
Components: C++
Reporter: Renat Valiullin
The current implementation is inconvenient for nested types and has memory
overhead, because the whole batch must be constructed before it is added to
the writer. It would be better to give each batch a link to its ColumnWriter,
so that data can be flushed as soon as the batch is full:
listBatch = writer->createRowBatch(batchSize); // create batch tree
elementsBatch = listBatch->elements.get();
for (array : arrays) {
  for (element : array) {
    // add() resets the batch size to 0
    if (elementsBatch.size == batchSize) elementsBatch.add();
    elementsBatch.data[elementsBatch.size++] = element;
  }
  if (listBatch.size == batchSize) listBatch.add();
  listBatch.data[listBatch.size++] = array.size; // sizes, not offsets
}
writer->add(listBatch); // writeStripe() if needed
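
The pseudocode above can be tried out with a small self-contained mock. This is only a sketch of the idea, not the ORC C++ API: MockColumnWriter, LinkedBatch, ListWriteResult and writeLists are hypothetical names invented for illustration, and the "writer" simply appends flushed values to a vector. Each batch holds a reference to its writer and flushes itself when full, so at most batchSize values per column are buffered at any time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a ColumnWriter: records whatever is flushed to it.
struct MockColumnWriter {
  std::vector<std::int64_t> written;
  std::size_t flushCount = 0;

  void add(const std::int64_t* data, std::size_t n) {
    written.insert(written.end(), data, data + n);
    ++flushCount;
  }
};

// Hypothetical batch that keeps a link to its writer (the core of the proposal).
class LinkedBatch {
 public:
  LinkedBatch(std::size_t capacity, MockColumnWriter& writer)
      : data_(capacity), writer_(writer) {
    assert(capacity > 0);
  }

  // Append one value, flushing to the writer first if the batch is full.
  void append(std::int64_t value) {
    if (size_ == data_.size()) flush();
    data_[size_++] = value;
  }

  // Hand the buffered values to the writer and reset the size to 0.
  void flush() {
    if (size_ == 0) return;
    writer_.add(data_.data(), size_);
    size_ = 0;
  }

 private:
  std::vector<std::int64_t> data_;
  std::size_t size_ = 0;
  MockColumnWriter& writer_;
};

// What each mock writer received, plus how often each batch flushed.
struct ListWriteResult {
  std::vector<std::int64_t> elements;
  std::vector<std::int64_t> lengths;
  std::size_t elementFlushes;
  std::size_t lengthFlushes;
};

// Write a list column as two streams: the elements and the per-list sizes.
ListWriteResult writeLists(const std::vector<std::vector<std::int64_t>>& arrays,
                           std::size_t batchSize) {
  MockColumnWriter elementWriter;
  MockColumnWriter lengthWriter;
  LinkedBatch elementsBatch(batchSize, elementWriter);
  LinkedBatch listBatch(batchSize, lengthWriter);

  for (const auto& array : arrays) {
    for (std::int64_t element : array) elementsBatch.append(element);
    // Sizes, not offsets, as in the pseudocode above.
    listBatch.append(static_cast<std::int64_t>(array.size()));
  }
  // Final flush of the partially filled batches.
  elementsBatch.flush();
  listBatch.flush();

  return {elementWriter.written, lengthWriter.written,
          elementWriter.flushCount, lengthWriter.flushCount};
}
```

Calling writeLists with the lists {1, 2, 3}, {} and {4, 5} and a batch size of 2 sends the elements to the mock writer in three chunks and the sizes in two, without ever materializing the whole column in memory. A real design would also need to handle null flags and nested children, which this sketch leaves out.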