Renat Valiullin created ORC-458:
-----------------------------------

             Summary: [C++] Redesign of ColumnVectorBatch/ColumnWriter 
                 Key: ORC-458
                 URL: https://issues.apache.org/jira/browse/ORC-458
             Project: ORC
          Issue Type: Improvement
          Components: C++
            Reporter: Renat Valiullin


The current implementation is inconvenient for nested types and has memory 
overhead, since we have to construct the whole batch before adding it to the 
writer.

It would be better to give each batch a link to its ColumnWriter, so that 
data can be flushed as soon as the batch is full:

listBatch = writer->createRowBatch(batchSize); // create batch tree
elementsBatch = listBatch->elements.get();

for (array : arrays) {
    for (element : array) {
        // add() flushes the batch to its ColumnWriter and resets its size to 0
        if (elementsBatch->size == batchSize) elementsBatch->add();
        elementsBatch->data[elementsBatch->size++] = element;
    }
    if (listBatch->size == batchSize) listBatch->add();
    listBatch->data[listBatch->size++] = array.size; // sizes, not offsets
}

writer->add(listBatch); // writeStripe() if needed
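
The proposal above can be tried out without the ORC library. The following is 
only a minimal sketch, not ORC code: FlushingBatch, Sink, Recorded, 
ListWriteResult and writeLists are hypothetical names made up for 
illustration, and a std::function callback stands in for the link to a 
ColumnWriter. It assumes int64 elements and ignores nulls, stripes, and any 
ordering a real writer might need between the elements flush and the sizes 
flush.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a ColumnWriter: receives one chunk of values.
using Sink = std::function<void(const int64_t* values, std::size_t count)>;

// Hypothetical fixed-capacity batch that holds a link to its sink,
// so it can flush itself instead of growing.
class FlushingBatch {
 public:
  FlushingBatch(std::size_t capacity, Sink sink)
      : data(capacity), size(0), sink_(std::move(sink)) {
    assert(capacity > 0);
  }

  // Append one value, flushing first if the buffer is already full.
  void push(int64_t value) {
    if (size == data.size()) add();
    data[size++] = value;
  }

  // Hand the buffered values to the sink and reset size to 0.
  void add() {
    if (size > 0) sink_(data.data(), size);
    size = 0;
  }

  std::vector<int64_t> data;
  std::size_t size;

 private:
  Sink sink_;
};

// Records what a sink received, so the result can be inspected.
struct Recorded {
  std::vector<int64_t> values;
  std::size_t flushes = 0;
};

struct ListWriteResult {
  Recorded elements;  // flattened list elements
  Recorded sizes;     // one entry per list: its length, not an offset
};

// Write a list column with buffers that never exceed batchSize values.
ListWriteResult writeLists(const std::vector<std::vector<int64_t>>& arrays,
                           std::size_t batchSize) {
  ListWriteResult result;
  auto record = [](Recorded& r) {
    return [&r](const int64_t* v, std::size_t n) {
      r.values.insert(r.values.end(), v, v + n);
      ++r.flushes;
    };
  };
  FlushingBatch elementsBatch(batchSize, record(result.elements));
  FlushingBatch listBatch(batchSize, record(result.sizes));

  for (const auto& array : arrays) {
    for (int64_t element : array) elementsBatch.push(element);
    listBatch.push(static_cast<int64_t>(array.size()));
  }
  // Flush whatever is left in the partially filled buffers.
  elementsBatch.add();
  listBatch.add();
  return result;
}
```

Calling writeLists with three lists and a batchSize of 2 keeps at most two 
values buffered per batch at any time, regardless of how long the lists are. 
Storing per-list sizes also means a flushed chunk does not depend on a 
running offset carried over from earlier chunks.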



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
