Quanlong Huang created ORC-1132:
-----------------------------------
Summary: [C++] EncodedStringVectorBatch allocates unused buffers
Key: ORC-1132
URL: https://issues.apache.org/jira/browse/ORC-1132
Project: ORC
Issue Type: Improvement
Affects Versions: 1.6.0
Reporter: Quanlong Huang
Assignee: Quanlong Huang
The constructor of EncodedStringVectorBatch invokes the constructor of
StringVectorBatch with batch capacity:
{code:cpp}
EncodedStringVectorBatch::EncodedStringVectorBatch(uint64_t _capacity,
                                                   MemoryPool& pool)
    : StringVectorBatch(_capacity, pool),
      dictionary(),
      index(pool, _capacity) {
  // PASS
}
{code}
This allocates the `data` and `length` buffers in StringVectorBatch, which are never used:
{code:cpp}
StringVectorBatch::StringVectorBatch(uint64_t _capacity, MemoryPool& pool)
    : ColumnVectorBatch(_capacity, pool),
      data(pool, _capacity),
      length(pool, _capacity),
      blob(pool) {
  // PASS
}
{code}
We only use the `index` buffer and `dictionary` of EncodedStringVectorBatch:
{code:cpp}
void StringDictionaryColumnReader::nextEncoded(ColumnVectorBatch& rowBatch,
                                               uint64_t numValues,
                                               char* notNull) {
  ColumnReader::next(rowBatch, numValues, notNull);
  notNull = rowBatch.hasNulls ? rowBatch.notNull.data() : nullptr;
  rowBatch.isEncoded = true;
  EncodedStringVectorBatch& batch =
      dynamic_cast<EncodedStringVectorBatch&>(rowBatch);
  batch.dictionary = this->dictionary;
  // Length buffer is reused to save dictionary entry ids
  rle->next(batch.index.data(), numValues, notNull);
}
{code}
Thus we should avoid allocating the `data` and `length` buffers in the base class
when constructing an EncodedStringVectorBatch.
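One possible direction, shown only as a minimal self-contained sketch and not as the actual ORC change: let the base class take a separate capacity for its own buffers, so the encoded subclass can pass 0. All names below (CountingPool, Buffer, StringBatch, EncodedStringBatch, bytesForEncodedBatch) are hypothetical stand-ins for MemoryPool, DataBuffer, StringVectorBatch and EncodedStringVectorBatch. The element types of the buffers are also assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a memory pool: it only counts requested bytes.
struct CountingPool {
  std::size_t bytesRequested = 0;
};

// Hypothetical stand-in for a data buffer that allocates eagerly.
template <typename T>
class Buffer {
 public:
  Buffer(CountingPool& pool, uint64_t capacity) : storage_(capacity) {
    pool.bytesRequested += capacity * sizeof(T);
  }
  T* data() { return storage_.data(); }
  uint64_t capacity() const { return storage_.size(); }

 private:
  std::vector<T> storage_;
};

// Base batch: the extra bufferCapacity argument sizes data/length
// independently of the logical batch capacity.
struct StringBatch {
  StringBatch(uint64_t cap, CountingPool& pool, uint64_t bufferCapacity)
      : capacity(cap),
        data(pool, bufferCapacity),
        length(pool, bufferCapacity) {}
  // Existing behaviour: buffers sized to the batch capacity.
  StringBatch(uint64_t cap, CountingPool& pool)
      : StringBatch(cap, pool, cap) {}
  virtual ~StringBatch() = default;

  uint64_t capacity;
  Buffer<char*> data;
  Buffer<int64_t> length;
};

// Encoded batch: only index is sized to the batch capacity;
// the base-class buffers are created empty.
struct EncodedStringBatch : StringBatch {
  EncodedStringBatch(uint64_t cap, CountingPool& pool)
      : StringBatch(cap, pool, 0), index(pool, cap) {}

  Buffer<int64_t> index;
};

// Returns the number of bytes requested when building one encoded batch.
std::size_t bytesForEncodedBatch(uint64_t capacity) {
  CountingPool pool;
  EncodedStringBatch batch(capacity, pool);
  assert(batch.capacity == capacity);  // logical capacity is unchanged
  return pool.bytesRequested;
}
```

With this layout an encoded batch requests only the bytes for `index`. Any code path that later fills `data` or `length` on an encoded batch (for example, decoding it in place) would have to resize those buffers first; whether such paths exist in ORC is not covered by this sketch.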
--
This message was sent by Atlassian Jira
(v8.20.1#820001)