[
https://issues.apache.org/jira/browse/ORC-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang updated ORC-1132:
--------------------------------
Summary: [C++] EncodedStringVectorBatch allocates unused buffers (was:
[C++] EncodedStringVectorBatch allocates used buffers)
> [C++] EncodedStringVectorBatch allocates unused buffers
> -------------------------------------------------------
>
> Key: ORC-1132
> URL: https://issues.apache.org/jira/browse/ORC-1132
> Project: ORC
> Issue Type: Improvement
> Affects Versions: 1.6.0
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
>
> The constructor of EncodedStringVectorBatch invokes the constructor of
> StringVectorBatch with batch capacity:
> {code:cpp}
> EncodedStringVectorBatch::EncodedStringVectorBatch(uint64_t _capacity,
>                                                    MemoryPool& pool)
>     : StringVectorBatch(_capacity, pool),
>       dictionary(),
>       index(pool, _capacity) {
>   // PASS
> }
> {code}
> This allocates the `data` and `length` buffers in StringVectorBatch, which are never used:
> {code:cpp}
> StringVectorBatch::StringVectorBatch(uint64_t _capacity, MemoryPool& pool)
>     : ColumnVectorBatch(_capacity, pool),
>       data(pool, _capacity),
>       length(pool, _capacity),
>       blob(pool) {
>   // PASS
> }
> {code}
> We only use the `index` buffer and `dictionary` of EncodedStringVectorBatch:
> {code:cpp}
> void StringDictionaryColumnReader::nextEncoded(ColumnVectorBatch& rowBatch,
>                                                uint64_t numValues,
>                                                char* notNull) {
>   ColumnReader::next(rowBatch, numValues, notNull);
>   notNull = rowBatch.hasNulls ? rowBatch.notNull.data() : nullptr;
>   rowBatch.isEncoded = true;
>   EncodedStringVectorBatch& batch =
>       dynamic_cast<EncodedStringVectorBatch&>(rowBatch);
>   batch.dictionary = this->dictionary;
>   // Length buffer is reused to save dictionary entry ids
>   rle->next(batch.index.data(), numValues, notNull);
> }
> {code}
> Thus we should avoid allocating the `data` and `length` buffers in the base class when constructing an EncodedStringVectorBatch.
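To put a rough number on the overhead described above, here is a minimal, self-contained sketch. It does not use the ORC headers: CountingPool, SketchBuffer, SketchStringBatch and SketchEncodedBatch are hypothetical stand-ins, and the element types (char* for `data`, int64_t for `length` and `index`) are assumptions about the real classes. Passing a zero capacity to the base class is only one possible way to skip the allocation and is not necessarily how this issue is resolved; in the real code ColumnVectorBatch also sizes `notNull` from the same capacity, which the sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a memory pool: it only counts requested bytes.
struct CountingPool {
  std::size_t requestedBytes = 0;
};

// Hypothetical stand-in for a typed buffer: records its size with the pool
// instead of allocating anything.
template <typename T>
struct SketchBuffer {
  SketchBuffer(CountingPool& pool, std::uint64_t n) {
    pool.requestedBytes += static_cast<std::size_t>(n) * sizeof(T);
  }
};

// Mirrors the base-class constructor quoted above (blob omitted, it starts empty).
// Element types are assumptions, not taken from the ORC headers.
struct SketchStringBatch {
  SketchStringBatch(std::uint64_t capacity, CountingPool& pool)
      : data(pool, capacity), length(pool, capacity) {}
  SketchBuffer<char*> data;
  SketchBuffer<std::int64_t> length;
};

// The base capacity is a separate parameter so both variants can be compared.
struct SketchEncodedBatch : SketchStringBatch {
  SketchEncodedBatch(std::uint64_t baseCapacity, std::uint64_t capacity,
                     CountingPool& pool)
      : SketchStringBatch(baseCapacity, pool), index(pool, capacity) {}
  SketchBuffer<std::int64_t> index;
};

// Bytes requested when the base class receives the full batch capacity,
// as in the constructor quoted in the issue.
std::size_t bytesWithFullBase(std::uint64_t capacity) {
  CountingPool pool;
  SketchEncodedBatch batch(capacity, capacity, pool);
  (void)batch;
  return pool.requestedBytes;
}

// Bytes requested when the base class receives a zero capacity, so that
// only the index buffer is sized.
std::size_t bytesWithEmptyBase(std::uint64_t capacity) {
  CountingPool pool;
  SketchEncodedBatch batch(0, capacity, pool);
  (void)batch;
  return pool.requestedBytes;
}
```

Under these assumptions, the difference between bytesWithFullBase(n) and bytesWithEmptyBase(n) is n * (sizeof(char*) + sizeof(int64_t)). On a typical 64-bit platform that is 16 bytes per row, or 16 KiB of unused buffer for a 1024-row batch.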