Quanlong Huang created ORC-1131:
-----------------------------------
Summary: [C++] getMemoryUsage() is incorrect on string vector batches
Key: ORC-1131
URL: https://issues.apache.org/jira/browse/ORC-1131
Project: ORC
Issue Type: Bug
Affects Versions: 1.6.0
Reporter: Quanlong Huang
Assignee: Quanlong Huang
The C++ client produces two kinds of string vector batches:
StringVectorBatch and EncodedStringVectorBatch. getMemoryUsage() currently
returns incorrect results for both.
ORC-501 moved the blob from StringColumnReader to StringVectorBatch.
However, StringVectorBatch::getMemoryUsage() was not updated to account for it:
{code:cpp}
uint64_t StringVectorBatch::getMemoryUsage() {
  return ColumnVectorBatch::getMemoryUsage()
      + static_cast<uint64_t>(data.capacity() * sizeof(char*)
      + length.capacity() * sizeof(int64_t));
}
{code}
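A minimal sketch of how the blob could be included in the total. It uses simplified stand-in types rather than the real ORC classes: BufferSketch, ColumnBatchSketch and StringBatchSketch are hypothetical names, and the member name blob is an assumption based on the description above. It illustrates the missing term and is not the actual patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a DataBuffer-like type; only capacity() matters.
// The element count is used as the capacity to keep the arithmetic obvious.
template <typename T>
struct BufferSketch {
  explicit BufferSketch(uint64_t cap) : storage(cap) {}
  uint64_t capacity() const { return storage.size(); }
  std::vector<T> storage;
};

// Hypothetical stand-in for the base batch: counts one byte per row.
struct ColumnBatchSketch {
  explicit ColumnBatchSketch(uint64_t cap) : notNull(cap) {}
  virtual ~ColumnBatchSketch() = default;
  virtual uint64_t getMemoryUsage() {
    return notNull.capacity() * sizeof(char);
  }
  BufferSketch<char> notNull;
};

// Hypothetical stand-in for a string batch that owns its blob.
struct StringBatchSketch : ColumnBatchSketch {
  StringBatchSketch(uint64_t cap, uint64_t blobBytes)
      : ColumnBatchSketch(cap), data(cap), length(cap), blob(blobBytes) {}

  uint64_t getMemoryUsage() override {
    return ColumnBatchSketch::getMemoryUsage()
        + data.capacity() * sizeof(char*)
        + length.capacity() * sizeof(int64_t)
        + blob.capacity() * sizeof(char);  // the term the report says is missing
  }

  BufferSketch<char*> data;      // pointers to the start of each string
  BufferSketch<int64_t> length;  // length of each string
  BufferSketch<char> blob;       // assumed name for the owned string bytes
};
```

With this accounting, a batch whose blob holds most of the bytes no longer reports only the size of its pointer and length arrays.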
EncodedStringVectorBatch inherits from StringVectorBatch but doesn't
override getMemoryUsage(), so its additional members (the index buffer and
the dictionary) are not counted either.
{code:cpp}
struct EncodedStringVectorBatch : public StringVectorBatch {
  EncodedStringVectorBatch(uint64_t capacity, MemoryPool& pool);
  virtual ~EncodedStringVectorBatch();
  std::string toString() const;
  void resize(uint64_t capacity);
  std::shared_ptr<StringDictionary> dictionary;

  // index for dictionary entry
  DataBuffer<int64_t> index;
};
{code}
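A rough sketch of what an override might look like. It is written against hypothetical stub types (StringBatchStub, DictionaryStub, EncodedStringBatchSketch), not the real ORC headers. Whether the dictionary should be counted at all is an assumption here. It is held through a shared_ptr, so counting it in every batch could double-count a dictionary that several batches share.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the base string batch; it reports a fixed byte
// count so the example can focus on what the subclass adds.
struct StringBatchStub {
  explicit StringBatchStub(uint64_t baseBytes) : baseBytes(baseBytes) {}
  virtual ~StringBatchStub() = default;
  virtual uint64_t getMemoryUsage() { return baseBytes; }
  uint64_t baseBytes;
};

// Hypothetical stand-in for a dictionary; it just records its own size.
struct DictionaryStub {
  uint64_t bytes;
};

struct EncodedStringBatchSketch : StringBatchStub {
  EncodedStringBatchSketch(uint64_t baseBytes, uint64_t cap)
      : StringBatchStub(baseBytes), index(cap) {}

  uint64_t getMemoryUsage() override {
    // Start from the base class total, then add the index buffer.
    // index.size() stands in for a DataBuffer-style capacity().
    uint64_t total = StringBatchStub::getMemoryUsage()
        + index.size() * sizeof(int64_t);
    // Assumption: count the dictionary only when one is attached.
    if (dictionary) {
      total += dictionary->bytes;
    }
    return total;
  }

  std::shared_ptr<DictionaryStub> dictionary;
  std::vector<int64_t> index;  // index for dictionary entry
};
```

Because getMemoryUsage() is virtual in the sketch, a caller holding a base-class pointer still gets the larger total.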