wgtmac commented on code in PR #2010:
URL: https://github.com/apache/orc/pull/2010#discussion_r1734869053
##########
c++/include/orc/Vector.hh:
##########
@@ -88,6 +89,11 @@ namespace orc {
*/
virtual bool hasVariableLength();
+ /**
+ * Decode dictionary into vector batch.
+ */
+ virtual void decodeDictionary() {}
Review Comment:
We need to implement this for struct, map, list and union type by calling it
recursively.
##########
c++/src/ColumnWriter.cc:
##########
@@ -130,6 +130,8 @@ namespace orc {
hasNullValue = true;
}
}
+
+ batch.decodeDictionary();
Review Comment:
My intention to add `decodeDictionary()` is that caller should call
`decodeDictionary()` before calling `ColumnWriter::add()`. This can make the
implementation of `ColumnWriter ` simpler.
##########
c++/include/orc/Vector.hh:
##########
@@ -248,10 +254,16 @@ namespace orc {
~EncodedStringVectorBatch() override;
std::string toString() const override;
void resize(uint64_t capacity) override;
+
+ // Calculate data and length in StringVectorBatch from dictionary and index
+ void decodeDictionary() override;
+
std::shared_ptr<StringDictionary> dictionary;
// index for dictionary entry
DataBuffer<int64_t> index;
+
+ std::atomic<bool> dictionaryDecoded{false};
Review Comment:
I think it is an overkill to use atomic here. It should be a misuse to call
member functions of ColumnVectorBatch concurrently.
##########
c++/include/orc/Vector.hh:
##########
@@ -88,6 +89,11 @@ namespace orc {
*/
virtual bool hasVariableLength();
+ /**
+ * Decode dictionary into vector batch.
+ */
+ virtual void decodeDictionary() {}
Review Comment:
BTW, please add a test case to make sure it works as expected.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]