stiga-huang commented on a change in pull request #990:
URL: https://github.com/apache/orc/pull/990#discussion_r777797249



##########
File path: c++/include/orc/Reader.hh
##########
@@ -149,6 +149,28 @@ namespace orc {
      */
     RowReaderOptions& includeTypes(const std::list<uint64_t>& types);
 
+    /**
+     * A map of <typeId, set<ReadIntent>> used as a parameter in
+     * RowReaderOptions::includeTypesAndIntents.
+     */
+    typedef std::map<uint64_t, std::set<ReadIntent>> TypeReadIntents;

Review comment:
       There are only 3 cases: data-only, pos-only, and all. I think we can add 
an enum `ALL` in `ReadIntent` to avoid using a set here.

##########
File path: c++/include/orc/Vector.hh
##########
@@ -182,12 +182,20 @@ namespace orc {
     uint64_t getMemoryUsage();
     bool hasVariableLength();
 
+    bool hasElements = false;
+    bool hasPositions = false;
+
     /**
      * The offset of the first element of each list.
      * The length of list i is offsets[i+1] - offsets[i].
      */
     DataBuffer<int64_t> offsets;
 
+    /**
+     * Position of each element in their respective array.
+     */
+    DataBuffer<int64_t> pos;

Review comment:
       `uint32_t` might be enough and `positions` might be better. BTW, I think 
clients can calculate this in realtime based on `offsets`. Maybe we don't need 
this field.

##########
File path: c++/src/Options.hh
##########
@@ -174,19 +175,39 @@ namespace orc {
     privateBits->selection = ColumnSelection_FIELD_IDS;
     privateBits->includedColumnIndexes.assign(include.begin(), include.end());
     privateBits->includedColumnNames.clear();
+    privateBits->readIntents.clear();
     return *this;
   }
 
   RowReaderOptions& RowReaderOptions::include(const std::list<std::string>& 
include) {
     privateBits->selection = ColumnSelection_NAMES;
     privateBits->includedColumnNames.assign(include.begin(), include.end());
     privateBits->includedColumnIndexes.clear();
+    privateBits->readIntents.clear();
     return *this;
   }
 
   RowReaderOptions& RowReaderOptions::includeTypes(const std::list<uint64_t>& 
types) {
     privateBits->selection = ColumnSelection_TYPE_IDS;
     privateBits->includedColumnIndexes.assign(types.begin(), types.end());
+    privateBits->readIntents.clear();
+    privateBits->includedColumnNames.clear();
+    privateBits->readIntents.clear();

Review comment:
       nit: duplicated `clear`

##########
File path: c++/include/orc/Reader.hh
##########
@@ -149,6 +149,28 @@ namespace orc {
      */
     RowReaderOptions& includeTypes(const std::list<uint64_t>& types);
 
+    /**
+     * A map of <typeId, set<ReadIntent>> used as a parameter in
+     * RowReaderOptions::includeTypesAndIntents.
+     */
+    typedef std::map<uint64_t, std::set<ReadIntent>> TypeReadIntents;
+
+    /**
+     * Selects which type ids to read and specific ReadIntents for each
+     * type id. The root type is always 0 and the rest of the types are
+     * labeled in a preorder traversal of the tree. The parent types are
+     * automatically selected, but the children are not. If a type id is
+     * selected with an empty ReadIntent set, the empty set will be replaced
+     * by {ReadIntent_DATA}.
+     *
+     * This option clears any previous setting of the selected columns or
+     * types.
+     * @param typesAndIntents a map of <typeId, set<ReadIntent>>.
+     * @return this
+     */
+    RowReaderOptions&
+    includeTypesAndIntents(const TypeReadIntents& typesAndIntents);

Review comment:
       nit: `includeTypesWithIntents`

##########
File path: c++/include/orc/Reader.hh
##########
@@ -149,6 +149,28 @@ namespace orc {
      */
     RowReaderOptions& includeTypes(const std::list<uint64_t>& types);
 
+    /**
+     * A map of <typeId, set<ReadIntent>> used as a parameter in
+     * RowReaderOptions::includeTypesAndIntents.
+     */
+    typedef std::map<uint64_t, std::set<ReadIntent>> TypeReadIntents;
+
+    /**
+     * Selects which type ids to read and specific ReadIntents for each
+     * type id. The root type is always 0 and the rest of the types are
+     * labeled in a preorder traversal of the tree. The parent types are
+     * automatically selected, but the children are not. If a type id is
+     * selected with an empty ReadIntent set, the empty set will be replaced
+     * by {ReadIntent_DATA}.

Review comment:
       Shouldn't the default behavior be selecting all, i.e. data+pos?

##########
File path: c++/include/orc/Common.hh
##########
@@ -122,6 +122,11 @@ namespace orc {
     StreamKind_BLOOM_FILTER_UTF8 = 8
   };
 
+  enum ReadIntent {

Review comment:
       This only applies for array types. Maybe `ArrayReadIntent` is a better 
name.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@orc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to