[GitHub] [arrow] maqister commented on a change in pull request #11984: PARQUET-2109: Check if Parquet page has too few values

GitBox Fri, 24 Dec 2021 22:24:49 -0800


maqister commented on a change in pull request #11984:
URL: https://github.com/apache/arrow/pull/11984#discussion_r775112597




##########
File path: cpp/src/parquet/column_reader.cc
##########
@@ -940,7 +940,7 @@ int64_t TypedColumnReaderImpl<DType>::ReadBatchWithDictionary(
     int64_t* indices_read, const T** dict, int32_t* dict_len) {
   bool has_dict_output = dict != nullptr && dict_len != nullptr;
   // Similar logic as ReadValues to get pages.
-  if (!HasNext()) {
+  if (batch_size == 0 || !HasNext()) {

Review comment:
   This change breaks a use case at my company, where we call this API with
   batch_size = 0 explicitly, just to obtain the dictionary alone. We use it in
   our flow for loading .parquet files into a proprietary in-memory column store.
   
   It is OK from our perspective to change it, as we can just add a dedicated
   API to obtain the dictionary in our fork of the code.
   
   I am just bringing this up in case other developers are impacted by the
   public API change.
   
   https://www.hyrumslaw.com/
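
To make the concern concrete, here is a minimal, self-contained sketch of the behavior difference. It is a toy model, not the Arrow implementation: `ToyDictReader`, its fields, its simplified parameter list, and the `guard_zero_batch` flag are all hypothetical names introduced for illustration. It also assumes that the early-return branch resets the dictionary outputs, which the hunk above does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded column reader.
// This is NOT parquet::TypedColumnReaderImpl; it only mimics the guard in the diff.
struct ToyDictReader {
  std::vector<int32_t> dictionary{10, 20, 30};
  std::vector<int32_t> indices{0, 2, 1, 1};
  std::size_t pos = 0;

  bool HasNext() const { return pos < indices.size(); }

  // guard_zero_batch == false models the check before the patch,
  // guard_zero_batch == true models the check after the patch.
  int64_t ReadBatchWithDictionary(int64_t batch_size, int32_t* out,
                                  int64_t* indices_read, const int32_t** dict,
                                  int32_t* dict_len, bool guard_zero_batch) {
    assert(indices_read != nullptr);
    bool has_dict_output = dict != nullptr && dict_len != nullptr;
    if ((guard_zero_batch && batch_size == 0) || !HasNext()) {
      // Assumed early-return behavior: no indices and no dictionary reported.
      *indices_read = 0;
      if (has_dict_output) {
        *dict = nullptr;
        *dict_len = 0;
      }
      return 0;
    }
    // Past the guard, the dictionary is exposed even if no indices are read.
    if (has_dict_output) {
      *dict = dictionary.data();
      *dict_len = static_cast<int32_t>(dictionary.size());
    }
    int64_t n = 0;
    while (n < batch_size && HasNext()) out[n++] = indices[pos++];
    *indices_read = n;
    return n;
  }
};
```

In this sketch, a caller passing batch_size = 0 with `guard_zero_batch = false` gets a non-null `dict` and `dict_len == 3`, while the same call with `guard_zero_batch = true` gets `dict == nullptr` and `dict_len == 0`. That is the pattern a dictionary-only caller would silently lose.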




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
