maqister commented on a change in pull request #11984:
URL: https://github.com/apache/arrow/pull/11984#discussion_r775112597
##########
File path: cpp/src/parquet/column_reader.cc
##########
@@ -940,7 +940,7 @@ int64_t
TypedColumnReaderImpl<DType>::ReadBatchWithDictionary(
int64_t* indices_read, const T** dict, int32_t* dict_len) {
bool has_dict_output = dict != nullptr && dict_len != nullptr;
// Similar logic as ReadValues to get pages.
- if (!HasNext()) {
+ if (batch_size == 0 || !HasNext()) {
Review comment:
this change breaks use-case in my company where we use this API with
batch_size = 0 explicitly, just to obtain dictionary alone. we use it for our
loading .parquet files into properitary in-memory column store flow.
it is ok from our perspective to change it as we can just add dedicated API
to obtain dictionary in our fork of the code.
i am just bringing this up in case there are other devs impacted by the
public API change.
https://www.hyrumslaw.com/
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]