kszucs commented on code in PR #45360:
URL: https://github.com/apache/arrow/pull/45360#discussion_r2083265543
##########
cpp/src/parquet/column_writer.cc:
##########
@@ -1337,13 +1368,47 @@ class TypedColumnWriterImpl : public ColumnWriterImpl,
       bits_buffer_->ZeroPadding();
     }
 
-    if (leaf_array.type()->id() == ::arrow::Type::DICTIONARY) {
-      return WriteArrowDictionary(def_levels, rep_levels, num_levels, leaf_array, ctx,
-                                  maybe_parent_nulls);
+    if (properties_->content_defined_chunking_enabled()) {
+      DCHECK(content_defined_chunker_.has_value());
+      auto chunks = content_defined_chunker_->GetChunks(def_levels, rep_levels,

Review Comment:
   I am not sure, maybe? I tend to think that having more than a single chunk is the more likely case, but it depends on how `WriteArrow` is being called. On the other hand, there is always a first chunk where `offset` is 0.
   
   `WriteArrowDense` and `WriteArrowDictionary` also slice the array unconditionally, so this optimization (when `value_offset == 0`) could be added there as well, though it would be nice to μ-benchmark it first.
   
   Could we defer it to a follow-up optimization task?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
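
For reference, the `value_offset == 0` shortcut discussed in the comment can be sketched in isolation. This is only an illustration under stated assumptions, not the code from the PR: `ArrayView`, `Chunk`, and `SliceIfNeeded` are hypothetical stand-ins (they are not Arrow or Parquet APIs), and the skip condition assumes the slice can only be avoided when the chunk also covers the whole array, since a first chunk at offset 0 may still be shorter than the input.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a sliceable array: a shared buffer plus an
// (offset, length) window. This is NOT the real arrow::Array interface.
struct ArrayView {
  std::shared_ptr<const std::vector<int32_t>> data;
  int64_t offset;
  int64_t length;

  // Zero-copy slice: shares the buffer but still allocates a new view object,
  // which is the cost the shortcut below tries to avoid.
  std::shared_ptr<ArrayView> Slice(int64_t off, int64_t len) const {
    assert(off >= 0 && len >= 0 && off + len <= length);
    return std::make_shared<ArrayView>(ArrayView{data, offset + off, len});
  }
};

// Hypothetical chunk descriptor; the field names are illustrative only.
struct Chunk {
  int64_t value_offset;
  int64_t num_values;
};

// Returns the input unchanged when the chunk spans the whole array
// (assumed skip condition), and a slice of it otherwise.
std::shared_ptr<ArrayView> SliceIfNeeded(const std::shared_ptr<ArrayView>& array,
                                         const Chunk& chunk) {
  if (chunk.value_offset == 0 && chunk.num_values == array->length) {
    return array;  // single chunk covering everything: no new view needed
  }
  return array->Slice(chunk.value_offset, chunk.num_values);
}
```

Whether skipping the slice is a measurable win depends on how often a single chunk covers the whole input, which is why benchmarking it first seems reasonable.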