kszucs commented on code in PR #45360:
URL: https://github.com/apache/arrow/pull/45360#discussion_r2083265543


##########
cpp/src/parquet/column_writer.cc:
##########
@@ -1337,13 +1368,47 @@ class TypedColumnWriterImpl : public ColumnWriterImpl,
       bits_buffer_->ZeroPadding();
     }
 
-    if (leaf_array.type()->id() == ::arrow::Type::DICTIONARY) {
-      return WriteArrowDictionary(def_levels, rep_levels, num_levels, leaf_array, ctx,
-                                  maybe_parent_nulls);
+    if (properties_->content_defined_chunking_enabled()) {
+      DCHECK(content_defined_chunker_.has_value());
+      auto chunks = content_defined_chunker_->GetChunks(def_levels, rep_levels,

Review Comment:
   I am not sure, maybe? I tend to think that having more than a single chunk
   is more likely, but it depends on how `WriteArrow` is being called. On the
   other hand, there is always a first chunk where `offset` is 0.
   
   `WriteArrowDense` and `WriteArrowDictionary` also slice the array
   unconditionally, so this optimization (skipping the slice when
   `value_offset == 0`) could be added there as well, though it would be nice
   to μ-benchmark it first. Could we defer it to a follow-up optimization task?
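
For reference, a minimal sketch of what such a fast path might look like. It uses a toy array type rather than the real Arrow classes, so `ToyArray` and `MaybeSlice` are hypothetical names, not part of the PR or of Arrow. It also assumes the slice can only be skipped when the chunk spans the whole array, not merely when `value_offset == 0`; a first chunk that is shorter than the array still needs a slice.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for an Arrow array: a shared buffer plus a window
// (offset, length) into it. Not the real arrow::Array API.
struct ToyArray {
  std::shared_ptr<const std::vector<int32_t>> data;
  int64_t offset = 0;
  int64_t length = 0;

  // Zero-copy slice: shares the buffer but still allocates a new ToyArray.
  std::shared_ptr<ToyArray> Slice(int64_t off, int64_t len) const {
    assert(off >= 0 && len >= 0 && off + len <= length);
    return std::make_shared<ToyArray>(ToyArray{data, offset + off, len});
  }
};

// Hypothetical helper: return the input unchanged when the requested chunk
// covers the entire array, otherwise fall back to a regular slice.
inline std::shared_ptr<ToyArray> MaybeSlice(const std::shared_ptr<ToyArray>& array,
                                            int64_t value_offset,
                                            int64_t num_values) {
  if (value_offset == 0 && num_values == array->length) {
    return array;  // Fast path: no new array object is created.
  }
  return array->Slice(value_offset, num_values);
}
```

Whether the saved allocation is measurable depends on how often a single chunk covers the whole input, which is why benchmarking it first seems worthwhile.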



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
