[ https://issues.apache.org/jira/browse/HUDI-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364021#comment-17364021 ]

sivabalan narayanan commented on HUDI-2025:
-------------------------------------------

Trying to check differences between both flows. Inspected metadata that gets attached to parquet written in both paths.

// listing just the keys in extra metadata.
sn$ grep "extra" /tmp/regular_bulk_insert_meta.out| cut -d"=" -f1
extra: org.apache.hudi.bloomfilter
extra: hoodie_min_record_key
extra: parquet.avro.schema
extra: writer.model.name
extra: hoodie_max_record_key

sn$ grep "extra" /tmp/rowWriter_bulk_insert_meta.out| cut -d"=" -f1
extra: org.apache.spark.version
extra: org.apache.hudi.bloomfilter
extra: hoodie_min_record_key
extra: org.apache.spark.sql.parquet.row.metadata
extra: hoodie_max_record_key

> Bring parity between row writer bulk_insert and rdd based bulk_insert
> ---------------------------------------------------------------------
>
>                 Key: HUDI-2025
>                 URL: https://issues.apache.org/jira/browse/HUDI-2025
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: sivabalan narayanan
>            Priority: Major
>
> Bring parity between row writer bulk_insert and rdd based bulk_insert

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
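
The two key listings in the comment can be compared mechanically. The following is a minimal sketch in Python, assuming the key names are exactly those printed by the two grep commands; the variable names are illustrative and not part of Hudi.

```python
# Keys copied from the two listings in the comment.
# Variable names are hypothetical, chosen only for this sketch.
regular_keys = {
    "org.apache.hudi.bloomfilter",
    "hoodie_min_record_key",
    "parquet.avro.schema",
    "writer.model.name",
    "hoodie_max_record_key",
}
row_writer_keys = {
    "org.apache.spark.version",
    "org.apache.hudi.bloomfilter",
    "hoodie_min_record_key",
    "org.apache.spark.sql.parquet.row.metadata",
    "hoodie_max_record_key",
}

# Set differences show which keys each write path attaches exclusively.
only_regular = sorted(regular_keys - row_writer_keys)
only_row_writer = sorted(row_writer_keys - regular_keys)
common = sorted(regular_keys & row_writer_keys)

print("only in regular bulk_insert:", only_regular)
print("only in row writer bulk_insert:", only_row_writer)
print("in both:", common)
```

Under these assumptions, the bloom filter and min/max record key entries appear in both paths, while the schema and writer-model entries differ: the regular path carries the Avro keys and the row writer path carries the Spark keys.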