sivabalan narayanan closed HUDI-3110.
-------------------------------------
Resolution: Invalid
> parquet max file size not honored
> ---------------------------------
>
> Key: HUDI-3110
> URL: https://issues.apache.org/jira/browse/HUDI-3110
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.11.0
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: sev:high
> Fix For: 0.11.0
>
>
> Setting hoodie.parquet.max.file.size is not honored.
> I still see file sizes reach about 120 MB even though I configured the max
> Parquet file size to 50 MB.
> This happens in both the row writer path and the non-row-writer path.
>
> df.write.format("hudi").
>   option(PRECOMBINE_FIELD_OPT_KEY, "other").
>   option(RECORDKEY_FIELD_OPT_KEY, "id").
>   option(PARTITIONPATH_FIELD_OPT_KEY, "type").
>   option(OPERATION_OPT_KEY, "bulk_insert").
>   option("hoodie.bulkinsert.shuffle.parallelism", "4").
>   option("hoodie.parquet.max.file.size", "52428800").
>   option(TABLE_NAME, tableName).
>   option("hoodie.datasource.write.row.writer.enable", "false").
>   mode(Overwrite).
>   save(basePath)
>
> ls -ltr /tmp/hudi_trips_cow/PullRequestEvent
> total 754048
> -rw-r--r-- 1 nsb wheel 121847456 Dec 27 19:14 e199774a-ceec-47bb-883e-4e669877f778-3_1-34-192_20211227191149448.parquet
> -rw-r--r-- 1 nsb wheel 119741276 Dec 27 19:14 e199774a-ceec-47bb-883e-4e669877f778-4_1-34-192_20211227191149448.parquet
> -rw-r--r-- 1 nsb wheel 114652047 Dec 27 19:14 e199774a-ceec-47bb-883e-4e669877f778-5_1-34-192_20211227191149448.parquet
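
For reference, the comparison described in the report can be sketched in plain Scala. This is only an illustration, not part of Hudi: the object ParquetSizeCheck and its helpers oversized and toMiB are hypothetical names, and the sizes are copied from the ls output quoted above. The configured value 52428800 equals 50 * 1024 * 1024 bytes.

```scala
object ParquetSizeCheck {
  // Limit passed to hoodie.parquet.max.file.size in the report: 50 * 1024 * 1024 bytes.
  val MaxFileSizeBytes: Long = 52428800L

  // (file name, size in bytes) pairs copied from the ls output in the report.
  val reported: Seq[(String, Long)] = Seq(
    ("e199774a-ceec-47bb-883e-4e669877f778-3_1-34-192_20211227191149448.parquet", 121847456L),
    ("e199774a-ceec-47bb-883e-4e669877f778-4_1-34-192_20211227191149448.parquet", 119741276L),
    ("e199774a-ceec-47bb-883e-4e669877f778-5_1-34-192_20211227191149448.parquet", 114652047L)
  )

  // Hypothetical helper: keeps only the Parquet files whose size exceeds the limit.
  def oversized(files: Seq[(String, Long)], limit: Long): Seq[(String, Long)] =
    files.filter { case (name, size) => name.endsWith(".parquet") && size > limit }

  // Converts a byte count to mebibytes for display.
  def toMiB(bytes: Long): Double = bytes.toDouble / (1024 * 1024)

  def main(args: Array[String]): Unit =
    oversized(reported, MaxFileSizeBytes).foreach { case (name, size) =>
      println(f"$name: ${toMiB(size)}%.1f MiB (limit ${toMiB(MaxFileSizeBytes)}%.1f MiB)")
    }
}
```

All three listed files exceed the configured limit, which matches the observation in the description. The sketch only compares sizes; it says nothing about why the issue was resolved as Invalid.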