yihua commented on issue #5081:
URL: https://github.com/apache/hudi/issues/5081#issuecomment-1113850850
@Guanpx thanks for helping.
As @Guanpx mentioned, `write.parquet.max.file.size` provides an approximate
target for sizing the files, and you can make it smaller. Are you using Flink
SQL to write the Hudi table? `write.parquet.max.file.size` is a Flink
SQL-specific config.
`hoodie.parquet.max.file.size` achieves the same goal for the other write flows,
e.g., Spark.
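For illustration, a minimal sketch of where each config would be set. The table name, path, and field names are hypothetical, and the chosen size values are just examples (for the Flink option the value is interpreted in MB, an assumption based on recent Hudi versions):

```sql
-- Flink SQL: set the target file size in the table's WITH clause
-- (hypothetical table/path; 64 here means roughly 64 MB per parquet file)
CREATE TABLE hudi_tbl (
  id BIGINT,
  name STRING,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_tbl',          -- example path
  'write.parquet.max.file.size' = '64'      -- Flink SQL-specific config
);
```

In the Spark datasource write flow, the equivalent would be passed as a write option instead, e.g. `option("hoodie.parquet.max.file.size", "67108864")` (that config takes a value in bytes).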
In general, the file size does not affect whether the queried data has
duplicates or not.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]