yihua commented on issue #5081:
URL: https://github.com/apache/hudi/issues/5081#issuecomment-1113850850
@Guanpx thanks for helping.
As @Guanpx mentioned, `write.parquet.max.file.size` provides an approximate
target for sizing the files, and you can make it smaller. Are you using Flink
SQL to write the Hudi table? `write.parquet.max.file.size` is a Flink
SQL-specific config.
`hoodie.parquet.max.file.size` achieves the same goal for the other write flows,
e.g., Spark.
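For illustration, a minimal sketch of where each config would be set. The table name, path, and field names are hypothetical, and the chosen size values are just examples (for the Flink option the value is interpreted in MB, an assumption based on recent Hudi versions):

```sql
-- Flink SQL: set the target file size in the table's WITH clause
-- (hypothetical table/path; 64 here means roughly 64 MB per parquet file)
CREATE TABLE hudi_tbl (
  id BIGINT,
  name STRING,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_tbl',          -- example path
  'write.parquet.max.file.size' = '64'      -- Flink SQL-specific config
);
```

In the Spark datasource write flow, the equivalent would be passed as a write option instead, e.g. `option("hoodie.parquet.max.file.size", "67108864")` (that config takes a value in bytes).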
In general, the file size does not affect whether the queried data has
duplicates or not.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]