Hi Qian,

You could try increasing hoodie.parquet.max.file.size [1] and hoodie.parquet.compression.ratio [2] to control the size of the data files. The code that applies these two settings is in HoodieParquetWriter [3][4].
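As a rough illustration of the two settings, here is a minimal Java sketch that builds them as plain string options and estimates the byte limit they imply. The class name HudiFileSizeOptions, its method names, and the values (256 MiB, ratio 0.35) are hypothetical and chosen only for illustration. The formula max + round(max * ratio) is my reading of the HoodieParquetWriter lines linked in [3][4] and is an assumption, so please check it against the source for your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hudi: it only assembles option strings.
public class HudiFileSizeOptions {

    // Builds the two storage options named in this thread as key/value strings.
    static Map<String, String> build(long maxFileSizeBytes, double compressionRatio) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.parquet.max.file.size", Long.toString(maxFileSizeBytes));
        opts.put("hoodie.parquet.compression.ratio", Double.toString(compressionRatio));
        return opts;
    }

    // ASSUMPTION: the writer keeps accepting records until the bytes written reach
    // max + round(max * ratio). Verify against HoodieParquetWriter before relying on it.
    static long assumedWriteLimit(long maxFileSizeBytes, double compressionRatio) {
        return maxFileSizeBytes + Math.round(maxFileSizeBytes * compressionRatio);
    }

    public static void main(String[] args) {
        long maxBytes = 256L * 1024 * 1024; // illustrative value, not a recommendation
        double ratio = 0.35;                // illustrative value, not a recommendation

        build(maxBytes, ratio).forEach((k, v) -> System.out.println(k + "=" + v));
        System.out.println("assumed write limit (bytes): " + assumedWriteLimit(maxBytes, ratio));
    }
}
```

If the job writes through Spark, each entry could then be passed with DataFrameWriter.option(key, value) on the Hudi write. That wiring is also an assumption about your setup.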
[1] https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java#L33
[2] https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java#L45
[3] https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/io/storage/HoodieParquetWriter.java#L69
[4] https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/io/storage/HoodieParquetWriter.java#L96

Best,
Leesf

Qian Wang <[email protected]> wrote on Tue, Oct 22, 2019 at 8:08 AM:

> Hi,
>
> When I insert into a Hudi dataset, I find that the data files are pretty
> small. How can I control the size of the output files?
>
> -rw-r--r--+ 3 b_shop hdmi-mptna 3231274 2019-10-21 15:56
> /user/tmp/hudi/upsert/default/fd2b6d65-79c9-4b24-a343-caa58b88e006-0_30-176-71232_20191021155623.parquet
> -rw-r--r--+ 3 b_shop hdmi-mptna 3254415 2019-10-21 10:40
> /user/tmp/hudi/upsert/default/fd2b6d65-79c9-4b24-a343-caa58b88e006-0_35-79-1019_20191021103748.parquet
> -rw-r--r--+ 3 b_shop hdmi-mptna 3139027 2019-10-21 15:44
> /user/tmp/hudi/upsert/default/fe4a8424-faae-451e-8e98-d9f2b2fb1561-0_35-106-42782_20191021154432.parquet
> -rw-r--r--+ 3 b_shop hdmi-mptna 3153334 2019-10-21 10:34
> /user/tmp/hudi/upsert/default/fe4a8424-faae-451e-8e98-d9f2b2fb1561-0_41-51-667_20191021103218.parquet
> -rw-r--r--+ 3 b_shop hdmi-mptna 3080996 2019-10-21 10:37
> /user/tmp/hudi/upsert/default/ff12369a-ade0-420f-99ff-567e1f0a9980-0_1-65-804_20191021103508.parquet
> -rw-r--r--+ 3 b_shop hdmi-mptna 3067112 2019-10-21 15:49
> /user/tmp/hudi/upsert/default/ff12369a-ade0-420f-99ff-567e1f0a9980-0_38-141-57005_20191021154949.parquet
>
> Best,
> Qian
