Hi Qian,

You could try setting hoodie.parquet.max.file.size[1] and
hoodie.parquet.compression.ratio[2] to larger values to control the size of
the output files. The relevant code is in HoodieParquetWriter[3][4].

[1]
https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java#L33
[2]
https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/config/HoodieStorageConfig.java#L45
[3]
https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/io/storage/HoodieParquetWriter.java#L69
[4]
https://github.com/apache/incubator-hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/io/storage/HoodieParquetWriter.java#L96
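
For illustration, here is a minimal sketch of how the two options could be
assembled before being passed to a writer. The helper name and the values
256 and 0.35 are hypothetical, and I am assuming that
hoodie.parquet.max.file.size is given in bytes, so please check that against
[1] before relying on it.

```python
# Hypothetical helper, not part of Hudi: builds the two storage options
# mentioned above as a plain dict of string values.
MB = 1024 * 1024


def hudi_file_size_options(max_file_size_mb, compression_ratio):
    """Return Hudi storage options as strings for a writer's option map."""
    if max_file_size_mb <= 0:
        raise ValueError("max_file_size_mb must be positive")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return {
        # Assumption: the max file size is expressed in bytes.
        "hoodie.parquet.max.file.size": str(max_file_size_mb * MB),
        "hoodie.parquet.compression.ratio": str(compression_ratio),
    }


# Illustrative values only; tune them for your own data.
options = hudi_file_size_options(256, 0.35)

# With PySpark, a dict like this could be applied to a DataFrame writer
# via df.write.format(...).options(**options) before saving.
```

The same two keys can also be set one at a time with .option(key, value) on
the writer if that fits your job better.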

Best,
Leesf

Qian Wang <[email protected]> wrote on Tue, Oct 22, 2019 at 8:08 AM:

> Hi,
>
> When I insert into a Hudi dataset, I find that the output files are pretty
> small. How can I control the output file size?
>
> -rw-r--r--+ 3 b_shop hdmi-mptna 3231274 2019-10-21 15:56
> /user/tmp/hudi/upsert/default/fd2b6d65-79c9-4b24-a343-caa58b88e006-0_30-176-71232_20191021155623.parquet
> -rw-r--r--+ 3 b_shop hdmi-mptna 3254415 2019-10-21 10:40
> /user/tmp/hudi/upsert/default/fd2b6d65-79c9-4b24-a343-caa58b88e006-0_35-79-1019_20191021103748.parquet
> -rw-r--r--+ 3 b_shop hdmi-mptna 3139027 2019-10-21 15:44
> /user/tmp/hudi/upsert/default/fe4a8424-faae-451e-8e98-d9f2b2fb1561-0_35-106-42782_20191021154432.parquet
> -rw-r--r--+ 3 b_shop hdmi-mptna 3153334 2019-10-21 10:34
> /user/tmp/hudi/upsert/default/fe4a8424-faae-451e-8e98-d9f2b2fb1561-0_41-51-667_20191021103218.parquet
> -rw-r--r--+ 3 b_shop hdmi-mptna 3080996 2019-10-21 10:37
> /user/tmp/hudi/upsert/default/ff12369a-ade0-420f-99ff-567e1f0a9980-0_1-65-804_20191021103508.parquet
> -rw-r--r--+ 3 b_shop hdmi-mptna 3067112 2019-10-21 15:49
> /user/tmp/hudi/upsert/default/ff12369a-ade0-420f-99ff-567e1f0a9980-0_38-141-57005_20191021154949.parquet
>
> Best,
> Qian
>
