nsivabalan opened a new issue, #4802:
URL: https://github.com/apache/iceberg/issues/4802

   May I know how I can set the file sizing property while creating Iceberg tables? 
   
   Options tried so far:
   
   // let's say parquet_input_tbl refers to an input table stored as Parquet. 
   
   1. spark.sql("CREATE TABLE s3.iceberg_1 USING iceberg partitioned by 
(VendorID) AS SELECT * FROM parquet_input_tbl order by VendorID").show()
   
   This created files of roughly 1 GB each. 
   
   2. spark.sql("CREATE TABLE s3.iceberg_2 USING iceberg partitioned by 
(VendorID) TBLPROPERTIES ('write.parquet.target-file-size-bytes '='52428800') 
AS SELECT * FROM parquet_input_tbl order by VendorID").show()
   
   The maximum file size was 132945282 bytes (~127 MB).
   
   From the [configs](https://iceberg.apache.org/docs/latest/configuration/), I 
deduced that the Parquet row-group size matched the value above, so I tried the next option. 
   
   3. spark.sql("CREATE TABLE s3.iceberg_3 USING iceberg partitioned by 
(VendorID) TBLPROPERTIES ('write.parquet.row-group-size-bytes'='52428800') AS 
SELECT * FROM parquet_input_tbl order by VendorID").show()
   
   This again resulted in ~1 GB files. 
   
   So, may I know how to control file sizes using configs while creating Iceberg 
tables? 
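   For reference, the table property that controls target output file size in 
Iceberg is `write.target-file-size-bytes` (default 536870912, i.e. 512 MB); the 
`write.parquet.*` properties above control Parquet-internal settings such as 
row-group size, not the overall file size. A minimal sketch of a CTAS using that 
property (table and target size here are illustrative, not from the original 
attempts):

```python
# Target file size of 50 MB via the Iceberg table property
# `write.target-file-size-bytes` (note: no `parquet` segment in the name).
target_bytes = 50 * 1024 * 1024  # 52428800

# Build the CTAS statement; `s3.iceberg_4` is a hypothetical table name.
ctas = (
    "CREATE TABLE s3.iceberg_4 USING iceberg PARTITIONED BY (VendorID) "
    f"TBLPROPERTIES ('write.target-file-size-bytes'='{target_bytes}') "
    "AS SELECT * FROM parquet_input_tbl ORDER BY VendorID"
)

# With a live SparkSession this would be submitted as:
# spark.sql(ctas)
print(ctas)
```

   Note that this property is a target, not a hard cap: actual file sizes also 
depend on how Spark distributes rows across tasks, since each task rolls over to 
a new file only once it passes the target size.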
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
