nsivabalan opened a new issue, #4802:
URL: https://github.com/apache/iceberg/issues/4802

   May I know how I can set the file sizing property while creating Iceberg tables? 
   
   Options tried so far:
   
   // let's say parquet_input_tbl refers to an input table stored as Parquet. 
   
   1. spark.sql("CREATE TABLE s3.iceberg_1 USING iceberg partitioned by 
(VendorID) AS SELECT * FROM parquet_input_tbl order by VendorID").show()
   
   This created files of roughly 1 GB each. 
   
   2. spark.sql("CREATE TABLE s3.iceberg_2 USING iceberg partitioned by 
(VendorID) TBLPROPERTIES ('write.parquet.target-file-size-bytes '='52428800') 
AS SELECT * FROM parquet_input_tbl order by VendorID").show()
   
   The maximum file size was 132945282 bytes (~127 MB).
   
   From the [configs](https://iceberg.apache.org/docs/latest/configuration/), I 
deduced that the Parquet row-group size matched the value above, so I tried the next option. 
   
   3. spark.sql("CREATE TABLE s3.iceberg_3 USING iceberg partitioned by 
(VendorID) TBLPROPERTIES ('write.parquet.row-group-size-bytes'='52428800') AS 
SELECT * FROM parquet_input_tbl order by VendorID").show()
   
   This again resulted in ~1 GB files. 
   
   So, may I know how to control file sizes using configs while creating Iceberg 
tables? 
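   For reference, the table property that controls target output file size in 
Iceberg is `write.target-file-size-bytes` (default 536870912, i.e. 512 MB); the 
`write.parquet.*` properties above control Parquet-internal settings such as 
row-group size, not the overall file size. A minimal sketch of a CTAS using that 
property (table and target size here are illustrative, not from the original 
attempts):

```python
# Target file size of 50 MB via the Iceberg table property
# `write.target-file-size-bytes` (note: no `parquet` segment in the name).
target_bytes = 50 * 1024 * 1024  # 52428800

# Build the CTAS statement; `s3.iceberg_4` is a hypothetical table name.
ctas = (
    "CREATE TABLE s3.iceberg_4 USING iceberg PARTITIONED BY (VendorID) "
    f"TBLPROPERTIES ('write.target-file-size-bytes'='{target_bytes}') "
    "AS SELECT * FROM parquet_input_tbl ORDER BY VendorID"
)

# With a live SparkSession this would be submitted as:
# spark.sql(ctas)
print(ctas)
```

   Note that this property is a target, not a hard cap: actual file sizes also 
depend on how Spark distributes rows across tasks, since each task rolls over to 
a new file only once it passes the target size.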
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
