[GitHub] [iceberg] Saranviveka commented on issue #6853: Iceberg partition date/day equality problem when created on timestamp

via GitHub Thu, 16 Feb 2023 13:35:16 -0800


Saranviveka commented on issue #6853:
URL: https://github.com/apache/iceberg/issues/6853#issuecomment-1433743603


   scala> spark.sql("""CREATE TABLE iceberg.glue_catalog.iceberg_table1 ( order_id bigint,
        | customer_id bigint, order_amount DECIMAL(10,2), category string, trans_dt date)
        | USING iceberg
        | location 's3://xxx/glue_catalog/iceberg_table1'
        | PARTITIONED BY (bucket(5, order_id), trans_dt, years(trans_dt))
        | TBLPROPERTIES ('format-version' = '2') """)
        
   scala> spark.sql("""ALTER TABLE iceberg.glue_catalog.iceberg_table1 ADD PARTITION FIELD months(trans_dt)  """)
   res29: org.apache.spark.sql.DataFrame = []
   
   scala> spark.sql("""ALTER TABLE iceberg.glue_catalog.iceberg_table1 ADD PARTITION FIELD category """)
   res30: org.apache.spark.sql.DataFrame = []
   
   scala> spark.sql("""
        | INSERT INTO iceberg.glue_catalog.iceberg_table1
        | VALUES
        | ( 10001, 001, 06.17, 'soap', cast('2019-06-13' as date) )
        | """)
   res31: org.apache.spark.sql.DataFrame = []
   
   
   
   In the example above, I defined trans_dt as a date field and partitioned
the table on trans_dt, years(trans_dt), and months(trans_dt).
   
   This is what the file path looks like:
   
glue_catalog/iceberg_table1/data/order_id_bucket=3/trans_dt=2019-06-13/trans_dt_year=2019/trans_dt_month=2019-06/category=soap/00000-6-b4c62154-526d-4782-b9d3-9bd093b550ff-00001.parquet
   
   
   scala> spark.sql("""select * from iceberg.glue_catalog.iceberg_table1 where trans_dt= '2019'  """).show(false)
   +--------+-----------+------------+--------+--------+
   |order_id|customer_id|order_amount|category|trans_dt|
   +--------+-----------+------------+--------+--------+
   +--------+-----------+------------+--------+--------+
   
   
   scala> spark.sql("""select * from iceberg.glue_catalog.iceberg_table1 where trans_dt >= '2019' and trans_dt < '2020'  """).show(false)
   +--------+-----------+------------+--------+----------+
   |order_id|customer_id|order_amount|category|trans_dt  |
   +--------+-----------+------------+--------+----------+
   |10001   |1          |6.17        |soap    |2019-06-13|
   +--------+-----------+------------+--------+----------+
   
   My question is: if we can't use the year or month values directly when
querying, then what's the point of creating partitions on those transforms?
I would honestly like to know when and how they are actually used.
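   
   For reference, here is a rough sketch (plain Scala, no Spark) of how I
understand a range filter on trans_dt could be turned into a filter on the
years(trans_dt) partition. It is not Iceberg's actual code. The names
PartitionPruningSketch, yearOrdinal, monthOrdinal, DataFile and prune are all
made up for illustration. The one assumption taken from the Iceberg spec is
that the year and month transforms store whole years and months since
1970-01-01, which the path then renders as 2019 and 2019-06.

```scala
import java.time.LocalDate

object PartitionPruningSketch {
  // Assumption: years(...) stores whole years since 1970 (hypothetical helper).
  def yearOrdinal(d: LocalDate): Int = d.getYear - 1970

  // Assumption: months(...) stores whole months since 1970-01 (hypothetical helper).
  def monthOrdinal(d: LocalDate): Int = (d.getYear - 1970) * 12 + (d.getMonthValue - 1)

  // Hypothetical model of a data file: only its path and year partition value.
  final case class DataFile(path: String, yearPartition: Int)

  // Keep only files whose year partition can contain rows with
  // lo <= trans_dt < hiExclusive; every other file is skipped without being read.
  def prune(files: Seq[DataFile], lo: LocalDate, hiExclusive: LocalDate): Seq[DataFile] = {
    val minYear = yearOrdinal(lo)
    val maxYear = yearOrdinal(hiExclusive.minusDays(1))
    files.filter(f => f.yearPartition >= minYear && f.yearPartition <= maxYear)
  }
}

import PartitionPruningSketch._

val d = LocalDate.parse("2019-06-13")
println(yearOrdinal(d))   // 49
println(monthOrdinal(d))  // 593

val files = Seq(
  DataFile("trans_dt_year=2018/a.parquet", 48),
  DataFile("trans_dt_year=2019/b.parquet", 49),
  DataFile("trans_dt_year=2020/c.parquet", 50)
)

// trans_dt >= '2019-01-01' and trans_dt < '2020-01-01' keeps only the 2019 file.
println(prune(files, LocalDate.parse("2019-01-01"), LocalDate.parse("2020-01-01")).map(_.path))
```

   If this reading is right, the filter is always written against trans_dt
itself and the year or month partition is only used behind the scenes to skip
files. It would also explain the empty result above, assuming Spark coerces
the string '2019' to the date 2019-01-01, so that trans_dt = '2019' matches
only that single day.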
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


