cccs-jc opened a new issue, #7334:
URL: https://github.com/apache/iceberg/issues/7334

   ### Feature Request / Improvement
   
   I'm writing to an Iceberg table using Spark Structured Streaming. I chose to put my checkpoint dir inside my target table's location:
   /iceberg/my_schema/data
   /iceberg/my_schema/metadata
   /iceberg/my_schema/checkpoint
   
   When I run the procedure to remove orphan files, Iceberg considers the files inside the checkpoint dir as orphans and wants to delete them.
   
   ```sql
   CALL my_catalog.system.remove_orphan_files(
       table => 'my_catalog.my_schema.telemetry_table',
       older_than => timestamp '2023-04-06 00:00:00',
       dry_run => true)
   ```
   I then tried to specify the location I want Iceberg to clean, for example the data folder. However, it seems like I have to give it a full path. Is there a way to refer to the table location, something like `{table_location}/data`?
   ```sql
   CALL my_catalog.system.remove_orphan_files(
       table => 'my_catalog.my_schema.telemetry_table',
       location => '{table_location}/data',
       older_than => timestamp '2023-04-06 00:00:00',
       dry_run => true)
   ```
   
   The workaround I used is to run `describe extended` on the table to get its location
   ```python
   spark.sql(f"describe extended {table_name}").where("col_name = 'Location'").collect()
   ```
   and then use that location in the statements above.
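   For completeness, a minimal PySpark sketch of that workaround (assuming an active `spark` session and the catalog/table names from the examples above; the `location` variable and the `data` sub-path are illustrative):
   ```python
   # Assumes an active SparkSession `spark` with the `my_catalog` Iceberg catalog configured.
   table_name = "my_catalog.my_schema.telemetry_table"

   # DESCRIBE EXTENDED reports the table root in the row where col_name = 'Location';
   # the value itself lands in the data_type column.
   location = (
       spark.sql(f"describe extended {table_name}")
       .where("col_name = 'Location'")
       .collect()[0]["data_type"]
   )

   # Scope orphan-file cleanup to the data folder under the table root so the
   # sibling checkpoint directory is never scanned.
   spark.sql(f"""
       CALL my_catalog.system.remove_orphan_files(
           table => '{table_name}',
           location => '{location}/data',
           older_than => timestamp '2023-04-06 00:00:00',
           dry_run => true)
   """)
   ```
   A built-in placeholder like `{table_location}` in the procedure would avoid this extra lookup.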
   
   
   
   
   ### Query engine
   
   None

