sshkvar commented on pull request #2850:
URL: https://github.com/apache/iceberg/pull/2850#issuecomment-886514901
> In testing etc, I very often use a similar pattern (possibly using a
timestamp as the table suffix).
>
> However, I'm not sure if the best place to be doing this is in the Iceberg
code.
>
> What other tools are you using to create these tables that have UUID
suffixes? Usually, when I encounter this need, I'm doing it in one of two
places:
> (1) Directly from shell scripts or small Spark / Trino jobs when testing on S3 (and wanting to ensure a brand new table). The solution for me there is simply to place a timestamp in the table name in the code. Here's a sample from some code I have elsewhere:
>
> ```scala
> import java.util.Date
>
> // Suffix the table name with the current epoch millis so each run
> // gets a brand new table
> val currentTime = new Date().getTime
> val tableName = s"table_$currentTime"
> spark.sql(s"CREATE TABLE IF NOT EXISTS my_catalog.default.${tableName} (name string, age int) USING iceberg")
> ```
>
> (2) From some sort of scheduling tool, such as Airflow or Azkaban. In this case, it's very easy to create a UUID when passing in the "new table name" to the Spark job.
>
> Effectively, for me, I'm not sure if this is something that makes sense to place in Iceberg.
>
> Can you elaborate further on why this isn't something that you can pass as an argument to your jobs, etc.? It feels very use-case-specific, with possible ways for you to deal with it using existing tools, but maybe I'm not fully understanding the scope of your problem. 🙂
@kbendick Thanks for the quick reply!
Let me provide additional details.
Actually, we do not need to change the table name (and we don't); this PR just adds a UUID suffix to the table location. We need this to store tables with the same name in different "folders" on S3.
Our use case:
1. We created a table named `test_table` and inserted some data into it.
2. Then we dropped this table from the metastore only, because we need the ability to restore it later.
3. Then we created a new table with the same name, `test_table`.
4. And we dropped this table again.
With this PR we will be able to restore any of these tables, because their data and metadata are placed in different folders; we only need to restore the table location information in the metastore, which we can easily do via the Iceberg API (see the sketch below).
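A minimal sketch of that restore step, assuming a catalog that implements `registerTable` from the Iceberg `Catalog` API; the catalog configuration and the metadata file path below are hypothetical:

```scala
import java.util.Collections
import org.apache.iceberg.catalog.TableIdentifier
import org.apache.iceberg.hive.HiveCatalog

// Re-register a dropped table by pointing the metastore back at the last
// metadata.json written under the table's own UUID-suffixed location.
// Real deployments would pass metastore URI, warehouse path, etc. here.
val catalog = new HiveCatalog()
catalog.initialize("hive", Collections.emptyMap[String, String]())

val metadataLocation = // hypothetical path to the dropped table's metadata
  "s3://my-bucket/warehouse/default/test_table-<uuid>/metadata/v2.metadata.json"
catalog.registerTable(TableIdentifier.of("default", "test_table"), metadataLocation)
```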
We also have scheduled compaction and orphan-file cleanup processes. If the data and metadata files of both tables were kept in the same folder, the orphan-file cleanup process would delete the data and metadata of the table dropped in step 2.
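For context, that cleanup looks roughly like this (a sketch using Iceberg's Spark actions API; the table name reuses the earlier placeholder and the three-day retention window is just an example):

```scala
import java.util.concurrent.TimeUnit
import org.apache.iceberg.Table
import org.apache.iceberg.spark.Spark3Util
import org.apache.iceberg.spark.actions.SparkActions

// deleteOrphanFiles scans the table location for files that no snapshot
// references. If two tables shared one folder, the dropped table's files
// would be classified as orphans of the live table and removed.
val table: Table = Spark3Util.loadIcebergTable(spark, "my_catalog.default.test_table")
SparkActions.get()
  .deleteOrphanFiles(table)
  .olderThan(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(3)) // example retention
  .execute()
```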
Based on what is described above, an `EXTERNAL` table is not an option for us.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.