anigos commented on code in PR #8194:
URL: https://github.com/apache/iceberg/pull/8194#discussion_r1283935592
##########
core/src/main/java/org/apache/iceberg/TableProperties.java:
##########
@@ -365,4 +365,7 @@ private TableProperties() {}
public static final String UPSERT_ENABLED = "write.upsert.enabled";
public static final boolean UPSERT_ENABLED_DEFAULT = false;
+
+ public static final String UNIQUE_LOCATION = "location.unique";
Review Comment:
I have thought this through, and two cases mainly come to mind. We could
approach it this way:
1. No database should be created under an existing database path. This would
address a major problem: people creating databases even under an existing
database path.
2. No table should be created under an existing table path.
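Both rules reduce to the same check against locations we already know about. A minimal sketch, assuming the existing table (or database) locations are available as plain strings; `NestedLocationCheck` and its methods are hypothetical names, not Iceberg API. Comparing whole path segments matters here, otherwise `my_table2` would look like it sits under `my_table`:

```java
import java.util.Collection;

/**
 * Hypothetical helper (not part of Iceberg): rejects a location that equals,
 * or lies beneath, a location that is already in use.
 */
final class NestedLocationCheck {
  private NestedLocationCheck() {}

  /** Drops trailing slashes so "s3://b/db/t/" and "s3://b/db/t" compare equal. */
  static String normalize(String location) {
    String result = location.trim();
    // Stop before eating the slashes of the scheme separator, e.g. "s3://".
    while (result.endsWith("/") && !result.endsWith("://")) {
      result = result.substring(0, result.length() - 1);
    }
    return result;
  }

  /**
   * True if candidate equals an existing location or is nested under one.
   * Only whole segments count: ".../my_table2" is not under ".../my_table".
   */
  static boolean isAtOrUnderExisting(Collection<String> existing, String candidate) {
    String c = normalize(candidate);
    for (String location : existing) {
      String e = normalize(location);
      if (c.equals(e) || c.startsWith(e + "/")) {
        return true;
      }
    }
    return false;
  }
}
```

The same helper serves rule 1 when fed database locations and rule 2 when fed table locations.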
**Case 1**
We already have the following information: an existing table and its
location. A table that was created in the past definitely has a valid path, so
do we really need to check FileIO, or is a simple string comparison/regex match
enough?
Say a table's location is `s3://somerandompath/my_database/my_table`.
Instead of looking into FileIO, why not leverage our own metadata?
There are various ways to create an Iceberg table: just via
`database.tableName`, with an explicit location, etc. By convention, the
database path is always a fixed path. If someone tries to create a table at the
same location with the same name, we can throw an exception saying that
`s3://somerandompath/my_database/my_table` already exists just by looking at its
database reference, which is one level up; only a path exactly one level under a
database path should be a permissible table path. We don't necessarily need to
get uniqueness from the storage location; we can get it from our metadata.
```
CREATE TABLE prod.db.sample
USING iceberg
PARTITIONED BY (part)
TBLPROPERTIES ('key'='value')
AS SELECT ...
```
OR
```
CREATE TABLE IF NOT EXISTS prod.db.sample (
id integer,
......
)
USING ICEBERG
LOCATION
TBLPROPERTIES (
'type' 'hive',.......
)
```
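For Case 1, a metadata-only version of the check could look roughly like this. It assumes the default location is `<database path>/<table name>`, as in the example path above (real catalogs may lay out defaults differently), that paths carry no trailing slash, and that existing table locations come from catalog metadata rather than FileIO. `MetadataLocationCheck` is a hypothetical helper, and the JDK exceptions stand in for Iceberg's `AlreadyExistsException`:

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical metadata-only check (not Iceberg API): resolve the table
 * location, then validate it against the database path and the locations
 * already recorded in catalog metadata, without touching storage.
 */
final class MetadataLocationCheck {
  private MetadataLocationCheck() {}

  // Explicit LOCATION wins; otherwise assume {databasePath}/{tableName}.
  static String resolveLocation(String databasePath, String tableName, Optional<String> explicit) {
    return explicit.orElse(databasePath + "/" + tableName);
  }

  // True only if location is exactly one path segment below databasePath.
  static boolean isDirectChildOf(String databasePath, String location) {
    String prefix = databasePath + "/";
    if (!location.startsWith(prefix)) {
      return false;
    }
    String rest = location.substring(prefix.length());
    return !rest.isEmpty() && !rest.contains("/");
  }

  // existingTableLocations maps table name -> location, as read from metadata.
  static String validate(String databasePath, Map<String, String> existingTableLocations,
                         String tableName, Optional<String> explicit) {
    String location = resolveLocation(databasePath, tableName, explicit);
    if (!isDirectChildOf(databasePath, location)) {
      throw new IllegalArgumentException("Not directly under database path: " + location);
    }
    if (existingTableLocations.containsValue(location)) {
      throw new IllegalStateException("Table location already in use: " + location);
    }
    return location;
  }
}
```

A table named `sample` with no `LOCATION` clause resolves to `<database path>/sample` and passes, while an explicit `LOCATION` nested under an existing table directory fails the one-level rule before any storage call is made.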
**Case 2**
Rename table: when we rename a table, we don't move any files; it is a
metadata-only operation. The base path stays the same while the table name
changes, so there is no impact in this case. For the unique-location check, we
can still look up the metadata and get all unique table paths under the
database reference.
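Case 2 also suggests why the lookup should be by location rather than by name: after a rename, the old name is free but its directory is still in use. A minimal sketch using a hypothetical in-memory stand-in for one database's metadata (`InMemoryDatabase` is not an Iceberg class); against a real catalog, the same set could presumably be built from `Catalog.listTables(namespace)` plus `Table.location()`:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for one database's metadata: table name -> location. */
final class InMemoryDatabase {
  private final String databasePath;
  private final Map<String, String> locations = new HashMap<>();

  InMemoryDatabase(String databasePath) {
    this.databasePath = databasePath;
  }

  /** Creates a table at the default location, rejecting names or locations in use. */
  String create(String tableName) {
    String location = databasePath + "/" + tableName;
    if (locations.containsKey(tableName)) {
      throw new IllegalStateException("Table already exists: " + tableName);
    }
    if (uniqueLocations().contains(location)) {
      throw new IllegalStateException("Table location already in use: " + location);
    }
    locations.put(tableName, location);
    return location;
  }

  /** Metadata-only rename: the entry gets a new name, the location is unchanged. */
  void rename(String from, String to) {
    if (!locations.containsKey(from)) {
      throw new IllegalArgumentException("No such table: " + from);
    }
    if (locations.containsKey(to)) {
      throw new IllegalStateException("Table already exists: " + to);
    }
    locations.put(to, locations.remove(from));
  }

  /** All distinct table locations recorded for this database. */
  Set<String> uniqueLocations() {
    return new TreeSet<>(locations.values());
  }

  String locationOf(String tableName) {
    return locations.get(tableName);
  }
}
```

In this sketch, renaming `my_table` to `renamed` keeps `s3://somerandompath/my_database/my_table` as its location, so a later `create("my_table")` is rejected even though no table carries that name anymore; a name-only check would have let it through.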
##########
core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java:
##########
@@ -192,6 +192,14 @@ public Table create() {
String baseLocation = location != null ? location : defaultWarehouseLocation(identifier);
tableProperties.putAll(tableOverrideProperties());
+
+ if (Boolean.parseBoolean(tableProperties.get(TableProperties.UNIQUE_LOCATION))) {
+ boolean alreadyExists = ops.io().newInputFile(baseLocation).exists();
+ if (alreadyExists) {
+ throw new AlreadyExistsException("Table location already in use: %s", baseLocation);