szehon-ho commented on code in PR #6849:
URL: https://github.com/apache/iceberg/pull/6849#discussion_r1109216083


##########
docs/spark-procedures.md:
##########
@@ -462,16 +462,29 @@ will then treat these files as if they are part of the 
set of files  owned by Ic
 
 #### Usage
 
-| Argument Name | Required? | Type | Description |
-|---------------|-----------|------|-------------|
-| `table`       | ✔️  | string | Table which will have files added to|
-| `source_table`| ✔️  | string | Table where files should come from, paths are 
also possible in the form of \`file_format\`.\`path\` |
-| `partition_filter`  | ️   | map<string, string> | A map of partitions in the 
source table to import from |
+| Argument Name | Required? | Type | Description                               
                                                                                
                             |
+|---------------|-----------|------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `table`       | ✔️  | string | Table which will have files added to          
                                                                                
                         |
+| `source_table`| ✔️  | string | Table where files should come from, paths are 
also possible in the form of \`file_format\`.\`path\`                           
                         |
+| `partition_filter`  | ️   | map<string, string> | A map of partitions in the 
source table to import from                                                     
                                            |
+| `check_duplicate_files`  | ️   | boolean | When true, will throw a exception 
if files added will result in duplicate (on by default, it's checking against 
files path in entries metadata table). |

Review Comment:
   I think we should avoid implementation details here and also try to make it 
shorter for the table
   
   How about?
   ```
   Whether to prevent files existing in the table from being added.  (defaults 
to true)
   ```



##########
docs/spark-procedures.md:
##########
@@ -462,16 +462,29 @@ will then treat these files as if they are part of the 
set of files  owned by Ic
 
 #### Usage
 
-| Argument Name | Required? | Type | Description |
-|---------------|-----------|------|-------------|
-| `table`       | ✔️  | string | Table which will have files added to|
-| `source_table`| ✔️  | string | Table where files should come from, paths are 
also possible in the form of \`file_format\`.\`path\` |
-| `partition_filter`  | ️   | map<string, string> | A map of partitions in the 
source table to import from |
+| Argument Name | Required? | Type | Description                               
                                                                                
                             |
+|---------------|-----------|------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `table`       | ✔️  | string | Table which will have files added to          
                                                                                
                         |
+| `source_table`| ✔️  | string | Table where files should come from, paths are 
also possible in the form of \`file_format\`.\`path\`                           
                         |
+| `partition_filter`  | ️   | map<string, string> | A map of partitions in the 
source table to import from                                                     
                                            |
+| `check_duplicate_files`  | ️   | boolean | When true, will throw a exception 
if files added will result in duplicate (on by default, it's checking against 
files path in entries metadata table). |
 
 Warning : Schema is not validated, adding files with different schema to the 
Iceberg table will cause issues.
 
 Warning : Files added by this method can be physically deleted by Iceberg 
operations
 
+Warning : SQL delete followed by this add_files procedure immediately might 
not add files back due to deleted files are still tracked in manifest entry 
with status = 2: DELETED
+
+#### Output
+
+| Output Name | Type | Description                                       |
+| ------------|------|---------------------------------------------------|
+| `added_files_count` | long | The number of files added by this command       
  |
+| `changed_partition_count`  | long | The number of partitioned changed by 
this command |
+
+{{< hint info >}}
+changed_partition_count will return 0 when table property 
`compatibility.snapshot-id-inheritance.enabled` is set to true

Review Comment:
   How about 
   changed_partition_count will be 0
   
   (as it is already a return value)
   
   Also, is it worth making it a warning?



##########
docs/spark-procedures.md:
##########
@@ -462,16 +462,29 @@ will then treat these files as if they are part of the 
set of files  owned by Ic
 
 #### Usage
 
-| Argument Name | Required? | Type | Description |
-|---------------|-----------|------|-------------|
-| `table`       | ✔️  | string | Table which will have files added to|
-| `source_table`| ✔️  | string | Table where files should come from, paths are 
also possible in the form of \`file_format\`.\`path\` |
-| `partition_filter`  | ️   | map<string, string> | A map of partitions in the 
source table to import from |
+| Argument Name | Required? | Type | Description                               
                                                                                
                             |
+|---------------|-----------|------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `table`       | ✔️  | string | Table which will have files added to          
                                                                                
                         |
+| `source_table`| ✔️  | string | Table where files should come from, paths are 
also possible in the form of \`file_format\`.\`path\`                           
                         |
+| `partition_filter`  | ️   | map<string, string> | A map of partitions in the 
source table to import from                                                     
                                            |
+| `check_duplicate_files`  | ️   | boolean | When true, will throw a exception 
if files added will result in duplicate (on by default, it's checking against 
files path in entries metadata table). |
 
 Warning : Schema is not validated, adding files with different schema to the 
Iceberg table will cause issues.
 
 Warning : Files added by this method can be physically deleted by Iceberg 
operations
 
+Warning : SQL delete followed by this add_files procedure immediately might 
not add files back due to deleted files are still tracked in manifest entry 
with status = 2: DELETED

Review Comment:
   This part is a bit confusing.   Debating whether it makes sense or not to 
add it, whether its a niche use case that may just confuse the user.
   
   But trying to understand the scope, do you know if it breaks  if the current 
snapshot has that entry as DELETED, or if any of the snapshots has that entry 
as DELETED?
   
   If latter, maybe:
   ```
   Files previously deleted from a table cannot be restored by this procedure 
while they are still tracked by the table metadata. 
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to