RussellSpitzer commented on a change in pull request #2779:
URL: https://github.com/apache/iceberg/pull/2779#discussion_r674941963
##########
File path: spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java
##########
@@ -95,11 +102,16 @@
*/
public class SparkTableUtil {
+ private static final Logger LOG = LoggerFactory.getLogger(SparkTableUtil.class);
+
private static final Joiner.MapJoiner MAP_JOINER = Joiner.on(",").withKeyValueSeparator("=");
private static final PathFilter HIDDEN_PATH_FILTER =
p -> !p.getName().startsWith("_") && !p.getName().startsWith(".");
+ private static final String duplicateFileMessage = "Duplicate data files will be added to this table: %s. " +
Review comment:
I think this should be reworded a bit:
"Cannot complete import because data files to be imported already exist
within the target table. Iceberg is not designed to have multiple references to
the same file within the same table, so this type of import is disabled by
default. If you are sure this is what you would like to do, set
'$doAVariableReferenceHere' to true to force the import."
This is just to make sure folks know that by importing the same files twice
they are not necessarily doing something that will work or be safe in the long
run. For example, duplicate file entries will ... have odd effects on MergeInto :)
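
To illustrate the suggestion, here is a minimal standalone sketch of how the
reworded message could be wired to a force flag. It is not the code in this PR:
the class name, the `findDuplicates` and `validateImport` helpers, and the
option name `force_duplicate_import` (standing in for
'$doAVariableReferenceHere') are all hypothetical, and file paths are plain
strings rather than Iceberg data files. The `%s` for the file list is carried
over from the message in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateImportCheck {
  // Hypothetical option name; the review leaves it as '$doAVariableReferenceHere'.
  static final String FORCE_OPTION = "force_duplicate_import";

  // Wording follows the review comment; the two %s slots take the file list and the option name.
  static final String DUPLICATE_FILE_MESSAGE =
      "Cannot complete import because data files to be imported already exist within the " +
      "target table: %s. Iceberg is not designed to have multiple references to the same " +
      "file within the same table, so this type of import is disabled by default. If you " +
      "are sure this is what you would like to do, set '%s' to true to force the import.";

  // Returns the paths in toImport that the table already references, in import order.
  static List<String> findDuplicates(Set<String> existingPaths, List<String> toImport) {
    List<String> duplicates = new ArrayList<>();
    for (String path : toImport) {
      if (existingPaths.contains(path)) {
        duplicates.add(path);
      }
    }
    return duplicates;
  }

  // Throws when duplicates are found, unless the caller explicitly forces the import.
  static void validateImport(Set<String> existingPaths, List<String> toImport, boolean force) {
    List<String> duplicates = findDuplicates(existingPaths, toImport);
    if (!duplicates.isEmpty() && !force) {
      throw new IllegalStateException(
          String.format(DUPLICATE_FILE_MESSAGE, String.join(", ", duplicates), FORCE_OPTION));
    }
  }

  public static void main(String[] args) {
    Set<String> existing = new HashSet<>(Arrays.asList("s3://bucket/a.parquet"));
    List<String> incoming = Arrays.asList("s3://bucket/a.parquet", "s3://bucket/b.parquet");
    try {
      validateImport(existing, incoming, false);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Failing by default and requiring an explicit opt-in keeps the unsafe path
deliberate, which is the point of the rewording.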
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]