rdblue commented on a change in pull request #2210:
URL: https://github.com/apache/iceberg/pull/2210#discussion_r578801175
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -790,4 +804,53 @@ public Identifier identifier() {
 public static TableIdentifier identifierToTableIdentifier(Identifier identifier) {
   return TableIdentifier.of(Namespace.of(identifier.namespace()), identifier.name());
 }
+
+ /**
+  * Use Spark to list all partitions in the table.
+  *
+  * @param spark a Spark session
+  * @param rootPath the table's root location
+  * @param format format of the files
+  * @return all of the table's partitions
+  */
+ public static List<SparkTableUtil.SparkPartition> getPartitions(SparkSession spark, Path rootPath, String format) {
+ FileStatusCache fileStatusCache = FileStatusCache.getOrCreate(spark);
+ Map<String, String> emptyMap = Collections.emptyMap();
+
+ InMemoryFileIndex fileIndex = new InMemoryFileIndex(
Review comment:
I agree that migrating a location-based table is a valid use case. To
me, that's slightly different because, like Hive, it is a "defined" table
format. We would also convert the detected partition structure to a partition
spec that uses identity transforms, so it is predictable and limited.
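For reference, a minimal sketch of that identity-only conversion using
Iceberg's `PartitionSpec.builderFor` builder; the `identitySpecFor` helper and
its arguments are hypothetical, not code from this PR:

```java
import java.util.List;

import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;

public class IdentitySpecs {
  // Hypothetical helper: build an identity-only partition spec from the
  // partition column names detected during directory discovery.
  // Assumes every detected column already exists in the Iceberg schema.
  static PartitionSpec identitySpecFor(Schema schema, List<String> partitionColumns) {
    PartitionSpec.Builder builder = PartitionSpec.builderFor(schema);
    for (String column : partitionColumns) {
      builder.identity(column); // identity transform: partition value is the column value
    }
    return builder.build();
  }
}
```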
For importing files, I don't want to get into a situation where we have to
support non-identity partition transforms or guess how to match up the
partitioning. I think the safest thing is to import a single partition at a
time, but that would be a pain. Maybe if we can come up with reasonable rules
for this, we can make it work (a rough sketch follows the list):
1. All partitions must be identity partitions
2. No transforming data except to parse numbers
Others?
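If we go that way, rule 1 could be enforced with a precondition like the
sketch below. This is illustrative only: the `checkIdentityOnly` helper is
made up, and it assumes that comparing the transform's string form to
"identity" is a sufficient check.

```java
import org.apache.iceberg.PartitionField;
import org.apache.iceberg.PartitionSpec;

public class ImportPreconditions {
  // Hypothetical check for rule 1: fail fast if the target table's spec
  // contains any non-identity partition field.
  static void checkIdentityOnly(PartitionSpec spec) {
    for (PartitionField field : spec.fields()) {
      // assumption: identity transforms stringify as "identity"
      if (!"identity".equals(field.transform().toString())) {
        throw new IllegalArgumentException(
            "Cannot import files into a table with non-identity partition field: " + field.name());
      }
    }
  }
}
```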
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]