rdblue commented on a change in pull request #2141:
URL: https://github.com/apache/iceberg/pull/2141#discussion_r563957090
##########
File path: spark3/src/main/java/org/apache/iceberg/spark/Spark3Util.java
##########
@@ -279,46 +280,39 @@ public Transform unknown(int fieldId, String sourceName,
int sourceId, String tr
return transforms.toArray(new Transform[0]);
}
- public static Distribution toOrderedDistribution(PartitionSpec spec,
SortOrder sortOrder, boolean inferFromSpec) {
- if (sortOrder.isUnsorted()) {
- if (inferFromSpec) {
- SortOrder specOrder = Partitioning.sortOrderFor(spec);
- return Distributions.ordered(convert(specOrder));
- }
-
- return Distributions.unspecified();
- }
-
- Schema schema = spec.schema();
- Multimap<Integer, SortField> sortFieldIndex =
Multimaps.index(sortOrder.fields(), SortField::sourceId);
-
- // build a sort prefix of partition fields that are not already in the
sort order
- SortOrder.Builder builder = SortOrder.builderFor(schema);
- for (PartitionField field : spec.fields()) {
- Collection<SortField> sortFields = sortFieldIndex.get(field.sourceId());
- boolean isSorted = sortFields.stream().anyMatch(sortField ->
- field.transform().equals(sortField.transform()) ||
sortField.transform().satisfiesOrderOf(field.transform()));
- if (!isSorted) {
- String sourceName = schema.findColumnName(field.sourceId());
-
builder.asc(org.apache.iceberg.expressions.Expressions.transform(sourceName,
field.transform()));
- }
+ public static Distribution
buildRequiredDistribution(org.apache.iceberg.Table table) {
+ DistributionMode distributionMode = getDistributionMode(table);
+ switch (distributionMode) {
+ case NONE:
+ return Distributions.unspecified();
+ case HASH:
+ return Distributions.clustered(toTransforms(table.spec()));
Review comment:
Is it correct to return a clustered distribution with no expressions if
the spec is unpartitioned? I think I would rather return
`Distributions.unspecified` just to be safe when passing this back to Spark.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]