bvaradar commented on code in PR #8452:
URL: https://github.com/apache/hudi/pull/8452#discussion_r1188099833
##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -58,13 +62,21 @@ public class FileSystemBackedTableMetadata implements HoodieTableMetadata {
private final SerializableConfiguration hadoopConf;
private final String datasetBasePath;
private final boolean assumeDatePartitioning;
+ private final boolean hiveStylePartitioningEnabled;
+ private final boolean urlEncodePartitioningEnabled;
  public FileSystemBackedTableMetadata(HoodieEngineContext engineContext, SerializableConfiguration conf,
                                       String datasetBasePath, boolean assumeDatePartitioning) {
this.engineContext = engineContext;
this.hadoopConf = conf;
this.datasetBasePath = datasetBasePath;
this.assumeDatePartitioning = assumeDatePartitioning;
+ HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
Review Comment:
The superclass already instantiates the metaclient. Please move the members hiveStylePartitioningEnabled and urlEncodePartitioningEnabled there so that they can be reused by HoodieBackedTableMetadata.
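A minimal sketch of the refactor the comment asks for, with class shapes simplified and config keys invented for illustration (they are not Hudi's actual keys): parse the two partition-style flags once in a shared base-class constructor so every metadata implementation reuses the same fields.

```java
import java.util.Map;

// Hypothetical base class: both metadata implementations extend it and
// inherit the flags instead of each parsing the table config themselves.
abstract class BaseTableMetadata {
    protected final boolean hiveStylePartitioningEnabled;
    protected final boolean urlEncodePartitioningEnabled;

    protected BaseTableMetadata(Map<String, String> tableConfig) {
        // Parse once here instead of in each subclass constructor.
        this.hiveStylePartitioningEnabled =
            Boolean.parseBoolean(tableConfig.getOrDefault("hive.style.partitioning", "false"));
        this.urlEncodePartitioningEnabled =
            Boolean.parseBoolean(tableConfig.getOrDefault("url.encode.partitioning", "false"));
    }
}

class FileSystemBackedTableMetadata extends BaseTableMetadata {
    FileSystemBackedTableMetadata(Map<String, String> tableConfig) {
        super(tableConfig);
    }

    boolean isHiveStyle() {
        return hiveStylePartitioningEnabled; // inherited, not re-parsed
    }
}
```

The same inherited fields would then be available to HoodieBackedTableMetadata without duplicating the parsing logic.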
##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/FilterGenVisitor.java:
##########
@@ -42,9 +43,10 @@ private String quoteStringLiteral(String value) {
}
}
- private String visitAnd(Expression left, Expression right) {
- String leftResult = left.accept(this);
- String rightResult = right.accept(this);
+ @Override
+ public String visitAnd(Predicates.And and) {
Review Comment:
Is case sensitivity the same between hive-sync and the Spark integration?
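For context on the question above: Hive stores column names lower-cased, while Spark can preserve case, so a filter generator matching field names across the two systems may want a case-insensitive lookup. A minimal, purely illustrative sketch (not Hudi code):

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a field-name-to-type lookup that matches names
// regardless of case, so "EventDate" (Spark) and "eventdate" (Hive) resolve
// to the same entry.
class CaseInsensitiveFieldLookup {
    private final Map<String, String> typesByField =
        new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    void addField(String name, String type) {
        typesByField.put(name, type);
    }

    String typeOf(String name) {
        return typesByField.get(name); // case-insensitive match
    }
}
```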
##########
hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/adapter/Spark2Adapter.scala:
##########
@@ -186,4 +186,13 @@ class Spark2Adapter extends SparkAdapter {
case OFF_HEAP => "OFF_HEAP"
    case _ => throw new IllegalArgumentException(s"Invalid StorageLevel: $level")
}
+
+  override def translateFilter(predicate: Expression,
+                               supportNestedPredicatePushdown: Boolean = false): Option[Filter] = {
+    if (supportNestedPredicatePushdown) {
Review Comment:
Is this expected to fail any Spark 2 queries?
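The concern above is whether requesting nested predicate pushdown on Spark 2 hard-fails a query. One graceful-degradation alternative, sketched here with invented names (this is not the Spark or Hudi API): decline to translate the predicate instead of throwing, so the filter is simply not pushed down and the query still runs, just scanning more data.

```java
import java.util.Optional;

// Hypothetical translator: returning an empty Optional means "predicate not
// pushed down", which degrades to a fuller scan rather than a query failure.
class FilterTranslator {
    Optional<String> translateFilter(String predicate, boolean supportNestedPredicatePushdown) {
        if (supportNestedPredicatePushdown) {
            // Nested pushdown is unsupported on this engine version:
            // decline to translate rather than throw.
            return Optional.empty();
        }
        return Optional.of(predicate);
    }
}
```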
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -153,6 +156,20 @@ protected Option<HoodieRecord<HoodieMetadataPayload>> getRecordByKey(String key,
     return recordsByKeys.size() == 0 ? Option.empty() : recordsByKeys.get(0).getValue();
   }
+  @Override
+  public List<String> getPartitionPathByExpression(List<String> relativePathPrefixes,
+                                                   Types.RecordType partitionFields,
+                                                   Expression expression) throws IOException {
+    Expression boundedExpr = expression.accept(new BindVisitor(partitionFields, false));
+    boolean hiveStylePartitioningEnabled =
+        Boolean.parseBoolean(dataMetaClient.getTableConfig().getHiveStylePartitioningEnable());
Review Comment:
Once we move hiveStylePartitioningEnabled and urlEncodePartitioningEnabled to the base class, reuse them instead of re-parsing the config on each call.
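The suggestion in a nutshell, with placeholder names (not Hudi's actual API): hoist the `Boolean.parseBoolean(...)` of the table-config flag into the constructor so the per-call path only reads a field.

```java
// Hypothetical sketch: the flag is parsed once at construction time; every
// subsequent call reuses the cached boolean instead of re-parsing the config.
class PartitionPathResolver {
    private final boolean hiveStylePartitioningEnabled; // parsed once

    PartitionPathResolver(String hiveStyleFlagValue) {
        this.hiveStylePartitioningEnabled = Boolean.parseBoolean(hiveStyleFlagValue);
    }

    String formatPartitionPath(String column, String value) {
        // Hive-style partitioning renders "col=value"; otherwise just the value.
        return hiveStylePartitioningEnabled ? column + "=" + value : value;
    }
}
```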
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala:
##########
@@ -307,8 +318,20 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
       Seq(new PartitionPath(relativePartitionPathPrefix, staticPartitionColumnNameValuePairs.map(_._2._2.asInstanceOf[AnyRef]).toArray))
     } else {
       // Otherwise, compile extracted partition values (from query predicates) into a sub-path which is a prefix
-      // of the complete partition path, do listing for this prefix-path only
-      listPartitionPaths(Seq(relativePartitionPathPrefix).toList.asJava).asScala
+      // of the complete partition path, do listing for this prefix-path and filter them with partitionPredicates
+      Try {
+        SparkFilterHelper.convertDataType(partitionSchema).asInstanceOf[RecordType]
+      } match {
+        case Success(partitionRecordType) if partitionRecordType.fields().size() == _partitionSchemaFromProperties.size =>
+          val convertedFilters = SparkFilterHelper.convertFilters(
+            partitionColumnPredicates.flatMap {
+              expr => sparkAdapter.translateFilter(expr)
+            })
+          listPartitionPaths(Seq(relativePartitionPathPrefix).toList.asJava, partitionRecordType, convertedFilters).asScala
Review Comment:
If we encounter an exception, such as the default case in Conversions.fromPartitionString, we should revert to listing by prefix without filtering.
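The fallback the reviewer suggests can be sketched as follows, with the two listing strategies injected as functions since the real Hudi listing calls are out of scope here (all names are illustrative):

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: try the filtered listing first; if value conversion
// throws (as in the Conversions.fromPartitionString default case), degrade
// gracefully to plain prefix-based listing with no predicate filtering.
class PartitionLister {
    private final Function<List<String>, List<String>> filteredListing;
    private final Function<List<String>, List<String>> prefixListing;

    PartitionLister(Function<List<String>, List<String>> filteredListing,
                    Function<List<String>, List<String>> prefixListing) {
        this.filteredListing = filteredListing;
        this.prefixListing = prefixListing;
    }

    List<String> listPartitions(List<String> prefixes) {
        try {
            return filteredListing.apply(prefixes);
        } catch (RuntimeException e) {
            // Conversion failed: revert to listing by prefix only.
            return prefixListing.apply(prefixes);
        }
    }
}
```

The key property is that a conversion failure widens the listing instead of failing the query.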
##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -84,6 +96,19 @@ public List<String> getAllPartitionPaths() throws IOException {
     return getPartitionPathWithPathPrefixes(Collections.singletonList(""));
   }
+  @Override
+  public List<String> getPartitionPathByExpression(List<String> relativePathPrefixes,
Review Comment:
Let's have a standalone test for this method in this class and in HoodieBackedTableMetadata, as this is the crux of this PR.
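The shape of the standalone test being requested might look like the sketch below, heavily simplified: the metadata class is stubbed, and the expression is reduced to a plain predicate rather than Hudi's Expression type. A real test would exercise the actual classes through the Hudi test harness.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Stub standing in for the metadata implementation under test: list
// partition paths under the given prefixes and keep only those matching the
// pushed-down expression.
class StubTableMetadata {
    private final List<String> allPartitions;

    StubTableMetadata(List<String> allPartitions) {
        this.allPartitions = allPartitions;
    }

    List<String> getPartitionPathByExpression(List<String> relativePathPrefixes,
                                              Predicate<String> expression) {
        return allPartitions.stream()
            .filter(p -> relativePathPrefixes.stream().anyMatch(p::startsWith))
            .filter(expression)
            .collect(Collectors.toList());
    }
}
```

A test would seed the stub with known partition paths, pass a prefix plus a predicate, and assert on the exact surviving paths.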
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]