bvaradar commented on code in PR #8452:
URL: https://github.com/apache/hudi/pull/8452#discussion_r1188099833
##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -58,13 +62,21 @@ public class FileSystemBackedTableMetadata implements HoodieTableMetadata {
private final SerializableConfiguration hadoopConf;
private final String datasetBasePath;
private final boolean assumeDatePartitioning;
+ private final boolean hiveStylePartitioningEnabled;
+ private final boolean urlEncodePartitioningEnabled;
  public FileSystemBackedTableMetadata(HoodieEngineContext engineContext, SerializableConfiguration conf,
                                       String datasetBasePath, boolean assumeDatePartitioning) {
this.engineContext = engineContext;
this.hadoopConf = conf;
this.datasetBasePath = datasetBasePath;
this.assumeDatePartitioning = assumeDatePartitioning;
+ HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder()
Review Comment:
The superclass already instantiates the metaclient. Please move the members hiveStylePartitioningEnabled and urlEncodePartitioningEnabled there so that they can be reused by HoodieBackedTableMetadata.
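A minimal sketch of the refactor the comment asks for, with class shapes simplified and config keys invented for illustration (they are not Hudi's actual keys): parse the two partition-style flags once in a shared base-class constructor so every metadata implementation reuses the same fields.

```java
import java.util.Map;

// Hypothetical base class: both metadata implementations extend it and
// inherit the flags instead of each parsing the table config themselves.
abstract class BaseTableMetadata {
    protected final boolean hiveStylePartitioningEnabled;
    protected final boolean urlEncodePartitioningEnabled;

    protected BaseTableMetadata(Map<String, String> tableConfig) {
        // Parse once here instead of in each subclass constructor.
        this.hiveStylePartitioningEnabled =
            Boolean.parseBoolean(tableConfig.getOrDefault("hive.style.partitioning", "false"));
        this.urlEncodePartitioningEnabled =
            Boolean.parseBoolean(tableConfig.getOrDefault("url.encode.partitioning", "false"));
    }
}

class FileSystemBackedTableMetadata extends BaseTableMetadata {
    FileSystemBackedTableMetadata(Map<String, String> tableConfig) {
        super(tableConfig);
    }

    boolean isHiveStyle() {
        return hiveStylePartitioningEnabled; // inherited, not re-parsed
    }
}
```

The same inherited fields would then be available to HoodieBackedTableMetadata without duplicating the parsing logic.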
##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/FilterGenVisitor.java:
##########
@@ -42,9 +43,10 @@ private String quoteStringLiteral(String value) {
}
}
- private String visitAnd(Expression left, Expression right) {
- String leftResult = left.accept(this);
- String rightResult = right.accept(this);
+ @Override
+ public String visitAnd(Predicates.And and) {
Review Comment:
Is case sensitivity the same between hive-sync and the Spark integration?
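For context on the question above: Hive stores column names lower-cased, while Spark can preserve case, so a filter generator matching field names across the two systems may want a case-insensitive lookup. A minimal, purely illustrative sketch (not Hudi code):

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a field-name-to-type lookup that matches names
// regardless of case, so "EventDate" (Spark) and "eventdate" (Hive) resolve
// to the same entry.
class CaseInsensitiveFieldLookup {
    private final Map<String, String> typesByField =
        new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    void addField(String name, String type) {
        typesByField.put(name, type);
    }

    String typeOf(String name) {
        return typesByField.get(name); // case-insensitive match
    }
}
```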
##########
hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/adapter/Spark2Adapter.scala:
##########
@@ -186,4 +186,13 @@ class Spark2Adapter extends SparkAdapter {
case OFF_HEAP => "OFF_HEAP"
    case _ => throw new IllegalArgumentException(s"Invalid StorageLevel: $level")
}
+
+  override def translateFilter(predicate: Expression,
+                               supportNestedPredicatePushdown: Boolean = false): Option[Filter] = {
+    if (supportNestedPredicatePushdown) {
Review Comment:
Is this expected to fail any Spark 2 queries?
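The concern above is whether requesting nested predicate pushdown on Spark 2 hard-fails a query. One graceful-degradation alternative, sketched here with invented names (this is not the Spark or Hudi API): decline to translate the predicate instead of throwing, so the filter is simply not pushed down and the query still runs, just scanning more data.

```java
import java.util.Optional;

// Hypothetical translator: returning an empty Optional means "predicate not
// pushed down", which degrades to a fuller scan rather than a query failure.
class FilterTranslator {
    Optional<String> translateFilter(String predicate, boolean supportNestedPredicatePushdown) {
        if (supportNestedPredicatePushdown) {
            // Nested pushdown is unsupported on this engine version:
            // decline to translate rather than throw.
            return Optional.empty();
        }
        return Optional.of(predicate);
    }
}
```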
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -153,6 +156,20 @@ protected Option<HoodieRecord<HoodieMetadataPayload>> getRecordByKey(String key,
     return recordsByKeys.size() == 0 ? Option.empty() : recordsByKeys.get(0).getValue();
   }
+  @Override
+  public List<String> getPartitionPathByExpression(List<String> relativePathPrefixes,
+                                                   Types.RecordType partitionFields,
+                                                   Expression expression) throws IOException {
+    Expression boundedExpr = expression.accept(new BindVisitor(partitionFields, false));
+    boolean hiveStylePartitioningEnabled =
+        Boolean.parseBoolean(dataMetaClient.getTableConfig().getHiveStylePartitioningEnable());
Review Comment:
Once we move hiveStylePartitioningEnabled and urlEncodePartitioningEnabled to the base class, reuse them instead of re-parsing the config on each call.
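The suggestion in a nutshell, with placeholder names (not Hudi's actual API): hoist the `Boolean.parseBoolean(...)` of the table-config flag into the constructor so the per-call path only reads a field.

```java
// Hypothetical sketch: the flag is parsed once at construction time; every
// subsequent call reuses the cached boolean instead of re-parsing the config.
class PartitionPathResolver {
    private final boolean hiveStylePartitioningEnabled; // parsed once

    PartitionPathResolver(String hiveStyleFlagValue) {
        this.hiveStylePartitioningEnabled = Boolean.parseBoolean(hiveStyleFlagValue);
    }

    String formatPartitionPath(String column, String value) {
        // Hive-style partitioning renders "col=value"; otherwise just the value.
        return hiveStylePartitioningEnabled ? column + "=" + value : value;
    }
}
```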
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkHoodieTableFileIndex.scala:
##########
@@ -307,8 +318,20 @@ class SparkHoodieTableFileIndex(spark: SparkSession,
       Seq(new PartitionPath(relativePartitionPathPrefix, staticPartitionColumnNameValuePairs.map(_._2._2.asInstanceOf[AnyRef]).toArray))
     } else {
       // Otherwise, compile extracted partition values (from query predicates) into a sub-path which is a prefix
-      // of the complete partition path, do listing for this prefix-path only
-      listPartitionPaths(Seq(relativePartitionPathPrefix).toList.asJava).asScala
+      // of the complete partition path, do listing for this prefix-path and filter them with partitionPredicates
+      Try {
+        SparkFilterHelper.convertDataType(partitionSchema).asInstanceOf[RecordType]
+      } match {
+        case Success(partitionRecordType) if partitionRecordType.fields().size() == _partitionSchemaFromProperties.size =>
+          val convertedFilters = SparkFilterHelper.convertFilters(
+            partitionColumnPredicates.flatMap {
+              expr => sparkAdapter.translateFilter(expr)
+            })
+          listPartitionPaths(Seq(relativePartitionPathPrefix).toList.asJava, partitionRecordType, convertedFilters).asScala
Review Comment:
If we encounter an exception, such as the default case in Conversions.fromPartitionString, we should revert to listing by prefix without filtering.
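The fallback the reviewer suggests can be sketched as follows, with the two listing strategies injected as functions since the real Hudi listing calls are out of scope here (all names are illustrative):

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: try the filtered listing first; if value conversion
// throws (as in the Conversions.fromPartitionString default case), degrade
// gracefully to plain prefix-based listing with no predicate filtering.
class PartitionLister {
    private final Function<List<String>, List<String>> filteredListing;
    private final Function<List<String>, List<String>> prefixListing;

    PartitionLister(Function<List<String>, List<String>> filteredListing,
                    Function<List<String>, List<String>> prefixListing) {
        this.filteredListing = filteredListing;
        this.prefixListing = prefixListing;
    }

    List<String> listPartitions(List<String> prefixes) {
        try {
            return filteredListing.apply(prefixes);
        } catch (RuntimeException e) {
            // Conversion failed: revert to listing by prefix only.
            return prefixListing.apply(prefixes);
        }
    }
}
```

The key property is that a conversion failure widens the listing instead of failing the query.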
##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -84,6 +96,19 @@ public List<String> getAllPartitionPaths() throws IOException {
     return getPartitionPathWithPathPrefixes(Collections.singletonList(""));
   }
+  @Override
+  public List<String> getPartitionPathByExpression(List<String> relativePathPrefixes,
Review Comment:
Let's have a standalone test for this method in this class and in HoodieBackedTableMetadata, as this is the crux of this PR.
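The shape of the standalone test being requested might look like the sketch below, heavily simplified: the metadata class is stubbed, and the expression is reduced to a plain predicate rather than Hudi's Expression type. A real test would exercise the actual classes through the Hudi test harness.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Stub standing in for the metadata implementation under test: list
// partition paths under the given prefixes and keep only those matching the
// pushed-down expression.
class StubTableMetadata {
    private final List<String> allPartitions;

    StubTableMetadata(List<String> allPartitions) {
        this.allPartitions = allPartitions;
    }

    List<String> getPartitionPathByExpression(List<String> relativePathPrefixes,
                                              Predicate<String> expression) {
        return allPartitions.stream()
            .filter(p -> relativePathPrefixes.stream().anyMatch(p::startsWith))
            .filter(expression)
            .collect(Collectors.toList());
    }
}
```

A test would seed the stub with known partition paths, pass a prefix plus a predicate, and assert on the exact surviving paths.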
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]