boneanxs commented on code in PR #8452:
URL: https://github.com/apache/hudi/pull/8452#discussion_r1271800816
##########
hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/adapter/Spark2Adapter.scala:
##########
@@ -186,4 +186,13 @@ class Spark2Adapter extends SparkAdapter {
      case OFF_HEAP => "OFF_HEAP"
      case _ => throw new IllegalArgumentException(s"Invalid StorageLevel: $level")
  }
+
+  override def translateFilter(predicate: Expression,
+                               supportNestedPredicatePushdown: Boolean = false): Option[Filter] = {
+    if (supportNestedPredicatePushdown) {
Review Comment:
Oh, sorry, the rebase must have dropped this; added it back.
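
For context, the pattern the re-added override follows can be sketched in plain Java (this is an illustrative stand-in, not the real Spark/Hudi `translateFilter` signature): when nested predicate pushdown is requested on a code path that cannot support it, reject the flag up front rather than translating silently.

```java
import java.util.Optional;

// Hypothetical stand-in for the Spark2Adapter.translateFilter guard:
// the Spark 2 path has no nested-field pushdown, so the flag is rejected
// before any translation is attempted. All names here are illustrative.
public class FilterTranslator {

  public static Optional<String> translateFilter(String predicate,
                                                 boolean supportNestedPredicatePushdown) {
    if (supportNestedPredicatePushdown) {
      // Fail fast instead of silently producing a filter that cannot be honored.
      throw new UnsupportedOperationException(
          "Nested predicate push-down is not supported by this adapter");
    }
    // Simplified translation: only simple equality predicates survive.
    return predicate.contains("=") ? Optional.of(predicate) : Optional.empty();
  }
}
```

Failing fast keeps the caller's fallback logic (re-evaluating the filter on the engine side) explicit rather than hiding an unsupported case.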
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala:
##########
@@ -383,6 +383,66 @@ class TestHoodieFileIndex extends HoodieSparkClientTestBase with ScalaAssertionS
}
}
+  /**
+   * This test mainly ensures all non-partition-prefix filter can be pushed successfully
+   */
+  @ParameterizedTest
+  @CsvSource(value = Array("true, false", "false, false", "true, true", "false, true"))
+  def testPartitionPruneWithMultiplePartitionColumnsWithComplexExpression(useMetadataTable: Boolean,
+                                                                          complexExpressionPushDown: Boolean): Unit = {
Review Comment:
Yes, we already have tests covering different `URL_ENCODE_PARTITIONING` and `HIVE_STYLE_PARTITIONING` settings, such as `org.apache.hudi.functional.TestMORDataSource#testMORPartitionPrune` and `org.apache.hudi.TestHoodieFileIndex#testPartitionPruneWithMultiplePartitionColumns`; they share the same code paths.
##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -50,20 +56,25 @@
 /**
  * Implementation of {@link HoodieTableMetadata} based file-system-backed table metadata.
  */
-public class FileSystemBackedTableMetadata implements HoodieTableMetadata {
+public class FileSystemBackedTableMetadata extends AbstractHoodieTableMetadata {
   private static final int DEFAULT_LISTING_PARALLELISM = 1500;
-  private final transient HoodieEngineContext engineContext;
-  private final SerializableConfiguration hadoopConf;
-  private final String datasetBasePath;
   private final boolean assumeDatePartitioning;
+  private final boolean hiveStylePartitioningEnabled;
+  private final boolean urlEncodePartitioningEnabled;
+
   public FileSystemBackedTableMetadata(HoodieEngineContext engineContext, SerializableConfiguration conf, String datasetBasePath,
                                        boolean assumeDatePartitioning) {
-    this.engineContext = engineContext;
-    this.hadoopConf = conf;
-    this.datasetBasePath = datasetBasePath;
+    super(engineContext, conf, datasetBasePath);
+
+    FileSystem fs = FSUtils.getFs(dataBasePath.get(), conf.get());
+    Path metaPath = new Path(dataBasePath.get(), HoodieTableMetaClient.METAFOLDER_NAME);
+    TableNotFoundException.checkTableValidity(fs, this.dataBasePath.get(), metaPath);
+    HoodieTableConfig tableConfig = new HoodieTableConfig(fs, metaPath.toString(), null, null);
Review Comment:
Since `org.apache.hudi.metadata.HoodieTableMetadata#createFSBackedTableMetadata` doesn't have a metaClient, we have to instantiate the table config here. For other callers that do have a metaClient, I added a new constructor that accepts the tableConfig.
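
The two-constructor arrangement described above can be sketched in plain Java (class and field names below are simplified stand-ins, not the real Hudi types):

```java
// Simplified model of the pattern: one constructor derives the table config
// from storage for callers without a metaClient; the other accepts a config
// the caller already holds, avoiding a redundant filesystem read.
public class FsBackedTableMetadata {

  static final class TableConfig {
    final boolean hiveStylePartitioningEnabled;
    TableConfig(boolean hiveStylePartitioningEnabled) {
      this.hiveStylePartitioningEnabled = hiveStylePartitioningEnabled;
    }
  }

  final boolean hiveStylePartitioningEnabled;

  // For callers without a metaClient: load the config ourselves.
  FsBackedTableMetadata(String basePath) {
    this(loadTableConfig(basePath));
  }

  // For callers that already hold a metaClient: pass its tableConfig through.
  FsBackedTableMetadata(TableConfig tableConfig) {
    this.hiveStylePartitioningEnabled = tableConfig.hiveStylePartitioningEnabled;
  }

  // Placeholder for reading table properties from the meta folder on storage.
  static TableConfig loadTableConfig(String basePath) {
    return new TableConfig(false);
  }
}
```

Delegating the path-based constructor to the config-based one keeps the field initialization in a single place.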
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestLazyPartitionPathFetching.scala:
##########
@@ -55,6 +55,36 @@ class TestLazyPartitionPathFetching extends HoodieSparkSqlTestBase {
     }
   }
+  test("Test querying with date column + partition pruning") {
Review Comment:
Like I've said before, we can't see any difference in the physical plan, since all partition filters are pushed to the Hudi side; it's just that some filters couldn't take effect before. To make sure partition pruning takes effect, I added `testPartitionPruneWithMultiplePartitionColumnsWithComplexExpression` and check `fileIndex.areAllPartitionPathsCached`. Before this PR the complex expression could not be pushed, so `fileIndex.areAllPartitionPathsCached` returned true; after this change, it should return false.
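
The check described above can be modeled with a tiny stand-in (not the real `HoodieFileIndex` API; all names are hypothetical): when a partition filter can be pushed down, only matching partitions are listed and the "all paths cached" flag stays false; an un-pushable filter forces a full listing.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model of the test's assertion: a pushable complex expression prunes
// partitions eagerly, while an un-pushable one falls back to caching every
// partition path and filtering later on the engine side.
public class FileIndexModel {
  private boolean allPartitionPathsCached = false;

  List<String> listPartitions(List<String> partitions, boolean filterPushable) {
    if (filterPushable) {
      // Prune eagerly: no need to materialize the full partition list.
      return partitions.stream()
          .filter(p -> p.startsWith("2021"))
          .collect(Collectors.toList());
    }
    // Fallback: list and cache everything.
    allPartitionPathsCached = true;
    return partitions;
  }

  boolean areAllPartitionPathsCached() {
    return allPartitionPathsCached;
  }
}
```

Asserting on the flag rather than the physical plan is what makes the pruning observable in the test.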
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]