boneanxs commented on code in PR #8452:
URL: https://github.com/apache/hudi/pull/8452#discussion_r1271800816
##########
hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/spark/sql/adapter/Spark2Adapter.scala:
##########
@@ -186,4 +186,13 @@ class Spark2Adapter extends SparkAdapter {
      case OFF_HEAP => "OFF_HEAP"
      case _ => throw new IllegalArgumentException(s"Invalid StorageLevel: $level")
  }
+
+  override def translateFilter(predicate: Expression,
+                               supportNestedPredicatePushdown: Boolean = false): Option[Filter] = {
+    if (supportNestedPredicatePushdown) {
Review Comment:
Oh, sorry, the rebase must have dropped this; added it back.
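
For context, the pattern the re-added override follows can be sketched in plain Java (this is an illustrative stand-in, not the real Spark/Hudi `translateFilter` signature): when nested predicate pushdown is requested on a code path that cannot support it, reject the flag up front rather than translating silently.

```java
import java.util.Optional;

// Hypothetical stand-in for the Spark2Adapter.translateFilter guard:
// the Spark 2 path has no nested-field pushdown, so the flag is rejected
// before any translation is attempted. All names here are illustrative.
public class FilterTranslator {

  public static Optional<String> translateFilter(String predicate,
                                                 boolean supportNestedPredicatePushdown) {
    if (supportNestedPredicatePushdown) {
      // Fail fast instead of silently producing a filter that cannot be honored.
      throw new UnsupportedOperationException(
          "Nested predicate push-down is not supported by this adapter");
    }
    // Simplified translation: only simple equality predicates survive.
    return predicate.contains("=") ? Optional.of(predicate) : Optional.empty();
  }
}
```

Failing fast keeps the caller's fallback logic (re-evaluating the filter on the engine side) explicit rather than hiding an unsupported case.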
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieFileIndex.scala:
##########
@@ -383,6 +383,66 @@ class TestHoodieFileIndex extends HoodieSparkClientTestBase with ScalaAssertionS
}
}
+  /**
+   * This test mainly ensures all non-partition-prefix filter can be pushed successfully
+   */
+  @ParameterizedTest
+  @CsvSource(value = Array("true, false", "false, false", "true, true", "false, true"))
+  def testPartitionPruneWithMultiplePartitionColumnsWithComplexExpression(useMetadataTable: Boolean,
+                                                                          complexExpressionPushDown: Boolean): Unit = {
Review Comment:
Yes, we already have tests covering different `URL_ENCODE_PARTITIONING` and `HIVE_STYLE_PARTITIONING` settings, such as `org.apache.hudi.functional.TestMORDataSource#testMORPartitionPrune` and `org.apache.hudi.TestHoodieFileIndex#testPartitionPruneWithMultiplePartitionColumns`; they share the same code paths.
##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java:
##########
@@ -50,20 +56,25 @@
 /**
  * Implementation of {@link HoodieTableMetadata} based file-system-backed table metadata.
  */
-public class FileSystemBackedTableMetadata implements HoodieTableMetadata {
+public class FileSystemBackedTableMetadata extends AbstractHoodieTableMetadata {
   private static final int DEFAULT_LISTING_PARALLELISM = 1500;
-  private final transient HoodieEngineContext engineContext;
-  private final SerializableConfiguration hadoopConf;
-  private final String datasetBasePath;
   private final boolean assumeDatePartitioning;
+  private final boolean hiveStylePartitioningEnabled;
+  private final boolean urlEncodePartitioningEnabled;
+
   public FileSystemBackedTableMetadata(HoodieEngineContext engineContext, SerializableConfiguration conf, String datasetBasePath,
                                        boolean assumeDatePartitioning) {
-    this.engineContext = engineContext;
-    this.hadoopConf = conf;
-    this.datasetBasePath = datasetBasePath;
+    super(engineContext, conf, datasetBasePath);
+
+    FileSystem fs = FSUtils.getFs(dataBasePath.get(), conf.get());
+    Path metaPath = new Path(dataBasePath.get(), HoodieTableMetaClient.METAFOLDER_NAME);
+    TableNotFoundException.checkTableValidity(fs, this.dataBasePath.get(), metaPath);
+    HoodieTableConfig tableConfig = new HoodieTableConfig(fs, metaPath.toString(), null, null);
Review Comment:
Since `org.apache.hudi.metadata.HoodieTableMetadata#createFSBackedTableMetadata` doesn't have a metaClient, we have to instantiate the table config here. For other callers that do have a metaClient, I added a new constructor that accepts the tableConfig.
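
The two-constructor arrangement described above can be sketched in plain Java (class and field names below are simplified stand-ins, not the real Hudi types):

```java
// Simplified model of the pattern: one constructor derives the table config
// from storage for callers without a metaClient; the other accepts a config
// the caller already holds, avoiding a redundant filesystem read.
public class FsBackedTableMetadata {

  static final class TableConfig {
    final boolean hiveStylePartitioningEnabled;
    TableConfig(boolean hiveStylePartitioningEnabled) {
      this.hiveStylePartitioningEnabled = hiveStylePartitioningEnabled;
    }
  }

  final boolean hiveStylePartitioningEnabled;

  // For callers without a metaClient: load the config ourselves.
  FsBackedTableMetadata(String basePath) {
    this(loadTableConfig(basePath));
  }

  // For callers that already hold a metaClient: pass its tableConfig through.
  FsBackedTableMetadata(TableConfig tableConfig) {
    this.hiveStylePartitioningEnabled = tableConfig.hiveStylePartitioningEnabled;
  }

  // Placeholder for reading table properties from the meta folder on storage.
  static TableConfig loadTableConfig(String basePath) {
    return new TableConfig(false);
  }
}
```

Delegating the path-based constructor to the config-based one keeps the field initialization in a single place.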
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestLazyPartitionPathFetching.scala:
##########
@@ -55,6 +55,36 @@ class TestLazyPartitionPathFetching extends HoodieSparkSqlTestBase {
     }
   }
+  test("Test querying with date column + partition pruning") {
Review Comment:
Like I've said before, we can't see any difference in the physical plan, since all partition filters are pushed to the Hudi side; it's just that some filters couldn't take effect before. To make sure partition pruning takes effect, I added `testPartitionPruneWithMultiplePartitionColumnsWithComplexExpression` and check `fileIndex.areAllPartitionPathsCached`. Before this PR the complex expression could not be pushed, so `fileIndex.areAllPartitionPathsCached` returned true; after this change, it should return false.
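
The check described above can be modeled with a tiny stand-in (not the real `HoodieFileIndex` API; all names are hypothetical): when a partition filter can be pushed down, only matching partitions are listed and the "all paths cached" flag stays false; an un-pushable filter forces a full listing.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model of the test's assertion: a pushable complex expression prunes
// partitions eagerly, while an un-pushable one falls back to caching every
// partition path and filtering later on the engine side.
public class FileIndexModel {
  private boolean allPartitionPathsCached = false;

  List<String> listPartitions(List<String> partitions, boolean filterPushable) {
    if (filterPushable) {
      // Prune eagerly: no need to materialize the full partition list.
      return partitions.stream()
          .filter(p -> p.startsWith("2021"))
          .collect(Collectors.toList());
    }
    // Fallback: list and cache everything.
    allPartitionPathsCached = true;
    return partitions;
  }

  boolean areAllPartitionPathsCached() {
    return allPartitionPathsCached;
  }
}
```

Asserting on the flag rather than the physical plan is what makes the pruning observable in the test.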
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]