xushiyan commented on code in PR #7092:
URL: https://github.com/apache/hudi/pull/7092#discussion_r1017755036
##########
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java:
##########
@@ -51,7 +51,7 @@ public Dataset<Row> getDatasetToValidate(SparkSession session, ExecutionContext
       StructType inputSchema) {
     String partitionPathField = context.getWriterContext().getProps().getString(DataSourceWriteOptions.PARTITIONPATH_FIELD().key());
     String hudiPath = context.getHoodieTestSuiteWriter().getCfg().targetBasePath + (partitionPathField.isEmpty() ? "/" : "/*/*/*");
-    Dataset<Row> hudiDf = session.read().option(HoodieMetadataConfig.ENABLE.key(), String.valueOf(config.isEnableMetadataValidate()))
+    Dataset<Row> hudiDf = session.read().option(HoodieMetadataConfig.ENABLE.key(), String.valueOf(context.getHoodieTestSuiteWriter().getCfg().enableMetadataOnRead))
Review Comment:
@nsivabalan Ok, took a closer look at the code. The node-level config is
`DeltaConfig.Config`, which is a confusing class name: using `Config` alone
in the code makes it hard to tell what it contains. It is actually a bag of
configs for the test suite, so it should be named something like
`TestSuiteConfig`.
`getHoodieTestSuiteWriter().getCfg()` returning
`HoodieTestSuiteJob.HoodieTestSuiteConfig` makes things worse: that object
holds job-level parameters passed to the deltastreamer. We should call it
`HoodieTestSuiteJobParams` to distinguish it from a config object, which
throughout the codebase is typically a `HoodieConfig` built with the
builder pattern.
This is not nitpicking; test classes involve a lot of config-plumbing
logic, so we need to be careful designing good APIs and boundaries. We
should make these code improvements in a separate PR.
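
To illustrate the direction I have in mind, here is a minimal sketch of what a dedicated test suite config with a builder could look like. The class name `TestSuiteConfig`, the property key, and the accessor names are all hypothetical placeholders, not the actual Hudi API; the point is only the shape: a named config type with a builder, matching the `HoodieConfig` convention, instead of a generic nested `Config`.

```java
import java.util.Properties;

// Hypothetical sketch: a dedicated, clearly named config for the test
// suite, built via the builder pattern. Names here are illustrative only.
class TestSuiteConfig {
  private final Properties props;

  private TestSuiteConfig(Properties props) {
    this.props = props;
  }

  // Reads the (hypothetical) metadata-on-read flag, defaulting to false.
  public boolean enableMetadataOnRead() {
    return Boolean.parseBoolean(
        props.getProperty("test.suite.metadata.on.read", "false"));
  }

  public static Builder builder() {
    return new Builder();
  }

  static class Builder {
    private final Properties props = new Properties();

    public Builder withMetadataOnRead(boolean enabled) {
      props.setProperty("test.suite.metadata.on.read", String.valueOf(enabled));
      return this;
    }

    public TestSuiteConfig build() {
      return new TestSuiteConfig(props);
    }
  }
}
```

With something like this, call sites would read `testSuiteConfig.enableMetadataOnRead()` instead of digging through `getHoodieTestSuiteWriter().getCfg()`, and the type name alone would tell the reader which layer of config they are touching.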
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]