rmahindra123 commented on code in PR #8574:
URL: https://github.com/apache/hudi/pull/8574#discussion_r1198256724


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/ChainedTransformer.java:
##########
@@ -93,9 +103,13 @@ public List<String> getTransformersNames() {
   @Override
   public Dataset<Row> apply(JavaSparkContext jsc, SparkSession sparkSession, Dataset<Row> rowDataset, TypedProperties properties) {
     Dataset<Row> dataset = rowDataset;
+    Option<Schema> incomingSchemaOpt = sourceSchemaOpt;
     for (TransformerInfo transformerInfo : transformers) {
       Transformer transformer = transformerInfo.getTransformer();
       dataset = transformer.apply(jsc, sparkSession, dataset, transformerInfo.getProperties(properties));
+      if (enableSchemaValidation) {
+        incomingSchemaOpt = validateAndGetTransformedSchema(transformerInfo, dataset, incomingSchemaOpt, jsc, sparkSession, properties);

Review Comment:
   ChainedTransformer should implement the new interface, and schema validation 
should run before apply is called on the dataset. Keep the validation in the 
new interface method rather than inside apply.
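
   A minimal sketch of the shape this suggests, not the actual Hudi API. The 
interface name `SchemaAwareTransformer`, the classes `AddColumnStep` and 
`ChainedSketch`, and the use of `List<String>` in place of `Schema` and 
`Dataset<Row>` are all assumptions made so the example compiles without Spark 
or Hudi on the classpath. Only the method name 
`validateAndGetTransformedSchema` comes from the diff above, and its signature 
here is simplified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the "new interface"; its real name and signature are not shown in this thread.
interface SchemaAwareTransformer {
  // Checks the incoming schema and returns the schema this step would produce.
  Optional<List<String>> validateAndGetTransformedSchema(Optional<List<String>> incomingSchema);

  // Stand-in for Transformer.apply; rows are plain strings instead of Dataset<Row>.
  List<String> apply(List<String> rows);
}

// Hypothetical step: requires one column and appends another.
final class AddColumnStep implements SchemaAwareTransformer {
  private final String requiredColumn;
  private final String addedColumn;

  AddColumnStep(String requiredColumn, String addedColumn) {
    this.requiredColumn = requiredColumn;
    this.addedColumn = addedColumn;
  }

  @Override
  public Optional<List<String>> validateAndGetTransformedSchema(Optional<List<String>> incomingSchema) {
    if (!incomingSchema.isPresent()) {
      return Optional.empty(); // no schema available, so nothing to validate against
    }
    List<String> columns = incomingSchema.get();
    if (!columns.contains(requiredColumn)) {
      throw new IllegalArgumentException("Missing column: " + requiredColumn);
    }
    List<String> result = new ArrayList<>(columns);
    result.add(addedColumn);
    return Optional.of(result);
  }

  @Override
  public List<String> apply(List<String> rows) {
    List<String> out = new ArrayList<>();
    for (String row : rows) {
      out.add(row + "," + addedColumn);
    }
    return out;
  }
}

// The chain implements the same interface, so validation lives outside apply.
final class ChainedSketch implements SchemaAwareTransformer {
  private final List<SchemaAwareTransformer> steps;

  ChainedSketch(List<SchemaAwareTransformer> steps) {
    this.steps = steps;
  }

  @Override
  public Optional<List<String>> validateAndGetTransformedSchema(Optional<List<String>> incomingSchema) {
    Optional<List<String>> schema = incomingSchema;
    for (SchemaAwareTransformer step : steps) {
      // Each step validates against the schema produced by the previous step.
      schema = step.validateAndGetTransformedSchema(schema);
    }
    return schema;
  }

  @Override
  public List<String> apply(List<String> rows) {
    List<String> out = rows;
    for (SchemaAwareTransformer step : steps) {
      out = step.apply(out); // pure transformation, no validation here
    }
    return out;
  }
}
```

   With this split, the caller would invoke `validateAndGetTransformedSchema` 
on the chain first and call `apply` only if validation passes, so `apply` 
stays free of the `enableSchemaValidation` branch.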



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
