lokeshj1703 commented on code in PR #8514:
URL: https://github.com/apache/hudi/pull/8514#discussion_r1183341573
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/ChainedTransformer.java:
##########
@@ -19,36 +19,137 @@
package org.apache.hudi.utilities.transform;
import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.ReflectionUtils;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
import java.util.List;
+import java.util.Map;
+import java.util.Set;
import java.util.stream.Collectors;
/**
 * A {@link Transformer} to chain other {@link Transformer}s and apply sequentially.
*/
public class ChainedTransformer implements Transformer {
- private List<Transformer> transformers;
+ // Delimiter used to separate class name and the property key suffix. The suffix comes first.
+ private static final String TRANSFORMER_CLASS_NAME_ID_DELIMITER = ":";
Review Comment:
Addressed
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java:
##########
@@ -191,15 +190,11 @@ public static SchemaPostProcessor createSchemaPostProcessor(
}
- public static Option<Transformer> createTransformer(List<String> classNames) throws IOException {
+ public static Option<Transformer> createTransformer(Option<List<String>> classNamesOpt) throws IOException {
try {
- List<Transformer> transformers = new ArrayList<>();
- for (String className : Option.ofNullable(classNames).orElse(Collections.emptyList())) {
- transformers.add(ReflectionUtils.loadClass(className));
- }
- return transformers.isEmpty() ? Option.empty() : Option.of(new ChainedTransformer(transformers));
+ return classNamesOpt.map(classNames -> classNames.isEmpty() ? null : new ChainedTransformer(classNames));
} catch (Throwable e) {
- throw new IOException("Could not load transformer class(es) " + classNames, e);
+ throw new IOException("Could not load transformer class(es) " + classNamesOpt, e);
Review Comment:
Addressed
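
For readers following the hunk above, here is a minimal sketch of the null-mapping pattern it relies on. It uses `java.util.Optional` as a stand-in and assumes Hudi's `Option.map` likewise turns a `null` mapper result into an empty value; `NullMappingSketch` and `Chain` are hypothetical names, not Hudi classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class NullMappingSketch {
    // Hypothetical stand-in for ChainedTransformer; it only records the class names.
    public static final class Chain {
        public final List<String> classNames;

        Chain(List<String> classNames) {
            this.classNames = classNames;
        }
    }

    // Absent input stays absent; an empty list maps to null, which
    // Optional.map converts to Optional.empty(); otherwise a Chain is built.
    public static Optional<Chain> create(Optional<List<String>> classNamesOpt) {
        return classNamesOpt.map(names -> names.isEmpty() ? null : new Chain(names));
    }

    public static void main(String[] args) {
        System.out.println(create(Optional.empty()).isPresent());
        System.out.println(create(Optional.of(Collections.<String>emptyList())).isPresent());
        System.out.println(create(Optional.of(Arrays.asList("a.B"))).isPresent());
    }
}
```

Under that assumption, both a missing list and an empty list end up as "no transformer", which matches the behavior of the removed loop.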
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java:
##########
@@ -276,7 +276,17 @@ public static class Config implements Serializable {
+ ". Allows transforming raw source Dataset to a target Dataset (conforming to target schema) before "
+ "writing. Default : Not set. E:g - org.apache.hudi.utilities.transform.SqlQueryBasedTransformer (which "
+ "allows a SQL query templated to be passed as a transformation function). "
- + "Pass a comma-separated list of subclass names to chain the transformations.")
+ + "Pass a comma-separated list of subclass names to chain the transformations. Transformer can also include "
+ + "an identifier. E:g - tr1:org.apache.hudi.utilities.transform.SqlQueryBasedTransformer. Here the identifier tr1 "
+ + "can be used along with property key like `hoodie.deltastreamer.transformer.sql.tr1` to identify properties related "
+ + "to the transformer. So effective value for `hoodie.deltastreamer.transformer.sql` is determined by key "
+ + "`hoodie.deltastreamer.transformer.sql.tr1` for this transformer. This is useful when there are two or more "
+ + "transformers using the same config keys and expect different values for those keys. If identifier is used, it should "
+ + "be specified for all the transformers. Further the order in which transformer is applied is determined by the occurrence "
+ + "of transformer irrespective of the identifier used for the transformer. For example: In the configured value below "
+ + "tr2:org.apache.hudi.utilities.transform.SqlQueryBasedTransformer,tr1:org.apache.hudi.utilities.transform.SqlQueryBasedTransformer "
+ + ", tr2 is applied before tr1 based on order of occurrence."
+ )
Review Comment:
Addressed
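
The identifier scheme described in the option text above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `TransformerIdSketch`, `Entry`, `parse`, and `scopedKey` are hypothetical names, and rejecting duplicate identifiers is an assumption that the help text does not state.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TransformerIdSketch {
    // Hypothetical holder for one configured entry; not a Hudi class.
    public static final class Entry {
        public final String id;        // null when no identifier was given
        public final String className;

        Entry(String id, String className) {
            this.id = id;
            this.className = className;
        }
    }

    // Splits "id:class,id:class" in order of occurrence. Identifiers must be
    // given for all entries or for none. Duplicate ids are rejected (assumption).
    public static List<Entry> parse(String configured) {
        List<Entry> entries = new ArrayList<>();
        Set<String> seenIds = new HashSet<>();
        int withId = 0;
        for (String raw : configured.split(",")) {
            String item = raw.trim();
            int idx = item.indexOf(':');
            if (idx < 0) {
                entries.add(new Entry(null, item));
                continue;
            }
            String id = item.substring(0, idx);
            if (id.isEmpty() || !seenIds.add(id)) {
                throw new IllegalArgumentException("Empty or duplicate identifier in: " + item);
            }
            entries.add(new Entry(id, item.substring(idx + 1)));
            withId++;
        }
        if (withId != 0 && withId != entries.size()) {
            throw new IllegalArgumentException("Identifier must be set for all transformers or none");
        }
        return entries;
    }

    // Property key that would carry the per-transformer value, e.g.
    // hoodie.deltastreamer.transformer.sql.tr1 for identifier tr1.
    public static String scopedKey(String baseKey, Entry entry) {
        return entry.id == null ? baseKey : baseKey + "." + entry.id;
    }

    public static void main(String[] args) {
        String cls = "org.apache.hudi.utilities.transform.SqlQueryBasedTransformer";
        for (Entry e : parse("tr2:" + cls + ",tr1:" + cls)) {
            System.out.println(e.className + " reads " + scopedKey("hoodie.deltastreamer.transformer.sql", e));
        }
    }
}
```

The list keeps insertion order, so tr2 is handled before tr1 regardless of how the identifiers sort.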