nsivabalan commented on a change in pull request #2927:
URL: https://github.com/apache/hudi/pull/2927#discussion_r633009441
##########
File path:
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SparkAvroPostProcessor.java
##########
@@ -40,8 +40,8 @@ public SparkAvroPostProcessor(TypedProperties props,
JavaSparkContext jssc) {
@Override
public Schema processSchema(Schema schema) {
- return AvroConversionUtils.convertStructTypeToAvroSchema(
+ return schema != null ? AvroConversionUtils.convertStructTypeToAvroSchema(
Review comment:
@n3nash: I wanted to bring this change to your attention. Prior to this
diff, it looks like, with the schema post processor enabled, one could never
set the target schema to null, because the post processor would try to invoke
this call. So, if I am not wrong, the code path in DeltaSync.readFromSource()
where we check whether userProvidedTargetSchema is null would never be taken
(because the target schema was always non-null). Can you confirm that my
understanding is right?
If so, we may have missed this flow when we introduced the post processor.
By the way, as you can see, I am removing that constraint, so I want to
confirm that we are not breaking any other flow by allowing a null target
schema. My tests in TestHoodieDeltaStreamer test all possible config knobs,
but I just wanted to be sure.
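To make the intent of the guard concrete, here is a minimal standalone sketch of the pattern, not the actual Hudi code. It assumes the conversion fails on a null input; `NullSafeSchemaSketch`, `convert`, and `nullSafe` are hypothetical names, and a plain `String` stands in for the Avro `Schema` so the snippet runs without Hudi or Avro on the classpath.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch: a String plays the role of an Avro Schema.
final class NullSafeSchemaSketch {

    // Hypothetical stand-in for the schema conversion done by the post
    // processor. Like an unguarded call, it throws on a null input.
    static String convert(String schema) {
        return schema.trim().toLowerCase();
    }

    // Wraps a converter so that a null schema is passed through untouched,
    // mirroring the "schema != null ? convert(schema) : null" guard.
    static UnaryOperator<String> nullSafe(UnaryOperator<String> converter) {
        return schema -> schema != null ? converter.apply(schema) : null;
    }

    public static void main(String[] args) {
        UnaryOperator<String> processor = nullSafe(NullSafeSchemaSketch::convert);
        System.out.println(processor.apply("  RECORD ")); // converted value
        System.out.println(processor.apply(null));        // null passes through
    }
}
```

With the guard in place, a caller that checks for a null target schema after post processing can actually observe null, which is the flow in question above.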