[GitHub] [hudi] nsivabalan commented on a change in pull request #2927: [HUDI-1129] Adding support to ingest records with old schema after table's schema is evolved

GitBox Sat, 15 May 2021 14:48:43 -0700


nsivabalan commented on a change in pull request #2927:
URL: https://github.com/apache/hudi/pull/2927#discussion_r633009441




##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SparkAvroPostProcessor.java
##########
@@ -40,8 +40,8 @@ public SparkAvroPostProcessor(TypedProperties props, JavaSparkContext jssc) {
 
   @Override
   public Schema processSchema(Schema schema) {
-    return AvroConversionUtils.convertStructTypeToAvroSchema(
+    return schema != null ? AvroConversionUtils.convertStructTypeToAvroSchema(

Review comment:
       @n3nash: I wanted to bring this change to your notice. Prior to this
diff, it looks like with the schema post processor enabled, the target schema
could never be set to null, because the post processor would always try to
invoke this call. So, if I am not wrong, the code path in
DeltaSync.readFromSource() where we check whether userProvidedTargetSchema is
null would never be taken (because the target schema would always be
non-null). Can you confirm whether my understanding is right?
   If yes, maybe we missed this flow when we introduced the post processor.
   By the way, as you can see, I am removing the constraint, so I wanted to
confirm that we are not breaking any other flow by allowing null for the
target schema. My tests in TestHoodieDeltaStreamer cover all possible config
knobs, but I just wanted to be sure.
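
To make the concern concrete, here is a minimal, self-contained sketch of the
two behaviours. It is not the Hudi code: schemas are modelled as plain Strings,
and convert(), processSchemaOld(), processSchemaNew() and chooseTargetSchema()
are hypothetical stand-ins for the Avro/StructType round trip, the post
processor before and after this diff, and the null check in
DeltaSync.readFromSource(). It also assumes that the converter fails on null
input and that the guarded branch returns null otherwise.

```java
// Hypothetical stand-ins only: schemas are plain Strings, not Avro Schema objects.
class NullSafePostProcessorSketch {

  // Stand-in for the schema conversion round trip.
  // Assumption: like most converters, it rejects null input.
  static String convert(String schema) {
    if (schema == null) {
      throw new NullPointerException("schema must not be null");
    }
    return schema.trim();
  }

  // Before the change: every schema, including null, is handed to the converter,
  // so a null target schema can never survive post processing.
  static String processSchemaOld(String schema) {
    return convert(schema);
  }

  // After the change: a null schema is passed through untouched.
  static String processSchemaNew(String schema) {
    return schema != null ? convert(schema) : null;
  }

  // Stand-in for the downstream check on the user-provided target schema:
  // fall back to the source schema when no target schema was supplied.
  static String chooseTargetSchema(String userProvidedTargetSchema, String schemaFromSource) {
    return userProvidedTargetSchema == null ? schemaFromSource : userProvidedTargetSchema;
  }

  public static void main(String[] args) {
    String fromSource = "record{id:int,name:string}";
    // With the null guard, the fallback branch becomes reachable.
    String target = chooseTargetSchema(processSchemaNew(null), fromSource);
    System.out.println(target);
  }
}
```

With the old behaviour the fallback branch in chooseTargetSchema() is
unreachable whenever the post processor runs, which is the gap described
above.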




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
