suryaprasanna commented on code in PR #17946:
URL: https://github.com/apache/hudi/pull/17946#discussion_r2715300881


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/SourceFormatAdapter.java:
##########
@@ -206,9 +206,9 @@ public InputBatch<JavaRDD<GenericRecord>> fetchNewDataInAvroFormat(Option<Checkp
                     // pass in the schema for the Row-to-Avro conversion
                     // to avoid nullability mismatch between Avro schema and Row schema
                     ? HoodieSparkUtils.createRdd(rdd, HOODIE_RECORD_STRUCT_NAME, HOODIE_RECORD_NAMESPACE, true,
-                    Option.ofNullable(r.getSchemaProvider().getSourceHoodieSchema())
-                ).toJavaRDD() : HoodieSparkUtils.createRdd(rdd,
-                    HOODIE_RECORD_STRUCT_NAME, HOODIE_RECORD_NAMESPACE, false, Option.empty()).toJavaRDD();
+                    Option.ofNullable(r.getSchemaProvider().getSourceHoodieSchema())).toJavaRDD()
+                    : HoodieSparkUtils.createRdd(rdd, HOODIE_RECORD_STRUCT_NAME, HOODIE_RECORD_NAMESPACE, false,
+                    Option.ofNullable(r.getSchemaProvider().getTargetHoodieSchema())).toJavaRDD();

Review Comment:
   @the-other-tim-brown thank you for reviewing the PR.
   The **fetchNewDataInAvroFormat** method is actually called when no
   transformer is present, so we need the target schema here. We are seeing a
   case where using the source schema to create the JavaRDD causes failures.
   Let me see if I can add a unit test.
   
https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java#L758



