nsivabalan commented on code in PR #17946:
URL: https://github.com/apache/hudi/pull/17946#discussion_r2761074502


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/SourceFormatAdapter.java:
##########
@@ -206,9 +206,9 @@ public InputBatch<JavaRDD<GenericRecord>> fetchNewDataInAvroFormat(Option<Checkp
                     // pass in the schema for the Row-to-Avro conversion
                     // to avoid nullability mismatch between Avro schema and Row schema
                     ? HoodieSparkUtils.createRdd(rdd, HOODIE_RECORD_STRUCT_NAME, HOODIE_RECORD_NAMESPACE, true,
-                    Option.ofNullable(r.getSchemaProvider().getSourceHoodieSchema())
-                ).toJavaRDD() : HoodieSparkUtils.createRdd(rdd,
-                    HOODIE_RECORD_STRUCT_NAME, HOODIE_RECORD_NAMESPACE, false, Option.empty()).toJavaRDD();
+                    Option.ofNullable(r.getSchemaProvider().getSourceHoodieSchema())).toJavaRDD()
+                    : HoodieSparkUtils.createRdd(rdd, HOODIE_RECORD_STRUCT_NAME, HOODIE_RECORD_NAMESPACE, false,
+                    Option.ofNullable(r.getSchemaProvider().getTargetHoodieSchema())).toJavaRDD();

Review Comment:
   After some thought, here is what I think we can do.
   We do not want `SourceFormatAdapter` to own the decision about which schema to use; that decision has to come from the caller.
   
   We can add a new argument to this method:
   ```
   Functions.Function1<SchemaProvider, HoodieSchema> schemaToUseFunc
   ```
   
   and at L217 we can do:
   ```
   Option.ofNullable(schemaToUseFunc.apply(r.getSchemaProvider()))
   ```
   
   
   and from the caller's end:
   ```
   formatAdapter.fetchNewDataInAvroFormat(resumeCheckpoint, cfg.sourceLimit,
       (Functions.Function1<SchemaProvider, HoodieSchema>) schemaProv -> schemaProv.getTargetHoodieSchema());
   ```
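   To make the intent concrete, here is a minimal, self-contained sketch of the pattern being suggested (the adapter receives the schema-selection decision from the caller instead of hard-coding it). The classes below are simplified stand-ins for illustration, not the real Hudi `SourceFormatAdapter`/`SchemaProvider` types, and `java.util.function.Function`/`Optional` stand in for Hudi's `Functions.Function1`/`Option`:

   ```java
   import java.util.Optional;
   import java.util.function.Function;

   public class SchemaSelectionSketch {

     // Simplified stand-in for org.apache.hudi.utilities.schema.SchemaProvider.
     static class SchemaProvider {
       String getSourceHoodieSchema() { return "source-schema"; }
       String getTargetHoodieSchema() { return "target-schema"; }
     }

     // Stand-in for fetchNewDataInAvroFormat: the schema choice is injected
     // by the caller rather than decided inside the adapter.
     static Optional<String> fetchNewDataInAvroFormat(
         SchemaProvider provider,
         Function<SchemaProvider, String> schemaToUseFunc) {
       // Equivalent of Option.ofNullable(schemaToUseFunc.apply(r.getSchemaProvider()))
       return Optional.ofNullable(schemaToUseFunc.apply(provider));
     }

     public static void main(String[] args) {
       SchemaProvider provider = new SchemaProvider();
       // The caller decides: this call site wants the target schema.
       Optional<String> schema =
           fetchNewDataInAvroFormat(provider, SchemaProvider::getTargetHoodieSchema);
       System.out.println(schema.orElse("none"));  // prints "target-schema"
     }
   }
   ```

   A method reference (`SchemaProvider::getTargetHoodieSchema`) keeps the call site terse while leaving the adapter agnostic about which schema is used.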
   
   



##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/SourceFormatAdapter.java:
##########
@@ -206,9 +206,9 @@ public InputBatch<JavaRDD<GenericRecord>> fetchNewDataInAvroFormat(Option<Checkp
                     // pass in the schema for the Row-to-Avro conversion
                     // to avoid nullability mismatch between Avro schema and Row schema
                     ? HoodieSparkUtils.createRdd(rdd, HOODIE_RECORD_STRUCT_NAME, HOODIE_RECORD_NAMESPACE, true,
-                    Option.ofNullable(r.getSchemaProvider().getSourceHoodieSchema())
-                ).toJavaRDD() : HoodieSparkUtils.createRdd(rdd,
-                    HOODIE_RECORD_STRUCT_NAME, HOODIE_RECORD_NAMESPACE, false, Option.empty()).toJavaRDD();
+                    Option.ofNullable(r.getSchemaProvider().getSourceHoodieSchema())).toJavaRDD()
+                    : HoodieSparkUtils.createRdd(rdd, HOODIE_RECORD_STRUCT_NAME, HOODIE_RECORD_NAMESPACE, false,
+                    Option.ofNullable(r.getSchemaProvider().getTargetHoodieSchema())).toJavaRDD();

Review Comment:
   I also see that we use the source schema for PROTO as well.
   Can you check whether we need the same fix there?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
