alexeykudinkin commented on a change in pull request #4789:
URL: https://github.com/apache/hudi/pull/4789#discussion_r813383960



##########
File path: 
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/AvroConversionUtils.scala
##########
@@ -41,8 +107,8 @@ object AvroConversionUtils {
         else {
           val schema = new Schema.Parser().parse(schemaStr)
           val dataType = convertAvroSchemaToStructType(schema)
-          val convertor = AvroConversionHelper.createConverterToRow(schema, dataType)
-          records.map { x => convertor(x).asInstanceOf[Row] }
+          val converter = createConverterToRow(schema, dataType)

Review comment:
       @yihua you brought up legitimate concerns; there are a few considerations here:
   
    - First of all, we already depend on `InternalRow` quite a bit (we even have our own `HoodieInternalRow` extension)
    - `InternalRow` is a core component of Spark that is unlikely to change substantially, since any substantial change would require rewriting a large part of Spark to accommodate it, which, again, I don't think is likely
   
   At the same time, avoiding the `InternalRow` > `Row` conversion has considerable performance advantages, and it's exactly how Spark operates internally: all of its internal `Plan`s, expressions, and operators work on `InternalRow` and defer that deserialization (`InternalRow` to `Row`) to the point where you dereference it to `RDD[Row]`.
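   To make the cost of the per-record conversion concrete, here is a toy, Spark-free Scala sketch (all names here are illustrative stand-ins, NOT Spark's real classes): `InternalRowLike`, backed by a raw array, plays the role of `InternalRow`, and converting each record into a boxed `RowLike` mirrors the `InternalRow` to `Row` deserialization being avoided.

```scala
// Toy model of the two execution paths; names are hypothetical, not Spark APIs.
final case class InternalRowLike(values: Array[Any]) {
  // Direct, unboxed-style field access on the internal representation.
  def getInt(i: Int): Int = values(i).asInstanceOf[Int]
}

// "External" row: every record conversion allocates one of these.
final case class RowLike(fields: Seq[Any])

object DeferredConversion {
  // Per-record converter, analogous in spirit to createConverterToRow.
  val toRow: InternalRowLike => RowLike = r => RowLike(r.values.toSeq)

  // Path 1: eagerly convert every record to the external form, then compute.
  // Allocates a RowLike (plus a Seq) per record before the aggregation runs.
  def sumViaRows(rows: Seq[InternalRowLike]): Int =
    rows.map(toRow).map(_.fields.head.asInstanceOf[Int]).sum

  // Path 2: operate on the internal form and skip the conversion entirely,
  // deferring it to the (absent) point where external rows are actually needed.
  def sumInternal(rows: Seq[InternalRowLike]): Int =
    rows.map(_.getInt(0)).sum
}
```

   Both paths produce the same result, but the second one never materializes a `RowLike` per record, which is the same reason Spark's plans and operators stay on `InternalRow` and only pay the deserialization cost when the user asks for `RDD[Row]`.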




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
