alexeykudinkin commented on a change in pull request #4789:
URL: https://github.com/apache/hudi/pull/4789#discussion_r813383960
##########
File path:
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/AvroConversionUtils.scala
##########
@@ -41,8 +107,8 @@ object AvroConversionUtils {
else {
val schema = new Schema.Parser().parse(schemaStr)
val dataType = convertAvroSchemaToStructType(schema)
-      val convertor = AvroConversionHelper.createConverterToRow(schema, dataType)
-      records.map { x => convertor(x).asInstanceOf[Row] }
+      val converter = createConverterToRow(schema, dataType)
Review comment:
@yihua you brought up legitimate concerns; there are a few
considerations here:
- First of all, we already depend on `InternalRow` quite a bit (we even
have our own `HoodieInternalRow` extension)
- `InternalRow` is a core component of Spark that is unlikely to change
substantially (any substantial change to it would mean that quite a bit of
Spark would have to be rewritten to accommodate it, which, again, I don't
think is likely)

At the same time, avoiding the `InternalRow` > `Row` conversion has
considerable performance advantages, and it's exactly how Spark operates
internally: all of its internal `Plan`s, expressions, and operators work on
`InternalRow` and defer that deserialization (`InternalRow` to `Row`) to the
point where you dereference the result as an `RDD[Row]`.
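
The deferral described above can be sketched roughly as follows. This is a
minimal illustration, not Hudi's actual code: `toRows` is a hypothetical
helper, and it leans on Spark's `CatalystTypeConverters` to build the
Catalyst-to-Scala deserializer.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
import org.apache.spark.sql.types.StructType

// Hypothetical sketch: keep the whole pipeline on InternalRow, and only
// materialize Row at the boundary where the caller asks for RDD[Row].
def toRows(internalRows: RDD[InternalRow], schema: StructType): RDD[Row] = {
  internalRows.mapPartitions { iter =>
    // Build the converter once per partition, not once per record
    val deserializer = CatalystTypeConverters.createToScalaConverter(schema)
    iter.map(ir => deserializer(ir).asInstanceOf[Row])
  }
}
```

Until `toRows` is invoked, every upstream transformation can stay on
`InternalRow`, which is the same pattern Spark's own operators follow.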
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]