alexeykudinkin commented on PR #5470:
URL: https://github.com/apache/hudi/pull/5470#issuecomment-1194362557
TL;DR: the difference is between `Row` and `InternalRow`:
- `df.rdd` invokes a deserializer that converts Spark's internal binary
representation (`UnsafeRow`) into a `Row` holding Java-native types (the
`Row` also carries the schema)
- `df.queryExecution.toRdd` is an internal API that returns an RDD of
`InternalRow`s, avoiding that conversion (that’s the primary reason many
utilities were introduced in `HoodieUnsafeUtils`: to be able to access
private Spark APIs)
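
A minimal sketch of the two access paths, assuming a local `SparkSession` and a toy two-column DataFrame (the object name, column names and sample data are made up for illustration and are not taken from the PR):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.InternalRow

// Hypothetical example object, not part of Hudi
object RowVsInternalRow {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("row-vs-internal-row")
      .getOrCreate()
    import spark.implicits._

    // Toy data, assumed purely for illustration
    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

    // Public API: every record is deserialized into a Row of external
    // (Java-native) types, and the Row carries its schema
    val rows: RDD[Row] = df.rdd
    val names: Array[String] = rows.map(_.getAs[String]("name")).collect()

    // Internal API: records stay in Catalyst's internal format, so no
    // deserialization into Row happens
    val internalRows: RDD[InternalRow] = df.queryExecution.toRdd
    // Fields are read by ordinal; strings are UTF8String internally.
    // Values are extracted inside the map because Spark may reuse the
    // underlying row object (call copy() before holding on to a row).
    val internalNames: Array[String] =
      internalRows.map(_.getUTF8String(1).toString).collect()

    println(names.mkString(","))
    println(internalNames.mkString(","))

    spark.stop()
  }
}
```

Both paths yield the same values here; the second simply skips the conversion to `Row`, at the cost of depending on Spark internals that can change between versions.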
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.