alexeykudinkin commented on code in PR #6227:
URL: https://github.com/apache/hudi/pull/6227#discussion_r934791644
##########
hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala:
##########
@@ -223,6 +215,20 @@ private[sql] class AvroSerializer(
val numFields = st.length
(getter, ordinal) => structConverter(getter.getStruct(ordinal, numFields))
+ ////////////////////////////////////////////////////////////////////////////////////////////
+ // Following section is amended to the original (Spark's) implementation
+ // >>> BEGINS
+ ////////////////////////////////////////////////////////////////////////////////////////////
+
+ case (st: StructType, UNION) =>
Review Comment:
Folks, we need to be careful when adding support for Spark 3.3: we should
review the changes against the previous versions, b/c some of the code we
re-home from Spark actually carries our own fixes, which we can't upstream to
Spark in a timely fashion.
This change in particular, if missed, would have made Column Stats Index
unreadable on Spark 3.3.
I had added annotations like the one you can see here to make it more
eye-catching that this code isn't just a verbatim copy from Spark, and that we
need to be careful when we back-port changes.
I'd recommend the following procedure:
1. By default, carry over implementation from previous Spark version.
2. Then reconcile it w/ the latest Spark version (back-porting)
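To make the procedure above harder to get wrong, a check along these lines could flag annotated sections that did not survive a port. This is only a minimal sketch, not something that exists in Hudi: the object and method names are hypothetical, and the closing `<<< ENDS` marker is an assumption (the hunk above shows only `>>> BEGINS`).

```scala
// Hypothetical helper, not part of Hudi or Spark.
object AmendedSectionCheck {
  // Opening marker as it appears in the annotation in the diff above.
  val BeginMarker = ">>> BEGINS"
  // ASSUMPTION: a matching closing marker; the diff does not show one.
  val EndMarker = "<<< ENDS"

  /** Code lines found between each BEGINS/ENDS marker pair.
    * Lines are trimmed; blank and comment-only lines are dropped so that
    * banner lines and indentation changes do not affect the comparison.
    */
  def amendedSections(source: String): List[List[String]] = {
    val done = List.newBuilder[List[String]]
    var open: Option[Vector[String]] = None // section being collected, if any
    source.split("\\r?\\n").foreach { raw =>
      val line = raw.trim
      if (line.contains(BeginMarker)) {
        open = Some(Vector.empty)
      } else if (line.contains(EndMarker)) {
        open.foreach(section => done += section.toList)
        open = None
      } else if (line.nonEmpty && !line.startsWith("//")) {
        open = open.map(_ :+ line) // ignored when no section is open
      }
    }
    done.result()
  }

  /** Amended sections present in `previous` but absent from `candidate`. */
  def missingSections(previous: String, candidate: String): List[List[String]] = {
    val carried = amendedSections(candidate).toSet
    amendedSections(previous).filterNot(carried.contains)
  }
}
```

Here `previous` would be the text of the file for the prior Spark version and `candidate` the text of the newly ported file; a non-empty result means at least one annotated fix was dropped, or its code changed, during the port.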
cc @CTTY @rahil-c @umehrot2 @nsivabalan @yihua @codope @xushiyan