[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6227: [HUDI-4496] Fixing Orc support broken for Spark 3.x and more

GitBox Mon, 01 Aug 2022 11:14:35 -0700


alexeykudinkin commented on code in PR #6227:
URL: https://github.com/apache/hudi/pull/6227#discussion_r934791644



##########
hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala:
##########
@@ -223,6 +215,20 @@ private[sql] class AvroSerializer(
         val numFields = st.length
         (getter, ordinal) => structConverter(getter.getStruct(ordinal, numFields))
 
+      ////////////////////////////////////////////////////////////////////////////////////////////
+      // Following section is amended to the original (Spark's) implementation
+      // >>> BEGINS
+      ////////////////////////////////////////////////////////////////////////////////////////////
+
+      case (st: StructType, UNION) =>

Review Comment:
   Folks, we need to be careful when adding support for Spark 3.3: we should 
review the changes against previous versions, b/c some of the code we re-home 
from Spark actually carries our own fixes, which we can't upstream to Spark in a 
timely fashion.
   
   This change in particular, if missed, would have made the Column Stats Index 
unreadable on Spark 3.3. 
   I added annotations like the ones you can see here to make it obvious at a 
glance that this code isn't just a verbatim copy from Spark, and that we need to 
be careful when we back-port changes. 
   
   I'd recommend the following procedure: 
   
   1. By default, carry over the implementation from the previous Spark version.
   2. Then reconcile it w/ the latest Spark version (back-porting).
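   
   As a rough sketch of how step 1 could be sanity-checked (this is not part of 
the PR), a small helper could list the annotated regions of a re-homed file so 
none of them is dropped during the carry-over. The `>>> BEGINS` marker is the one 
shown in the diff above; the `<<< ENDS` closing marker and the `AmendedSections` 
object are assumptions made only for illustration.

```scala
// Sketch only: locate regions marked as amended relative to Spark's original code.
object AmendedSections {
  val BeginMarker = ">>> BEGINS" // taken from the annotation in the diff
  val EndMarker = "<<< ENDS"     // assumed closing marker, not confirmed by the diff

  /** Returns 1-based (startLine, endLine) pairs, inclusive of the marker lines. */
  def find(lines: Seq[String]): Seq[(Int, Int)] = {
    val init = (Vector.empty[(Int, Int)], Option.empty[Int])
    val (closed, open) = lines.zipWithIndex.foldLeft(init) {
      // Not inside a section yet: a begin marker opens one.
      case ((acc, None), (line, idx)) if line.contains(BeginMarker) =>
        (acc, Some(idx + 1))
      // Inside a section: an end marker closes it.
      case ((acc, Some(start)), (line, idx)) if line.contains(EndMarker) =>
        (acc :+ (start -> (idx + 1)), None)
      // Any other line leaves the state unchanged.
      case (state, _) => state
    }
    // A section that is never closed is reported as running to the end of the file.
    closed ++ open.map(start => start -> lines.length).toList
  }
}
```

   Running `AmendedSections.find` over the previous version's file and over the 
new one, and comparing the number of sections found, would flag an amendment 
that was lost while reconciling w/ the latest Spark code.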
   
   
   cc @CTTY @rahil-c @umehrot2 @nsivabalan @yihua @codope @xushiyan 



