steveloughran commented on issue #15100:
URL: https://github.com/apache/iceberg/issues/15100#issuecomment-3805316166

   I did an audit via mvnrepository.com, rather than actually looking at what you get 
in every spark distro, though I did compare my local spark 4.1.1 installation 
with what the maven repo said: all consistent. Maven chooses versions based on 
distance from the root of the tree, so it can be a bit nondeterministic there.
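
To make the "distance from root" point concrete, here is a minimal sketch of Maven's documented "nearest definition" mediation rule (the shallowest declaration wins; at equal depth the first declaration wins). The module names and edges in `GRAPH` are hypothetical and only there for illustration; the two hadoop versions are the ones from the listings below. This is not how Maven is implemented, just a breadth-first walk that behaves the same way on a small tree.

```python
from collections import deque

# Hypothetical dependency tree: module names and edges are invented for
# illustration; only the two hadoop versions come from the audit.
GRAPH = {
    "app": [("engine", "1.0"), ("table-format", "1.0")],
    ("engine", "1.0"): [("hadoop", "3.3.4")],
    ("table-format", "1.0"): [("format-lib", "1.0")],
    ("format-lib", "1.0"): [("hadoop", "3.4.2")],
}


def mediate(root, graph):
    """Resolve one version per artifact with a nearest-wins rule.

    Breadth-first order visits shallower declarations first, and within a
    level it keeps declaration order, so the first version seen wins.
    """
    resolved = {}
    queue = deque(graph.get(root, []))
    while queue:
        name, version = queue.popleft()
        if name in resolved:
            continue  # a nearer (or earlier-declared) version already won
        resolved[name] = version
        queue.extend(graph.get((name, version), []))
    return resolved


RESOLVED = mediate("app", GRAPH)
```

In this made-up tree `RESOLVED["hadoop"]` is `"3.3.4"`: that declaration sits two levels below the root, while 3.4.2 sits three levels down, so the older version wins even though a newer one is present.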
   
   Looking at core libraries: hadoop, avro, parquet and orc, with jackson noted 
not for its api but for general stability.
   
   Overall I'd propose
   * Downgrade Hadoop to 3.3.4 for spark 3.4.4 compatibility; flag in 
libs.versions the reason for this.
   * Upgrade arrow to 18.1.0 to be consistent with all spark releases that ship 
it.
   * Stay on the recent avro and parquet versions for security reasons. The pain 
point there is that any generated avro classes are incompatible with other runtimes.
   
   I'll do a PR for this issue with the hadoop downgrade, and a separate one for
   Arrow, after creating a separate issue for it.
   
   ### iceberg 
   
   ```
   arrow 15.0.2
   avro 1.12.1
   hadoop 3.4.2
   jackson 2.20.1
   orc 1.9.8
   parquet 1.17.0
   ```
   
   ### spark 3.4.4
   
   
https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.13/3.4.4/dependencies
   
https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/3.4.4/dependencies
   
   ```
   hadoop 3.3.4
   avro 1.11.1
   jackson 2.14.2
   parquet 1.12.3
   orc-core 1.8.7
   ```
   
   older: hadoop, parquet, avro, orc
   missing: arrow
   
   ### spark 3.5.8
   
   
https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.13/3.5.8/dependencies
   
https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/3.5.8/dependencies
   
   ```
   avro 1.11.5
   hadoop 3.3.4
   jackson 2.15.2
   orc core 1.9.8
   parquet 1.13.1
   ```
   
   older: hadoop, parquet, avro
   sync: orc
   newer:
   missing: arrow
   
   ### spark 4.0.1
   
   
https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.13/4.0.1/dependencies
   
https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/4.0.1/dependencies
   
   ```
   arrow-format 18.1.0
   avro 1.12.0
   hadoop 3.4.1
   jackson 2.18.2
   orc core 2.1.3
   parquet 1.15.2
   ```
   
   older: hadoop, parquet, avro
   sync: 
   newer: orc, arrow
   
   ### spark 4.1.1
   
   
https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.13/4.1.1/dependencies
   
https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/4.1.1/dependencies
   
   ```
   arrow-format 18.3.0
   avro 1.12.1
   hadoop 3.4.2
   orc 2.2.1
   parquet 1.16.0
   ```
   
   older: parquet
   sync: hadoop, avro
   newer: orc, arrow
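
The older/sync/newer/missing lines in the sections above can be reproduced mechanically. A minimal sketch, assuming plain dotted numeric versions and normalised library names (`orc` for "orc core", `arrow` for "arrow-format"); `classify` and the dictionary names are hypothetical helpers, and jackson is left out because it is only tracked for stability:

```python
# Versions copied from the iceberg and spark 3.5.8 listings.
ICEBERG = {"arrow": "15.0.2", "avro": "1.12.1", "hadoop": "3.4.2",
           "orc": "1.9.8", "parquet": "1.17.0"}
SPARK_3_5_8 = {"avro": "1.11.5", "hadoop": "3.3.4",
               "orc": "1.9.8", "parquet": "1.13.1"}


def parse(version):
    """Turn '1.12.1' into (1, 12, 1) so tuples compare numerically."""
    return tuple(int(part) for part in version.split("."))


def classify(baseline, other):
    """Bucket each baseline library by how the other runtime's version compares."""
    report = {"older": [], "sync": [], "newer": [], "missing": []}
    for lib, base_version in baseline.items():
        if lib not in other:
            report["missing"].append(lib)
        elif parse(other[lib]) < parse(base_version):
            report["older"].append(lib)
        elif parse(other[lib]) > parse(base_version):
            report["newer"].append(lib)
        else:
            report["sync"].append(lib)
    return report


REPORT_3_5_8 = classify(ICEBERG, SPARK_3_5_8)
```

For spark 3.5.8 this gives avro, hadoop and parquet as older, orc in sync, nothing newer, and arrow missing, matching the summary in that section.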
   
   ### Kafka 3.9.1
   
   
https://mvnrepository.com/artifact/org.apache.kafka/kafka_2.13/3.9.1/dependencies
   
   jackson 2.16.2
   
   ### Flink 1.20.1
   
   
https://mvnrepository.com/artifact/org.apache.flink/flink-runtime/1.20.1/dependencies
   
   Hadoop 2.10.2 (optional)
   jackson 2.14.2-17.0 (shaded)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

