steveloughran commented on issue #15100: URL: https://github.com/apache/iceberg/issues/15100#issuecomment-3805316166
I did an audit via mvnrepository.com rather than checking what actually ships in every Spark distribution. I did compare my local Spark 4.1.1 installation with what the Maven repository lists, and they were consistent. Maven resolves version conflicts by picking the version nearest the root of the dependency tree, so the outcome can be a bit nondeterministic.

I looked at the core libraries: Hadoop, Avro, Parquet and ORC. Jackson is noted not for API reasons but for general stability.

Overall I'd propose:

* Downgrade Hadoop to 3.3.4 for Spark 3.4.4 compatibility, and flag the reason for this in libs.versions.
* Upgrade Arrow to 18.1.0 to be consistent with all Spark releases that ship it.
* Stay on the newer Avro and Parquet releases for security reasons. The pain point there is that any generated Avro classes are incompatible with other runtimes.

I'll do a PR for this issue with the Hadoop downgrade. I'll also open a separate issue for Arrow and do a separate PR for it.

### Iceberg

```
arrow 15.0.2
avro 1.12.1
hadoop 3.4.2
jackson 2.20.1
orc 1.9.8
parquet 1.17.0
```

### Spark 3.4.4

https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.13/3.4.4/dependencies
https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/3.4.4/dependencies

```
hadoop 3.3.4
avro 1.11.1
jackson 2.14.2
parquet 1.12.3
orc-core 1.8.7
```

older: hadoop, parquet, avro, orc
missing: arrow

### Spark 3.5.8

https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.13/3.5.8/dependencies
https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/3.5.8/dependencies

```
avro 1.11.5
hadoop 3.3.4
jackson 2.15.2
orc-core 1.9.8
parquet 1.13.1
```

older: hadoop, parquet, avro
sync: orc
newer: none
missing: arrow

### Spark 4.0.1

https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.13/4.0.1/dependencies
https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/4.0.1/dependencies

```
arrow-format 18.1.0
avro 1.12.0
hadoop 3.4.1
jackson 2.18.2
orc-core 2.1.3
parquet 1.15.2
```

older: hadoop, parquet, avro
sync: none
newer: orc, arrow

### Spark 4.1.1

https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.13/4.1.1/dependencies
https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/4.1.1/dependencies

```
arrow-format 18.3.0
avro 1.12.1
hadoop 3.4.2
orc 2.2.1
parquet 1.16.0
```

older: parquet
sync: hadoop, avro
newer: orc, arrow

### Kafka 3.9.1

https://mvnrepository.com/artifact/org.apache.kafka/kafka_2.13/3.9.1/dependencies

jackson 2.16.2

### Flink 1.20.1

https://mvnrepository.com/artifact/org.apache.flink/flink-runtime/1.20.1/dependencies

Hadoop 2.10.2 (optional)
jackson 2.14.2-17.0 (shaded)
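
The older/sync/newer/missing buckets in each Spark section come from comparing that release's versions against Iceberg's. Below is a minimal Python sketch of that comparison. It assumes plain dotted-integer version strings and treats Spark's `arrow-format` entry as `arrow`. Jackson is left out, as in the audit. `parse` and `classify` are hypothetical helpers for illustration, not part of any Iceberg or Gradle tooling.

```python
# Sketch: bucket each Iceberg core library by how a Spark release's version
# compares with Iceberg's (older / sync / newer / missing).
# Assumption: versions are plain dotted integers such as "3.4.2".

ICEBERG = {
    "arrow": "15.0.2",
    "avro": "1.12.1",
    "hadoop": "3.4.2",
    "orc": "1.9.8",
    "parquet": "1.17.0",
}


def parse(version: str) -> tuple[int, ...]:
    """Turn '3.4.2' into (3, 4, 2) so versions compare numerically, not lexically."""
    return tuple(int(part) for part in version.split("."))


def classify(spark: dict[str, str], iceberg: dict[str, str] = ICEBERG) -> dict[str, list[str]]:
    """Return the Iceberg libraries grouped by the Spark release's relative version."""
    buckets: dict[str, list[str]] = {"older": [], "sync": [], "newer": [], "missing": []}
    for lib, ours in iceberg.items():
        theirs = spark.get(lib)
        if theirs is None:
            buckets["missing"].append(lib)  # Spark doesn't declare this library
        elif parse(theirs) < parse(ours):
            buckets["older"].append(lib)    # Spark ships an older version than Iceberg
        elif parse(theirs) == parse(ours):
            buckets["sync"].append(lib)
        else:
            buckets["newer"].append(lib)
    return buckets


if __name__ == "__main__":
    spark_4_1_1 = {"arrow": "18.3.0", "avro": "1.12.1", "hadoop": "3.4.2",
                   "orc": "2.2.1", "parquet": "1.16.0"}
    # {'older': ['parquet'], 'sync': ['avro', 'hadoop'], 'newer': ['arrow', 'orc'], 'missing': []}
    print(classify(spark_4_1_1))
```

Parsing into integer tuples matters: a plain string comparison would rank "1.9.8" above "1.12.1".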

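Maven's "nearest wins" mediation explains why the resolved versions can shift in ways that look arbitrary. Below is a simplified Python sketch of that rule, not Maven's actual implementation. It walks a hypothetical dependency tree breadth-first, so a shallower declaration wins, and at equal depth the first-declared one wins. Scopes, exclusions, optional dependencies and `dependencyManagement` are ignored. `mediate` and the artifact names are made up for illustration.

```python
from collections import deque

# A node is (artifact, version, children); children are in declaration order.
Node = tuple[str, str, list]


def mediate(root: Node) -> dict[str, str]:
    """Simplified 'nearest wins': breadth-first from the root, the first occurrence
    of an artifact wins (shallower depth first, then declaration order)."""
    chosen: dict[str, str] = {}
    queue = deque(root[2])
    while queue:
        artifact, version, children = queue.popleft()
        if artifact in chosen:
            continue  # a nearer or earlier-declared version already won; skip this subtree
        chosen[artifact] = version
        queue.extend(children)
    return chosen


if __name__ == "__main__":
    root = ("app", "1.0", [
        ("lib-a", "1.0", [("common", "1.0", [])]),
        ("lib-b", "1.0", [("common", "2.0", [])]),
    ])
    print(mediate(root)["common"])  # "1.0": both at depth 2, lib-a is declared first
```

Swapping the order of `lib-a` and `lib-b` flips the winner to `common` 2.0, and declaring `common` directly under the root overrides both.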