[ https://issues.apache.org/jira/browse/SPARK-32531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Muhammad Samir Khan updated SPARK-32531:
----------------------------------------
    Description: We found that Spark was slower than Pig on some schemas in our pipelines. On investigation, Spark performance turned out to be slow for nested structs and arrays of structs, and the current benchmarks do not profile these cases. I have improvements for the ORC (SPARK-32532) and Avro (SPARK-32533) file formats that improve performance in these cases, and I will put up the PRs soon.  (was: Additions to benchmarks for different file formats for nested structs and arrays which are not being currently benchmarked. I have some improvements for ORC and Avro file formats which improve the performance in these cases. I will be putting up the PRs soon.)

> Add benchmarks for nested structs and arrays for different file formats
> -----------------------------------------------------------------------
>
>                 Key: SPARK-32531
>                 URL: https://issues.apache.org/jira/browse/SPARK-32531
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Muhammad Samir Khan
>            Priority: Major