[ https://issues.apache.org/jira/browse/SPARK-32531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Muhammad Samir Khan updated SPARK-32531:
----------------------------------------
    Description: We found that Spark was slower than Pig on some schemas in 
our pipelines. On investigation, we found that Spark was slow for nested 
structs and arrays of structs, and that the current benchmarks do not profile 
these cases. I have improvements for the ORC (SPARK-32532) and Avro 
(SPARK-32533) file formats that speed up these cases and will be putting up 
the PRs soon.  (was: 
Additions to benchmarks for different file formats for nested structs and 
arrays which are not being currently benchmarked. I have some improvements for 
ORC and Avro file formats which improve the performance in these cases.

I will be putting up the PRs soon.)

> Add benchmarks for nested structs and arrays for different file formats
> -----------------------------------------------------------------------
>
>                 Key: SPARK-32531
>                 URL: https://issues.apache.org/jira/browse/SPARK-32531
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Muhammad Samir Khan
>            Priority: Major
>
> We found that Spark was slower than Pig on some schemas in our pipelines. 
> On investigation, we found that Spark was slow for nested structs and 
> arrays of structs, and that the current benchmarks do not profile these 
> cases. I have improvements for the ORC (SPARK-32532) and Avro (SPARK-32533) 
> file formats that speed up these cases and will be putting up the PRs soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
