[jira] [Commented] (SPARK-28376) Support to write sorted parquet files in each row group

Ryan Blue (JIRA) Mon, 15 Jul 2019 10:30:20 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-28376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885428#comment-16885428
 ]


Ryan Blue commented on SPARK-28376:
-----------------------------------

I don't think this is a regression. The linked issue was to automatically add 
repartitioning to the SQL plan to avoid too many files, even with a local sort. 
I think that this is no longer needed because we plan to do it in DSv2.
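
A minimal PySpark sketch of doing the repartition and local sort by hand, 
which is roughly what the automatic planning described above would otherwise 
insert. The helper name and its parameters are hypothetical and used only for 
illustration; `repartition`, `sortWithinPartitions`, and `write.parquet` are 
existing DataFrame methods. It assumes an unpartitioned output path, and it 
does not by itself guarantee how rows are split across row groups.

```python
def write_locally_sorted(df, path, num_files, cluster_col, sort_col):
    """Write `df` (assumed to be a pyspark.sql.DataFrame) as Parquet,
    with rows sorted inside each output file.

    Hypothetical helper for illustration only.
    """
    (
        df.repartition(num_files, cluster_col)  # shuffle into num_files tasks, clustered by cluster_col
        .sortWithinPartitions(sort_col)         # local sort inside each task, no global ordering
        .write
        .parquet(path)                          # each task writes its rows in the sorted order
    )


# Usage, assuming an active SparkSession named `spark`:
#   df = spark.read.parquet("/data/events")
#   write_locally_sorted(df, "/data/events_sorted", 16, "user_id", "user_id")
```

Clustering and sorting on the same column keeps equal values next to each 
other, so the min/max statistics of each row group cover a narrower range.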

> Support to write sorted parquet files in each row group
> -------------------------------------------------------
>
>                 Key: SPARK-28376
>                 URL: https://issues.apache.org/jira/browse/SPARK-28376
>             Project: Spark
>          Issue Type: New Feature
>          Components: Input/Output, Spark Core
>    Affects Versions: 2.4.3
>            Reporter: t oo
>            Priority: Major
>
> this is for the ability to write parquet with sorted values in each 
> row group
>  
> see 
> [https://stackoverflow.com/questions/52159938/cant-write-ordered-data-to-parquet-in-spark]
> [https://www.slideshare.net/RyanBlue3/parquet-performance-tuning-the-missing-guide]
>  (slides 26-27)
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
