[jira] [Commented] (SPARK-16188) Spark sql create a lot of small files

xianlongZhang (JIRA) Thu, 17 Aug 2017 23:03:03 -0700

    [ https://issues.apache.org/jira/browse/SPARK-16188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131772#comment-16131772 ]


xianlongZhang commented on SPARK-16188:
---------------------------------------

cen yuhai, thanks for your advice, but my company's data platform processes
tens of thousands of SQL queries every day, so it is not practical to modify
each query. I think adding a common mechanism to deal with this problem is the
most suitable solution.

> Spark sql create a lot of small files
> -------------------------------------
>
>                 Key: SPARK-16188
>                 URL: https://issues.apache.org/jira/browse/SPARK-16188
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.0
>         Environment: spark 1.6.1
>            Reporter: cen yuhai
>
> I find that Spark SQL creates as many files as there are partitions. When the
> results are small, there are too many small files, and most of them are
> empty.
> Hive has a feature that checks the average file size. If the average file
> size is smaller than "hive.merge.smallfiles.avgsize", Hive adds a job to merge
> the files.
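
The Hive-style check described in the issue can be sketched as follows. This is a minimal illustration, not Hive's or Spark's actual code: the 16,000,000-byte threshold is an assumption mirroring Hive's usual default for hive.merge.smallfiles.avgsize, and the target merged-file size and the function names are hypothetical.

```python
# Sketch of the "merge small files" decision described in SPARK-16188.
# Assumption: the threshold mirrors Hive's typical default for
# hive.merge.smallfiles.avgsize (16,000,000 bytes).
AVG_SIZE_THRESHOLD = 16_000_000   # bytes (assumed default)
# Hypothetical target size for each file produced by the merge step.
TARGET_FILE_SIZE = 256_000_000    # bytes (hypothetical)


def should_merge(file_sizes, avg_threshold=AVG_SIZE_THRESHOLD):
    """Return True when the average output file size is below the threshold.

    Empty files count toward the average, so many empty outputs
    pull the average down and trigger a merge.
    """
    if not file_sizes:
        return False
    avg = sum(file_sizes) / len(file_sizes)
    return avg < avg_threshold


def plan_merged_file_count(file_sizes, target_size=TARGET_FILE_SIZE):
    """Number of files after merging: ceil(total / target), but at least 1."""
    total = sum(file_sizes)
    return max(1, -(-total // target_size))  # integer ceiling division
```

For example, a query that writes 200 partitions, 190 of them empty and 10 holding 1 MB each, has an average file size of 50 KB. That triggers a merge into a single file, while four 100 MB files would be left alone.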



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
