Navis created SPARK-12619:
-----------------------------

             Summary: Combine small files in a hadoop directory into single split
                 Key: SPARK-12619
                 URL: https://issues.apache.org/jira/browse/SPARK-12619
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
            Reporter: Navis
            Priority: Trivial


When a directory contains too many (small) files, the whole Spark cluster is 
exhausted scheduling the tasks created for each file. A custom input format can 
handle that, but if you're using the Hive metastore, that is hardly an option.
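
To illustrate the idea of combining small files into a single split, here is a
minimal sketch of one possible grouping strategy. It is not Spark's or Hadoop's
actual implementation; the function name combine_into_splits, the
max_split_bytes parameter, and the greedy packing rule are all assumptions made
for illustration only. Files are given as (path, size in bytes) pairs, and each
resulting group would correspond to one task instead of one task per file.

```python
from typing import List, Tuple


def combine_into_splits(
    files: List[Tuple[str, int]], max_split_bytes: int
) -> List[List[str]]:
    """Greedily pack (path, size) pairs into groups of at most max_split_bytes.

    Hypothetical helper for illustration: a file larger than the limit
    still gets a group of its own rather than being cut into pieces.
    """
    if max_split_bytes <= 0:
        raise ValueError("max_split_bytes must be positive")

    splits: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0

    for path, size in files:
        # Close the current group if adding this file would exceed the limit.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size

    # Flush the last, possibly partially filled, group.
    if current:
        splits.append(current)
    return splits


if __name__ == "__main__":
    small_files = [("part-0", 40), ("part-1", 40), ("part-2", 40)]
    # With a 100-byte limit, three 40-byte files need two tasks instead of three.
    print(combine_into_splits(small_files, max_split_bytes=100))
```

With a realistic limit (for example, something close to the HDFS block size),
thousands of small files would collapse into far fewer tasks, which is the
effect this issue asks for.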



