Navis created SPARK-12619:
-----------------------------
Summary: Combine small files in a hadoop directory into single split
Key: SPARK-12619
URL: https://issues.apache.org/jira/browse/SPARK-12619
Project: Spark
Issue Type: Improvement
Components: Spark Core
Reporter: Navis
Priority: Trivial

When a directory contains too many small files, the whole Spark cluster will be
exhausted scheduling the tasks created for each file. A custom input format can
handle that, but if you're using the Hive metastore, that is hardly an option.
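
To make the idea concrete, here is a rough sketch of the packing step: group many small files into a few splits so that one task reads several files. This is not Spark's or Hadoop's code (Hadoop's CombineFileInputFormat works along similar lines but also considers block locality); the function name `combine_into_splits`, the file names, and the 128 MiB cap are assumptions made up for illustration.

```python
from typing import List, Tuple


def combine_into_splits(
    files: List[Tuple[str, int]], max_split_bytes: int
) -> List[List[str]]:
    """Greedily pack (path, size_in_bytes) pairs into splits.

    Each split holds at most max_split_bytes, except that a single file
    larger than the cap gets a split of its own (it is not chunked here).
    """
    splits: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for path, size in files:
        # Close the current split if this file would push it over the cap.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


# Hypothetical directory listing: 1,000 files of 1 MiB each.
files = [(f"part-{i:05d}", 1 << 20) for i in range(1000)]
splits = combine_into_splits(files, max_split_bytes=128 << 20)
print(len(files), "files ->", len(splits), "splits")  # 1000 files -> 8 splits
```

With one task per split, the scheduler in this made-up case handles 8 tasks instead of 1,000.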
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)