Navis created SPARK-12619:
-----------------------------

             Summary: Combine small files in a hadoop directory into single split
                 Key: SPARK-12619
                 URL: https://issues.apache.org/jira/browse/SPARK-12619
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
            Reporter: Navis
            Priority: Trivial

When a directory contains too many (small) files, the whole Spark cluster can be exhausted scheduling the tasks created for each file. A custom input format can handle that, but if you're using the Hive metastore, it is hardly an option.
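
To illustrate the idea, here is a minimal sketch in Python of grouping many small files into fewer combined splits, so that one task is scheduled per split rather than per file. It is not Spark's or Hadoop's implementation; the function name `combine_into_splits`, the `max_split_bytes` parameter, and the file names are all hypothetical, and the greedy in-order packing is just one simple strategy.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of a file: (path, size in bytes).
FileEntry = Tuple[str, int]


def combine_into_splits(
    files: Iterable[FileEntry], max_split_bytes: int
) -> List[List[FileEntry]]:
    """Greedily pack files, in the order given, into splits.

    A split is closed as soon as adding the next file would push it past
    max_split_bytes. A single file larger than the cap is not subdivided;
    it simply ends up alone in its own split.
    """
    if max_split_bytes <= 0:
        raise ValueError("max_split_bytes must be positive")

    splits: List[List[FileEntry]] = []
    current: List[FileEntry] = []
    current_bytes = 0

    for path, size in files:
        # Close the current split if this file would exceed the cap.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append((path, size))
        current_bytes += size

    # Flush the last, partially filled split.
    if current:
        splits.append(current)
    return splits


# Hypothetical directory listing: five files, one of them large.
example_files = [
    ("part-0", 40),
    ("part-1", 40),
    ("part-2", 40),
    ("part-3", 200),
    ("part-4", 10),
]
example_splits = combine_into_splits(example_files, max_split_bytes=100)
```

With a cap of 100 bytes, the five files above are grouped into four splits instead of five; with thousands of tiny files and a realistic cap, the reduction in task count would be far larger. Packing in listing order keeps the sketch simple, at the cost of leaving some splits underfilled.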