[
https://issues.apache.org/jira/browse/SPARK-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Antonio Piccolboni closed SPARK-9921.
-------------------------------------
I can confirm this is resolved for this specific test case as well, thanks.
> Too many open files in Spark SQL
> --------------------------------
>
> Key: SPARK-9921
> URL: https://issues.apache.org/jira/browse/SPARK-9921
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0
> Environment: os x
> Reporter: Antonio Piccolboni
> Assignee: Davies Liu
> Fix For: 1.5.0
>
>
> Data is a table with 300K rows and 16 cols, covering a single year, so there are 12
> months and 365 days with a roughly similar number of rows (each row is a
> scheduled flight).
> The error is:
> Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", :
> Unable to retrieve JDBC result set for SELECT `year`, `month`, `flights`
> FROM (select `year`, `month`, sum(`flights`) as `flights`
> from (select `year`, `month`, `day`, count(*) as `flights`
> from `flights`
> group by `year`, `month`, `day`) as `_w21`
> group by `year`, `month`) AS `_w22`
> LIMIT 10 (org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0 in stage 237.0 failed 1 times, most recent failure: Lost task 0.0 in
> stage 237.0 (TID 8634, localhost): java.io.FileNotFoundException:
> /user/hive/warehouse/flights/file11ce460c958e (Too many open files)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:103)
> at
> org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:195)
> at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<i
> As you can see, the query is not something one would write by hand very
> easily, because it is computer-generated, but it makes perfect sense: it is a
> count of flights by month. It could be done without the nested query (see the
> sketch below), but that's not the point.
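> For reference, a hand-written query for the same result, without the nesting,
> would be something like the following (a sketch only, assuming the same
> `flights` table; this is not what the query generator emits):
> -- count flights directly by year and month, instead of
> -- counting per day and then summing the daily counts
> select `year`, `month`, count(*) as `flights`
> from `flights`
> group by `year`, `month`
> limit 10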
> This query used to work on 1.4 but doesn't on 1.5. There has also been an OS
> upgrade to Yosemite in the meantime, so it's hard to separate the effects of
> the two. Following suggestions that default system limits for open files are
> too low for Spark to work properly, I increased the hard and soft limits to 32K.
> For some reason, the error happens when Java has about 10,250 open files as
> reported by lsof. It is not clear to me where that limit is coming from. The
> total number of open files is 16K. If this is not a bug, I would like to ask
> what a safe number of allowed open files is and whether there are other
> configurations that need to be tuned.