[ https://issues.apache.org/jira/browse/SPARK-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antonio Piccolboni closed SPARK-9921.
-------------------------------------

I can confirm this is resolved for this specific test case as well, thanks.

> Too many open files in Spark SQL
> --------------------------------
>
>                 Key: SPARK-9921
>                 URL: https://issues.apache.org/jira/browse/SPARK-9921
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: OS X
>            Reporter: Antonio Piccolboni
>            Assignee: Davies Liu
>             Fix For: 1.5.0
>
>
> The data is a table with 300K rows and 16 columns covering a single year, 
> so there are 12 months and 365 days, each with roughly the same number of 
> rows (each row is a scheduled flight).
> The error is:
> Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ",  : 
>   Unable to retrieve JDBC result set for SELECT `year`, `month`, `flights`
> FROM (select `year`, `month`, sum(`flights`) as `flights`
> from (select `year`, `month`, `day`, count(*) as `flights`
> from `flights`
> group by `year`, `month`, `day`) as `_w21`
> group by `year`, `month`) AS `_w22`
> LIMIT 10 (org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 237.0 failed 1 times, most recent failure: Lost task 0.0 in 
> stage 237.0 (TID 8634, localhost): java.io.FileNotFoundException: 
> /user/hive/warehouse/flights/file11ce460c958e (Too many open files)
>       at java.io.FileInputStream.open0(Native Method)
>       at java.io.FileInputStream.open(FileInputStream.java:195)
>       at java.io.FileInputStream.<init>(FileInputStream.java:138)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:103)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:195)
>       at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<i
> As you can see, the query is not something one would easily write by hand, 
> because it is computer generated, but it makes perfect sense: it is a count 
> of flights by month. It could be done without the nested query, but that's 
> not the point.
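> To be concrete, the flattened version would be something like the sketch 
> below (untested, written for the Spark 1.5 shell and assuming `flights` is 
> already registered as a table; summing per-day counts within a month is the 
> same as counting directly per month):
>
>   // Flattened form of the generated query: one GROUP BY instead of two.
>   val monthly = sqlContext.sql("""
>     SELECT `year`, `month`, count(*) AS `flights`
>     FROM `flights`
>     GROUP BY `year`, `month`
>     LIMIT 10""")
>   monthly.show()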
> This query used to work on 1.4 and doesn't on 1.5. There has also been an 
> OS upgrade to Yosemite in the meantime, so it's hard to separate the 
> effects of the two. Following suggestions that the default system limits on 
> open files are too low for Spark to work properly, I increased the hard and 
> soft limits to 32k. For some reason, the error happens when java has about 
> 10,250 open files as reported by lsof; it's not clear to me where that 
> limit comes from. The total number of open files on the system is 16k. If 
> this is not a bug, I would like to ask what a safe number of allowed open 
> files is and whether there are other configurations that need to be tuned.
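> For example, is something like spark.sql.shuffle.partitions (default 200) 
> relevant here? This is purely a guess on my part, on the assumption that 
> the open handles come from shuffle files, but it is the kind of tuning I 
> mean:
>
>   // Speculative workaround, not a confirmed fix: fewer shuffle partitions
>   // should mean fewer intermediate shuffle files open during aggregation.
>   sqlContext.setConf("spark.sql.shuffle.partitions", "8")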



