[ https://issues.apache.org/jira/browse/SPARK-9921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antonio Piccolboni closed SPARK-9921.
-------------------------------------

I can confirm this is resolved for this specific test case as well, thanks.

> Too many open files in Spark SQL
> --------------------------------
>
>                 Key: SPARK-9921
>                 URL: https://issues.apache.org/jira/browse/SPARK-9921
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: os x
>            Reporter: Antonio Piccolboni
>            Assignee: Davies Liu
>             Fix For: 1.5.0
>
> The data is a table with 300K rows and 16 columns covering a single year,
> so there are 12 months and 365 days with a roughly similar number of rows
> (each row is a scheduled flight).
>
> The error is:
>
> Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", :
>   Unable to retrieve JDBC result set for SELECT `year`, `month`, `flights`
>   FROM (select `year`, `month`, sum(`flights`) as `flights`
>   from (select `year`, `month`, `day`, count(*) as `flights`
>   from `flights`
>   group by `year`, `month`, `day`) as `_w21`
>   group by `year`, `month`) AS `_w22`
>   LIMIT 10 (org.apache.spark.SparkException: Job aborted due to stage failure:
>   Task 0 in stage 237.0 failed 1 times, most recent failure: Lost task 0.0 in
>   stage 237.0 (TID 8634, localhost): java.io.FileNotFoundException:
>   /user/hive/warehouse/flights/file11ce460c958e (Too many open files)
>   at java.io.FileInputStream.open0(Native Method)
>   at java.io.FileInputStream.open(FileInputStream.java:195)
>   at java.io.FileInputStream.<init>(FileInputStream.java:138)
>   at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:103)
>   at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:195)
>   at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<i
>
> As you can see, the query is not something one would easily write by hand,
> because it is computer generated, but it makes perfect sense: it is a count
> of flights by month. It could be done without the nested query, but that is
> not the point.
>
> This query used to work on 1.4 but doesn't on 1.5. There has also been an
> OS upgrade to Yosemite in the meantime, so it is hard to separate the
> effects of the two. Following suggestions that the default system limits on
> open files are too low for Spark to work properly, I increased the hard and
> soft limits to 32k. For some reason, the error happens when java has about
> 10250 open files as reported by lsof. It is not clear to me where that
> limit is coming from. The total number of open files is 16k. If this is not
> a bug, I would like to ask what a safe number of allowed open files is and
> whether there are other configurations that need to be tuned.
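For anyone replaying the report, the generated SQL above is just a two-level aggregation, and the same plan can be produced directly with the DataFrame API. A minimal sketch, assuming a Spark 1.5 shell where sqlContext is predefined and a flights table with year, month, and day columns is already registered:

    import org.apache.spark.sql.functions.{count, lit, sum}

    // Inner aggregation, like subquery `_w21`: flights per (year, month, day).
    val daily = sqlContext.table("flights")
      .groupBy("year", "month", "day")
      .agg(count(lit(1)).as("flights"))   // count(lit(1)) mirrors COUNT(*)

    // Outer aggregation, like `_w22`: flights per (year, month).
    val monthly = daily
      .groupBy("year", "month")
      .agg(sum("flights").as("flights"))

    monthly.limit(10).show()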
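On the question of where the ~10250 figure comes from: lsof counts more than file descriptors (memory-mapped files and shared libraries also show up), so its total need not match the kernel's per-process descriptor limit. One way to read the JVM's own view of descriptor usage is the HotSpot extension com.sun.management.UnixOperatingSystemMXBean, which is available on OS X; this is a generic sketch, not something from this ticket:

    import java.lang.management.ManagementFactory
    import com.sun.management.UnixOperatingSystemMXBean

    // On Unix-like JVMs the standard OS MX bean is backed by a com.sun.*
    // subtype that exposes per-process descriptor counts.
    ManagementFactory.getOperatingSystemMXBean match {
      case os: UnixOperatingSystemMXBean =>
        println(s"open descriptors: ${os.getOpenFileDescriptorCount}")
        println(s"max descriptors:  ${os.getMaxFileDescriptorCount}")
      case _ =>
        println("descriptor counts not exposed by this JVM")
    }

Running this in the Spark shell while the failing query executes would show whether the JVM itself sees a ceiling below the 32k shell limit.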
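As for other configurations worth tuning: the underlying bug is fixed in 1.5.0 (see Fix For above), but the number of shuffle files a SQL aggregation touches scales with the number of shuffle partitions, which defaults to 200. Lowering it is a common mitigation for small local datasets; a hedged sketch of the general workaround, not the fix shipped for this ticket:

    // Spark 1.5-era API: fewer shuffle partitions means fewer
    // simultaneously open shuffle files for a 300K-row local table.
    sqlContext.setConf("spark.sql.shuffle.partitions", "16")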