Jon Chase created SPARK-6962:
--------------------------------
Summary: Spark gets stuck on a step, hangs forever - jobs do not
complete
Key: SPARK-6962
URL: https://issues.apache.org/jira/browse/SPARK-6962
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.3.0, 1.2.1, 1.2.0
Reporter: Jon Chase
Priority: Blocker
Spark SQL queries hang indefinitely under circumstances I don't fully
understand. This seems to be a Spark Core issue; I mention Spark SQL only
because I'm using queries in the REPL to surface it.
Setting spark.shuffle.blockTransferService=nio resolves the problem and allows
queries to complete normally, which seems to point to netty as the cause.
Netty became the default block transfer service in 1.2.0, which is when this
issue started.
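As a sketch of the workaround above, the property can be set in the Spark defaults file. The file location assumes a stock Spark install layout and is not a detail from the original report; only the property name and value come from it.

```
# conf/spark-defaults.conf (assumed location under the Spark install directory)
# Switch the block transfer service from the netty default back to nio.
spark.shuffle.blockTransferService  nio
```

For a single REPL session, the same setting can be passed on the command line: spark-shell --conf spark.shuffle.blockTransferService=nio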
I do not see this problem when running queries over smaller datasets (~20
files of ~5MB each). When I increase the scope to include more data (several
hundred ~5MB files), the queries get through several steps but eventually hang
indefinitely.
Here's the email chain regarding this issue, including stack traces:
http://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/<cae61spfqt2y7d5vqzomzz2dmr-jx2c2zggcyky40npkjjx4...@mail.gmail.com>
For context, here's the announcement regarding the block transfer service
change:
http://mail-archives.apache.org/mod_mbox/spark-dev/201411.mbox/<cabpqxssl04q+rbltp-d8w+z3atn+g-um6gmdgdnh-hzcvd-...@mail.gmail.com>