Florian Verhein created SPARK-5331:
--------------------------------------
Summary: Tachyon workers seem to ignore tachyon.master.hostname
and use localhost instead
Key: SPARK-5331
URL: https://issues.apache.org/jira/browse/SPARK-5331
Project: Spark
Issue Type: Bug
Environment: Running on EC2 via modified spark-ec2 scripts (to get
dependencies right so tachyon starts)
Using tachyon 0.5.0 built against hadoop 2.4.1
Spark 1.2.0 built against tachyon 0.5.0 and hadoop 0.4.1
Tachyon configured using the template in 0.5.0 but updated with slave list and
master variables etc..
Reporter: Florian Verhein
ps -ef | grep Tachyon
shows Tachyon running on the master (and the slave) node with correct setting:
-Dtachyon.master.hostname=ec2-54-252-156-187.ap-southeast-2.compute.amazonaws.com
However from stderr log on worker running the SparkTachyonPi example:
15/01/20 06:00:56 INFO CacheManager: Partition rdd_0_0 not found, computing it
15/01/20 06:00:56 INFO : Trying to connect master @ localhost/127.0.0.1:19998
15/01/20 06:00:56 ERROR : Failed to connect (1) to master
localhost/127.0.0.1:19998 : java.net.ConnectException: Connection refused
15/01/20 06:00:57 ERROR : Failed to connect (2) to master
localhost/127.0.0.1:19998 : java.net.ConnectException: Connection refused
15/01/20 06:00:58 ERROR : Failed to connect (3) to master
localhost/127.0.0.1:19998 : java.net.ConnectException: Connection refused
15/01/20 06:00:59 ERROR : Failed to connect (4) to master
localhost/127.0.0.1:19998 : java.net.ConnectException: Connection refused
15/01/20 06:01:00 ERROR : Failed to connect (5) to master
localhost/127.0.0.1:19998 : java.net.ConnectException: Connection refused
15/01/20 06:01:01 WARN TachyonBlockManager: Attempt 1 to create tachyon dir
null failed
java.io.IOException: Failed to connect to master localhost/127.0.0.1:19998
after 5 attempts
at tachyon.client.TachyonFS.connect(TachyonFS.java:293)
at tachyon.client.TachyonFS.getFileId(TachyonFS.java:1011)
at tachyon.client.TachyonFS.exist(TachyonFS.java:633)
at
org.apache.spark.storage.TachyonBlockManager$$anonfun$createTachyonDirs$2.apply(TachyonBlockManager.scala:117)
at
org.apache.spark.storage.TachyonBlockManager$$anonfun$createTachyonDirs$2.apply(TachyonBlockManager.scala:106)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at
org.apache.spark.storage.TachyonBlockManager.createTachyonDirs(TachyonBlockManager.scala:106)
at
org.apache.spark.storage.TachyonBlockManager.<init>(TachyonBlockManager.scala:57)
at
org.apache.spark.storage.BlockManager.tachyonStore$lzycompute(BlockManager.scala:94)
at
org.apache.spark.storage.BlockManager.tachyonStore(BlockManager.scala:88)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:773)
at
org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
at
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:145)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: tachyon.org.apache.thrift.TException: Failed to connect to master
localhost/127.0.0.1:19998 after 5 attempts
at tachyon.master.MasterClient.connect(MasterClient.java:178)
at tachyon.client.TachyonFS.connect(TachyonFS.java:290)
... 28 more
Caused by: tachyon.org.apache.thrift.transport.TTransportException:
java.net.ConnectException: Connection refused
at tachyon.org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at
tachyon.org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at tachyon.master.MasterClient.connect(MasterClient.java:156)
... 29 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at tachyon.org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 31 more
15/01/20 06:01:01 ERROR TachyonBlockManager: Failed 10 attempts to create
tachyon dir in /tmp_spark_tachyon/spark-f1c7257e-b79e-4fa2-955a-e3734d80dbc6/1
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]