[jira] [Created] (SPARK-16219) Unable to run Python wordcount

Chaitanya (JIRA) Sun, 26 Jun 2016 20:47:28 -0700

Chaitanya created SPARK-16219:
---------------------------------

             Summary: Unable to run Python wordcount
                 Key: SPARK-16219
                 URL: https://issues.apache.org/jira/browse/SPARK-16219
             Project: Spark
          Issue Type: Test
          Components: Examples
    Affects Versions: 1.6.1
         Environment: Ubuntu 16.04 LTS
            Reporter: Chaitanya
             Fix For: 1.6.1



I was trying to run the examples that ship with Spark. I started with the pi 
estimation example and it worked fine. Then I tried the Python wordcount example 
with the following command:

./bin/spark-submit examples/src/main/python/wordcount.py /home/chaitanya/Desktop/dataset/books_17.txt

and I got the following error:

16/06/27 11:15:43 INFO spark.SparkContext: Running Spark version 1.6.1
16/06/27 11:15:43 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
16/06/27 11:15:43 WARN util.Utils: Your hostname, ubuntu resolves to a loopback 
address: 127.0.1.1; using 192.168.88.128 instead (on interface ens33)
16/06/27 11:15:43 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to 
another address
16/06/27 11:15:43 INFO spark.SecurityManager: Changing view acls to: chaitanya
16/06/27 11:15:43 INFO spark.SecurityManager: Changing modify acls to: chaitanya
16/06/27 11:15:43 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(chaitanya); users 
with modify permissions: Set(chaitanya)
16/06/27 11:15:43 INFO util.Utils: Successfully started service 'sparkDriver' 
on port 40782.
16/06/27 11:15:44 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/06/27 11:15:44 INFO Remoting: Starting remoting
16/06/27 11:15:44 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://[email protected]:34918]
16/06/27 11:15:44 INFO util.Utils: Successfully started service 
'sparkDriverActorSystem' on port 34918.
16/06/27 11:15:44 INFO spark.SparkEnv: Registering MapOutputTracker
16/06/27 11:15:44 INFO spark.SparkEnv: Registering BlockManagerMaster
16/06/27 11:15:44 INFO storage.DiskBlockManager: Created local directory at 
/tmp/blockmgr-e89416c4-ab0e-433e-bc80-3b8e5d7ebf96
16/06/27 11:15:44 INFO storage.MemoryStore: MemoryStore started with capacity 
511.5 MB
16/06/27 11:15:44 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/06/27 11:15:44 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/06/27 11:15:44 INFO server.AbstractConnector: Started 
[email protected]:4040
16/06/27 11:15:44 INFO util.Utils: Successfully started service 'SparkUI' on 
port 4040.
16/06/27 11:15:44 INFO ui.SparkUI: Started SparkUI at http://192.168.88.128:4040
16/06/27 11:15:44 INFO util.Utils: Copying 
/opt/spark-1.6.1-bin-without-hadoop/examples/src/main/python/wordcount.py to 
/tmp/spark-496bd7b2-af0b-49ee-b644-d270aa58b04e/userFiles-f1a12cae-8430-44b8-8897-8f18871c501b/wordcount.py
16/06/27 11:15:44 INFO spark.SparkContext: Added file 
file:/opt/spark-1.6.1-bin-without-hadoop/examples/src/main/python/wordcount.py 
at 
file:/opt/spark-1.6.1-bin-without-hadoop/examples/src/main/python/wordcount.py 
with timestamp 1467026144750
16/06/27 11:15:44 INFO executor.Executor: Starting executor ID driver on host 
localhost
16/06/27 11:15:44 INFO util.Utils: Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 37082.
16/06/27 11:15:44 INFO netty.NettyBlockTransferService: Server created on 37082
16/06/27 11:15:44 INFO storage.BlockManagerMaster: Trying to register 
BlockManager
16/06/27 11:15:44 INFO storage.BlockManagerMasterEndpoint: Registering block 
manager localhost:37082 with 511.5 MB RAM, BlockManagerId(driver, localhost, 
37082)
16/06/27 11:15:44 INFO storage.BlockManagerMaster: Registered BlockManager
16/06/27 11:15:45 INFO storage.MemoryStore: Block broadcast_0 stored as values 
in memory (estimated size 189.2 KB, free 189.2 KB)
16/06/27 11:15:45 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as 
bytes in memory (estimated size 21.5 KB, free 210.7 KB)
16/06/27 11:15:45 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
memory on localhost:37082 (size: 21.5 KB, free: 511.5 MB)
16/06/27 11:15:45 INFO spark.SparkContext: Created broadcast 0 from textFile at 
NativeMethodAccessorImpl.java:-2
Traceback (most recent call last):
  File 
"/opt/spark-1.6.1-bin-without-hadoop/examples/src/main/python/wordcount.py", 
line 34, in <module>
    .reduceByKey(add)
  File 
"/opt/spark-1.6.1-bin-without-hadoop/python/lib/pyspark.zip/pyspark/rdd.py", 
line 1558, in reduceByKey
  File 
"/opt/spark-1.6.1-bin-without-hadoop/python/lib/pyspark.zip/pyspark/rdd.py", 
line 1768, in combineByKey
  File 
"/opt/spark-1.6.1-bin-without-hadoop/python/lib/pyspark.zip/pyspark/rdd.py", 
line 2169, in _defaultReducePartitions
  File 
"/opt/spark-1.6.1-bin-without-hadoop/python/lib/pyspark.zip/pyspark/rdd.py", 
line 2363, in getNumPartitions
  File 
"/opt/spark-1.6.1-bin-without-hadoop/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
 line 813, in __call__
  File 
"/opt/spark-1.6.1-bin-without-hadoop/python/lib/py4j-0.9-src.zip/py4j/protocol.py",
 line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o18.partitions.
: java.net.ConnectException: Call From ubuntu/127.0.1.1 to localhost:9000 
failed on connection exception: java.net.ConnectException: Connection refused; 
For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
        at org.apache.hadoop.ipc.Client.call(Client.java:1480)
        at org.apache.hadoop.ipc.Client.call(Client.java:1407)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy19.getFileInfo(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy20.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1674)
        at 
org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:259)
        at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
        at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
        at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
        at 
org.apache.spark.api.java.JavaRDDLike$class.partitions(JavaRDDLike.scala:64)
        at 
org.apache.spark.api.java.AbstractJavaRDDLike.partitions(JavaRDDLike.scala:46)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
        at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
        at org.apache.hadoop.ipc.Client.call(Client.java:1446)
        ... 45 more

16/06/27 11:15:45 INFO spark.SparkContext: Invoking stop() from shutdown hook
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/metrics/json,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/api,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/static,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/json,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/environment/json,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/environment,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage/json,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/pool,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/stage,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/json,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/jobs/job,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/jobs/json,null}
16/06/27 11:15:45 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/jobs,null}
16/06/27 11:15:45 INFO ui.SparkUI: Stopped Spark web UI at 
http://192.168.88.128:4040
16/06/27 11:15:45 INFO spark.MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
16/06/27 11:15:45 INFO storage.MemoryStore: MemoryStore cleared
16/06/27 11:15:45 INFO storage.BlockManager: BlockManager stopped
16/06/27 11:15:45 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/06/27 11:15:45 INFO 
scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
16/06/27 11:15:45 INFO spark.SparkContext: Successfully stopped SparkContext
16/06/27 11:15:45 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Shutting down remote daemon.
16/06/27 11:15:45 INFO util.ShutdownHookManager: Shutdown hook called
16/06/27 11:15:45 INFO util.ShutdownHookManager: Deleting directory 
/tmp/spark-496bd7b2-af0b-49ee-b644-d270aa58b04e/pyspark-66f863e9-62df-4b97-a41b-c0b670d297e3
16/06/27 11:15:45 INFO util.ShutdownHookManager: Deleting directory 
/tmp/spark-496bd7b2-af0b-49ee-b644-d270aa58b04e
16/06/27 11:15:45 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote 
daemon shut down; proceeding with flushing remote transports.
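
The trace above fails inside org.apache.hadoop.hdfs.DistributedFileSystem while 
calling localhost:9000, which suggests that the input path, given without a 
scheme, was resolved against an HDFS default filesystem rather than the local 
disk. The sketch below only imitates that resolution in plain Python; the 
qualify helper is hypothetical (it is not a Spark or Hadoop API), and the value 
hdfs://localhost:9000 is an assumption inferred from the error message, not 
read from any configuration file.

```python
from urllib.parse import urlparse


def qualify(path, default_fs):
    """Imitate how a scheme-less path is resolved against a default filesystem.

    Hypothetical helper for illustration only; Hadoop's own resolution logic
    is more involved than this.
    """
    # A path that already carries a scheme (file://, hdfs://, ...) is kept as-is.
    if urlparse(path).scheme:
        return path
    # Otherwise the default filesystem URI is prepended.
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


# Assumption: the default filesystem implied by "Call From ... to localhost:9000".
default_fs = "hdfs://localhost:9000"
local_path = "/home/chaitanya/Desktop/dataset/books_17.txt"

# The bare path ends up pointing at HDFS, where nothing is listening.
print(qualify(local_path, default_fs))

# An explicit file:// URI stays on the local filesystem.
print(qualify("file://" + local_path, default_fs))
```

If that assumption holds, passing the input as 
file:///home/chaitanya/Desktop/dataset/books_17.txt to the same spark-submit 
command should make Spark read the file from the local disk. Alternatively, 
HDFS could be started and the file copied into it first.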



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

