The code which causes the error is:
sc = SparkContext("local", "My App")
rdd = sc.newAPIHadoopFile(
    name,
    'org.apache.hadoop.hbase.mapreduce.TableInputFormat',
    'org.apache.hadoop.hbase.io.ImmutableBytesWritable',
    'org.apache.hadoop.hbase.client.Result',
    conf={"hbase.zookeeper.quorum": "my-host",
          "hbase.rootdir": "hdfs://my-host:8020/hbase",
          "hbase.mapreduce.inputtable": "data"})

The full stack trace is:

Py4JError                                 Traceback (most recent call last)
<ipython-input-8-3b9a4ea2f659> in <module>()
      7             conf={"hbase.zookeeper.quorum": "my-host",
      8                   "hbase.rootdir": "hdfs://my-host:8020/hbase",
----> 9                   "hbase.mapreduce.inputtable": "data"})
     10
     11

/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/context.pyc in newAPIHadoopFile(self, name, inputformat_class, key_class, value_class, key_wrapper, value_wrapper, conf)
    281         for k, v in conf.iteritems():
    282             jconf[k] = v
--> 283         jrdd = self._jvm.PythonRDD.newAPIHadoopFile(self._jsc, name, inputformat_class, key_class, value_class,
    284                                                     key_wrapper, value_wrapper, jconf)
    285         return RDD(jrdd, self, PickleSerializer())

/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py in __getattr__(self, name)
    657         else:
    658             raise Py4JError('{0} does not exist in the JVM'.
--> 659                     format(self._fqn + name))
    660
    661     def __call__(self, *args):

Py4JError: org.apache.spark.api.python.PythonRDDnewAPIHadoopFile does not exist in the JVM

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142p6507.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
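For what it's worth, the oddly fused name in the error ("PythonRDDnewAPIHadoopFile") comes from the py4j frame shown in the trace: `Py4JError('{0} does not exist in the JVM'.format(self._fqn + name))` concatenates the JVM class name and the missing member directly. The error itself means the Spark jars on the cluster don't expose that method to py4j. Here is a minimal, self-contained sketch of that message-building behaviour; `FakeJavaClass` is a simplified stand-in I wrote for py4j's `JavaClass`, not the real class:

```python
class Py4JError(Exception):
    pass

class FakeJavaClass:
    """Simplified stand-in for py4j.java_gateway.JavaClass (assumption,
    modelled on the py4j 0.8.1 frame visible in the traceback)."""
    def __init__(self, fqn):
        self._fqn = fqn  # fully-qualified name of the JVM class

    def __getattr__(self, name):
        # py4j raises when the JVM reports no such member; note that
        # _fqn and name are concatenated with no separator, which is
        # why the class and method names run together in the message.
        raise Py4JError('{0} does not exist in the JVM'.format(self._fqn + name))

cls = FakeJavaClass('org.apache.spark.api.python.PythonRDD')
try:
    cls.newAPIHadoopFile
except Py4JError as e:
    print(e)  # org.apache.spark.api.python.PythonRDDnewAPIHadoopFile does not exist in the JVM
```

So the message is cosmetic; the underlying problem is that the Python side of pyspark is calling a JVM method that the deployed Spark assembly does not have, which typically points at a version mismatch between the pyspark Python files and the Spark jars.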