Hi Tommer, I'm working on updating and improving the PR, and will work on getting an HBase example working with it (roughly along the lines of the sketch below). I'll report back as soon as I've had the chance to work on this a bit more.
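For reference, here's a rough sketch of how I'd expect the HBase read to look once the PR is in. Since TableInputFormat takes the table name from the job conf rather than a file path, I'd expect the RDD-based entry point (newAPIHadoopRDD) rather than newAPIHadoopFile, plus converter classes to turn the HBase key/value types into something picklable on the Python side. The converter class names below are placeholders for whatever ends up shipping with the Spark examples, so please treat this as a sketch rather than a working snippet:

# Sketch only -- assumes the PR is merged and that converter classes along
# these lines are available on the classpath (names are placeholders).
from pyspark import SparkContext

sc = SparkContext("local", "My App")

conf = {"hbase.zookeeper.quorum": "my-host",
        "hbase.rootdir": "hdfs://my-host:8020/hbase",
        "hbase.mapreduce.inputtable": "data"}

rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
    conf=conf)

print(rdd.first())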
N

On Thu, May 29, 2014 at 3:27 AM, twizansk <twiza...@gmail.com> wrote:
> The code which causes the error is:
>
>     sc = SparkContext("local", "My App")
>     rdd = sc.newAPIHadoopFile(
>         name,
>         'org.apache.hadoop.hbase.mapreduce.TableInputFormat',
>         'org.apache.hadoop.hbase.io.ImmutableBytesWritable',
>         'org.apache.hadoop.hbase.client.Result',
>         conf={"hbase.zookeeper.quorum": "my-host",
>               "hbase.rootdir": "hdfs://my-host:8020/hbase",
>               "hbase.mapreduce.inputtable": "data"})
>
> The full stack trace is:
>
> Py4JError                                Traceback (most recent call last)
> <ipython-input-8-3b9a4ea2f659> in <module>()
>       7             conf={"hbase.zookeeper.quorum": "my-host",
>       8                   "hbase.rootdir": "hdfs://my-host:8020/hbase",
> ----> 9                   "hbase.mapreduce.inputtable": "data"})
>      10
>      11
>
> /opt/cloudera/parcels/CDH/lib/spark/python/pyspark/context.pyc in
> newAPIHadoopFile(self, name, inputformat_class, key_class, value_class,
> key_wrapper, value_wrapper, conf)
>     281         for k, v in conf.iteritems():
>     282             jconf[k] = v
> --> 283         jrdd = self._jvm.PythonRDD.newAPIHadoopFile(self._jsc, name,
>                     inputformat_class, key_class, value_class,
>     284             key_wrapper, value_wrapper, jconf)
>     285         return RDD(jrdd, self, PickleSerializer())
>
> /opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py
> in __getattr__(self, name)
>     657         else:
>     658             raise Py4JError('{0} does not exist in the JVM'.
> --> 659                             format(self._fqn + name))
>     660
>     661     def __call__(self, *args):
>
> Py4JError: org.apache.spark.api.python.PythonRDDnewAPIHadoopFile does not
> exist in the JVM
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142p6507.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.