Re: Problem using Spark with HBase

2014-05-30 Thread Vibhor Banga
Thanks Mayur for the reply.

Actually, the issue was that I was running the Spark application on
hadoop-2.2.0, and the HBase version there was 0.95.2.

But Spark by default gets built against an older HBase version, so I had
to rebuild Spark with the HBase version set to 0.95.2 in the Spark build
file. That fixed it.
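For anyone who hits the same mismatch: the fix amounts to pinning the
HBase dependency in Spark's build to the version the cluster actually
runs, and rebuilding. A minimal sketch, assuming an sbt-based Spark build
of that era where the HBase version is pinned in project/SparkBuild.scala
(the exact variable name and layout may differ in your Spark version):

    // project/SparkBuild.scala (excerpt; hypothetical layout)
    // Pin HBase to the version running on the cluster, then rebuild,
    // e.g. SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
    val HBASE_VERSION = "0.95.2"  // the default pointed at an older release

    // the HBase example dependency picks the constant up, roughly:
    // "org.apache.hbase" % "hbase" % HBASE_VERSION

To confirm which HBase client version actually ended up on the Spark
classpath, something like this (a hypothetical check, not from the
original thread) can be run on the driver:

    // prints the version of the HBase client jar found on the classpath
    import org.apache.hadoop.hbase.util.VersionInfo

    object HBaseVersionCheck extends App {
      println("HBase client version: " + VersionInfo.getVersion)
    }

If the printed client version does not match what the cluster is running,
mismatched jars are a likely cause of opaque errors like the
NullPointerException below.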

Thanks,
-Vibhor


On Wed, May 28, 2014 at 11:34 PM, Mayur Rustagi mayur.rust...@gmail.com
wrote:

 Try this..

 Mayur Rustagi
 Ph: +1 (760) 203 3257
 http://www.sigmoidanalytics.com
 @mayur_rustagi https://twitter.com/mayur_rustagi



 On Wed, May 28, 2014 at 7:40 PM, Vibhor Banga vibhorba...@gmail.com
 wrote:

 Anyone who has used Spark this way or has faced a similar issue, please
 help.

 Thanks,
 -Vibhor


 On Wed, May 28, 2014 at 6:03 PM, Vibhor Banga vibhorba...@gmail.com
 wrote:

 Hi all,

 I am facing issues while using Spark with HBase. I am getting a
 NullPointerException at org.apache.hadoop.hbase.TableName.valueOf
 (TableName.java:288).

 Can someone please help me resolve this issue? What am I missing?


 I am using the following snippet of code:

 Configuration config = HBaseConfiguration.create();

 config.set("hbase.zookeeper.znode.parent", "hostname1");
 config.set("hbase.zookeeper.quorum", "hostname1");
 config.set("hbase.zookeeper.property.clientPort", "2181");
 config.set("hbase.master", "hostname1:
 config.set("fs.defaultFS", "hdfs://hostname1/");
 config.set("dfs.namenode.rpc-address", "hostname1:8020");

 config.set(TableInputFormat.INPUT_TABLE, tableName);

 JavaSparkContext ctx = new JavaSparkContext(args[0], "Simple",
         System.getenv("sparkHome"),
         JavaSparkContext.jarOfClass(Simple.class));

 JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD
         = ctx.newAPIHadoopRDD(config, TableInputFormat.class,
                 ImmutableBytesWritable.class, Result.class);

 Map<ImmutableBytesWritable, Result> rddMap = hBaseRDD.collectAsMap();


 But when I go to the Spark cluster and check the logs, I see the
 following error:

 INFO NewHadoopRDD: Input split: w3-target1.nm.flipkart.com:,
 14/05/28 16:48:51 ERROR TableInputFormat: java.lang.NullPointerException
   at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:288)
   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:154)
   at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:99)
   at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:92)
   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:84)
   at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:48)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
   at org.apache.spark.scheduler.Task.run(Task.scala:53)
   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
   at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
   at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
   at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)

 Thanks,

 -Vibhor








-- 
Vibhor Banga
Software Development Engineer
Flipkart Internet Pvt. Ltd., Bangalore

