Re: Problem using Spark with Hbase
Thanks, Mayur, for the reply. The issue was that I was running the Spark application on hadoop-2.2.0, where the HBase version was 0.95.2, but Spark by default is built against an older HBase version. So I rebuilt Spark with the HBase version set to 0.95.2 in the Spark build file, and it worked.

Thanks,
-Vibhor

On Wed, May 28, 2014 at 11:34 PM, Mayur Rustagi mayur.rust...@gmail.com wrote:
Try this..

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi

--
Vibhor Banga
Software Development Engineer
Flipkart Internet Pvt. Ltd., Bangalore
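For anyone hitting a similar version mismatch, one way to see which HBase jar a JVM actually picks up is to ask the class loader where a class came from. The following is only a minimal sketch using the standard Java API; the class name ClasspathProbe and the method describe are made up for illustration, and the two HBase class names in main are simply taken from the stack trace in this thread. It reports the classpath of the JVM it runs in, so running it on the driver says nothing about the executors.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Spark or HBase): reports where a class is loaded from.
public class ClasspathProbe {

    /** Returns a one-line description of the jar or directory that provides the named class. */
    public static String describe(String className) {
        try {
            // Load without initializing, so static initializers of the target class do not run.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // JDK classes have no code source; report that instead of failing.
            String location = (source == null || source.getLocation() == null)
                    ? "<bootstrap or unknown>"
                    : source.getLocation().toString();
            // The implementation version comes from the jar manifest and may be absent.
            Package pkg = cls.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className + " -> " + location
                    + " (version: " + (version == null ? "unknown" : version) + ")";
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not found on classpath";
        }
    }

    public static void main(String[] args) {
        // Default to the HBase classes that appear in the stack trace of this thread.
        String[] names = args.length > 0
                ? args
                : new String[] {
                    "org.apache.hadoop.hbase.TableName",
                    "org.apache.hadoop.hbase.client.HTable"
                };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

If the reported jar name or version differs from the HBase version running on the cluster, that points to the kind of mismatch described above.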
Problem using Spark with Hbase
Hi all,

I am facing issues while using Spark with HBase. I am getting a NullPointerException at org.apache.hadoop.hbase.TableName.valueOf (TableName.java:288). Can someone please help me resolve this issue? What am I missing?

I am using the following snippet of code:

    Configuration config = HBaseConfiguration.create();
    config.set("hbase.zookeeper.znode.parent", "hostname1");
    config.set("hbase.zookeeper.quorum", "hostname1");
    config.set("hbase.zookeeper.property.clientPort", "2181");
    config.set("hbase.master", "hostname1:
    config.set("fs.defaultFS", "hdfs://hostname1/");
    config.set("dfs.namenode.rpc-address", "hostname1:8020");
    config.set(TableInputFormat.INPUT_TABLE, tableName);

    JavaSparkContext ctx = new JavaSparkContext(args[0], "Simple",
        System.getenv(sparkHome), JavaSparkContext.jarOfClass(Simple.class));

    JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = ctx.newAPIHadoopRDD(
        config, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

    Map<ImmutableBytesWritable, Result> rddMap = hBaseRDD.collectAsMap();

But when I go to the Spark cluster and check the logs, I see the following error:

    INFO NewHadoopRDD: Input split: w3-target1.nm.flipkart.com:,
    14/05/28 16:48:51 ERROR TableInputFormat: java.lang.NullPointerException
        at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:288)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:154)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:99)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:92)
        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:84)
        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:48)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Thanks,
-Vibhor
Re: Problem using Spark with Hbase
Anyone who has used Spark this way or has faced a similar issue, please help.

Thanks,
-Vibhor