Have you tried adding the HBase client jars to spark.executor.extraClassPath?

Cheers
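For example, something along these lines (the paths and jar names below are just placeholders for wherever the HBase client jars live; adjust them to your install, and note that absolute extraClassPath entries have to resolve on every NodeManager machine as well):

    MASTER=yarn-client ./spark-shell \
      --conf spark.executor.extraClassPath=/opt/hbase/lib/hbase-client.jar:/opt/hbase/lib/hbase-common.jar:/opt/hbase/lib/hbase-protocol.jar \
      --conf spark.driver.extraClassPath=/opt/hbase/lib/hbase-client.jar:/opt/hbase/lib/hbase-common.jar:/opt/hbase/lib/hbase-protocol.jar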
On Wed, Feb 10, 2016 at 12:17 AM, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:

> + Spark-Dev
>
> For a Spark job on YARN accessing an HBase table, I added all the HBase client
> jars to spark.yarn.dist.files. When launching the container (i.e. the executor),
> the NodeManager does localization and brings all the HBase client jars into the
> executor CWD, but the executor tasks still fail with a ClassNotFoundException
> for the HBase client classes. When I checked launch_container.sh, the classpath
> does not contain $PWD/*, so all the HBase client jars are ignored.
>
> Is spark.yarn.dist.files not meant for adding jars to the executor classpath?
>
> Thanks,
> Prabhu Joseph
>
> On Tue, Feb 9, 2016 at 1:42 PM, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:
>
>> Hi All,
>>
>> When I do a count on an HBase table from the Spark shell, which runs in
>> yarn-client mode, the job fails at count().
>>
>> MASTER=yarn-client ./spark-shell
>>
>> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, TableName}
>> import org.apache.hadoop.hbase.client.HBaseAdmin
>> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>>
>> val conf = HBaseConfiguration.create()
>> conf.set(TableInputFormat.INPUT_TABLE, "spark")
>>
>> val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
>>   classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
>>   classOf[org.apache.hadoop.hbase.client.Result])
>> hBaseRDD.count()
>>
>> The tasks throw the exception below; the actual exception is swallowed, a bug
>> (JDK-7172206). After installing the HBase client on all NodeManager machines,
>> the Spark job ran fine, so I confirmed that the issue is with the executor
>> classpath.
>>
>> But I am looking for some other way of including the HBase jars in the Spark
>> executor classpath instead of installing the HBase client on all NM machines.
>> I tried adding all the HBase jars to spark.yarn.dist.files; the NM logs show
>> that it localized all the HBase jars, but the job still fails. I also tried
>> spark.executor.extraClassPath, and the job still fails.
>>
>> Is there any way we can access HBase from the executor without installing the
>> HBase client on all machines?
>>
>> 16/02/09 02:34:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, prabhuFS1): *java.lang.IllegalStateException: unread block data*
>>         at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2428)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
>>         at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> Thanks,
>> Prabhu Joseph
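One more thing that might be worth a try, sketched below: since spark.yarn.dist.files only localizes the files into the container working directory and does not touch the classpath (which matches the launch_container.sh behaviour you observed), you could combine it with relative spark.executor.extraClassPath entries that point at the localized names in the container CWD. The jar names below are placeholders and I haven't verified this on your Spark/HBase versions:

    MASTER=yarn-client ./spark-shell \
      --conf spark.yarn.dist.files=/opt/hbase/lib/hbase-client.jar,/opt/hbase/lib/hbase-common.jar,/opt/hbase/lib/hbase-protocol.jar \
      --conf spark.executor.extraClassPath=./hbase-client.jar:./hbase-common.jar:./hbase-protocol.jar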