+ Spark-Dev

For a Spark job on YARN that accesses an HBase table, I added all the HBase client jars to spark.yarn.dist.files. When the NodeManager launches the container (i.e. the executor), it localizes the files and brings all the hbase-client jars into the executor's CWD. Still, the executor tasks fail with a ClassNotFoundException for the HBase client classes. When I checked launch_container.sh, the classpath does not include $PWD/*, so all the localized HBase client jars are ignored.

Is spark.yarn.dist.files not meant for adding jars to the executor classpath?
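For reference, these are the two launch variants I plan to try next (the jar paths below are illustrative; substitute the jars for your HBase version):

    # Option 1: use --jars, which both distributes the jars and adds them
    # to the driver and executor classpaths
    MASTER=yarn-client ./spark-shell \
      --jars /opt/hbase/lib/hbase-client.jar,/opt/hbase/lib/hbase-common.jar,/opt/hbase/lib/hbase-protocol.jar,/opt/hbase/lib/hbase-server.jar

    # Option 2: keep spark.yarn.dist.files for localization, and point the
    # executor classpath at the localized jar names, assuming they land in
    # the container working directory
    MASTER=yarn-client ./spark-shell \
      --conf "spark.executor.extraClassPath=hbase-client.jar:hbase-common.jar:hbase-protocol.jar:hbase-server.jar"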
Thanks,
Prabhu Joseph

On Tue, Feb 9, 2016 at 1:42 PM, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:

> Hi All,
>
> When I do a count on an HBase table from the Spark shell, which runs in
> yarn-client mode, the job fails at count().
>
> MASTER=yarn-client ./spark-shell
>
> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, TableName}
> import org.apache.hadoop.hbase.client.HBaseAdmin
> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>
> val conf = HBaseConfiguration.create()
> conf.set(TableInputFormat.INPUT_TABLE, "spark")
>
> val hBaseRDD = sc.newAPIHadoopRDD(conf,
>   classOf[TableInputFormat],
>   classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
>   classOf[org.apache.hadoop.hbase.client.Result])
> hBaseRDD.count()
>
> The tasks throw the exception below; the actual exception is swallowed by
> a known JDK bug (JDK-7172206). After installing the HBase client on all
> NodeManager machines, the Spark job ran fine, so I confirmed that the
> issue is with the executor classpath.
>
> But I am looking for some other way of including the HBase jars in the
> Spark executor classpath instead of installing the HBase client on all
> NM machines. I tried adding all the HBase jars to spark.yarn.dist.files;
> the NM logs show that it localized all the HBase jars, but the job still
> fails. I also tried spark.executor.extraClassPath; the job still fails.
>
> Is there any way we can access HBase from the executors without
> installing hbase-client on all machines?
>
> 16/02/09 02:34:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
> prabhuFS1): java.lang.IllegalStateException: unread block data
>         at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2428)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
>         at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> Thanks,
> Prabhu Joseph
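P.S. A quick sanity check one can run from the same shell session to see
what the executor JVMs actually get on their launch classpath (which should
show whether launch_container.sh picked up the localized jars):

    // Run a trivial two-partition job and print each executor JVM's
    // classpath as set at launch time
    sc.parallelize(1 to 2, 2)
      .map(_ => System.getProperty("java.class.path"))
      .collect()
      .foreach(println)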