Have you tried adding the HBase client jars to spark.executor.extraClassPath?
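
For example, something along these lines; the path is illustrative, so point
it at wherever the HBase jars live, and it must exist on every NodeManager
host since extraClassPath entries are resolved locally on each node:

MASTER=yarn-client ./spark-shell \
  --conf "spark.executor.extraClassPath=/usr/lib/hbase/lib/*" \
  --conf "spark.driver.extraClassPath=/usr/lib/hbase/lib/*"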

Cheers

On Wed, Feb 10, 2016 at 12:17 AM, Prabhu Joseph <prabhujose.ga...@gmail.com>
wrote:

> + Spark-Dev
>
> For a Spark job on YARN that accesses an HBase table, I added all the HBase
> client jars to spark.yarn.dist.files. When the NodeManager launches the
> container, i.e. the executor, it localizes all the hbase-client jars into the
> executor's CWD, but the executor tasks still fail with ClassNotFoundException
> for the HBase client classes. When I checked launch_container.sh, the
> classpath does not include $PWD/*, so all the hbase client jars are ignored.
>
> Is spark.yarn.dist.files not meant for adding jars to the executor classpath?
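>
> As far as I can tell, spark.yarn.dist.files only localizes the files into
> the container working directory; it does not put them on the executor
> classpath by itself. A sketch of what I plan to try next, combining it with
> a relative classpath entry that should resolve against the container CWD
> (the jar name is just an example):
>
> MASTER=yarn-client ./spark-shell \
>   --conf spark.yarn.dist.files=/usr/lib/hbase/lib/hbase-client-1.0.0.jar \
>   --conf spark.executor.extraClassPath=hbase-client-1.0.0.jar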
>
> Thanks,
> Prabhu Joseph
>
> On Tue, Feb 9, 2016 at 1:42 PM, Prabhu Joseph <prabhujose.ga...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> When I do a count on an HBase table from the Spark shell running in
>> yarn-client mode, the job fails at count().
>>
>> MASTER=yarn-client ./spark-shell
>>
>> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, TableName}
>> import org.apache.hadoop.hbase.client.HBaseAdmin
>> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>>
>> val conf = HBaseConfiguration.create()
>> conf.set(TableInputFormat.INPUT_TABLE,"spark")
>>
>> val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
>>   classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
>>   classOf[org.apache.hadoop.hbase.client.Result])
>> hBaseRDD.count()
>>
>>
>> The tasks throw the exception below; the actual exception is swallowed by a
>> JDK bug, JDK-7172206. After installing the hbase client on all NodeManager
>> machines, the Spark job ran fine, so I confirmed that the issue is the
>> executor classpath.
>>
>> But I am searching for some other way to include the hbase jars in the Spark
>> executor classpath instead of installing the hbase client on all NM machines.
>> I tried adding all the hbase jars to spark.yarn.dist.files; the NM logs show
>> that it localized all of them, but the job still fails. I also tried
>> spark.executor.extraClassPath, and the job still fails.
>>
>> Is there any way to access HBase from the executors without installing the
>> hbase client on every machine?
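>>
>> One alternative I am considering is passing the jars via --jars, which ships
>> them to the executors and adds them to the classpath; the list below is
>> illustrative, not the complete set of jars HBase needs:
>>
>> MASTER=yarn-client ./spark-shell --jars /usr/lib/hbase/lib/hbase-client-1.0.0.jar,/usr/lib/hbase/lib/hbase-common-1.0.0.jar,/usr/lib/hbase/lib/hbase-protocol-1.0.0.jar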
>>
>>
>> 16/02/09 02:34:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, prabhuFS1): *java.lang.IllegalStateException: unread block data*
>>         at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2428)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>         at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>         at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
>>         at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>> Thanks,
>> Prabhu Joseph
>>
>
>
