Re: How many Spark streaming applications can be run at a time on a Spark cluster?

2016-12-14 Thread Akhilesh Pathodia
If you have enough cores/resources, run them separately depending on your
use case.

On Thursday 15 December 2016, Divya Gehlot wrote:

> It depends on the use case ...
> Spark always depends on resource availability.
> As long as you have the resources to accommodate them, you can run as many
> Spark/Spark Streaming applications as you like.
>
>
> Thanks,
> Divya
>
> On 15 December 2016 at 08:42, shyla deshpande wrote:
>
>> How many Spark streaming applications can be run at a time on a Spark
>> cluster?
>>
>> Is it better to have 1 spark streaming application to consume all the
>> Kafka topics or have multiple streaming applications when possible to keep
>> it simple?
>>
>> Thanks
>>
>>
>


Re: Reading parquet files into Spark Streaming

2016-08-27 Thread Akhilesh Pathodia
Hi Renato,

Which version of Spark are you using?

If your Spark version is 1.3.0 or later, you can use SQLContext to read the
parquet files, which will give you a DataFrame. Please follow the link below:

https://spark.apache.org/docs/1.5.0/sql-programming-guide.html#loading-data-programmatically
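
For example, a minimal sketch of that approach (the "data/" path is a
placeholder; on Spark 1.3 use sqlContext.parquetFile instead of read.parquet):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("parquet-to-dataframe"))
val sqlContext = new SQLContext(sc)

// Reads all parquet files under the directory into a DataFrame.
val df = sqlContext.read.parquet("data/")
df.printSchema()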

Thanks,
Akhilesh

On Sat, Aug 27, 2016 at 3:26 AM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Anybody? I think Rory also didn't get an answer from the list ...
>
> https://mail-archives.apache.org/mod_mbox/spark-user/201602.mbox/%3CCAC+
> fre14pv5nvqhtbvqdc+6dkxo73odazfqslbso8f94ozo...@mail.gmail.com%3E
>
>
>
> 2016-08-26 17:42 GMT+02:00 Renato Marroquín Mogrovejo <
> renatoj.marroq...@gmail.com>:
>
>> Hi all,
>>
>> I am trying to use parquet files as input for DStream operations, but I
>> can't find any documentation or examples. The only thing I found was [1], but
>> I also get the same error as in the post (Class
>> parquet.avro.AvroReadSupport not found).
>> Ideally I would like to have something like this:
>>
>> val oDStream = ssc.fileStream[Void, Order, ParquetInputFormat[Order]]("data/")
>>
>> where Order is a case class and the files inside "data" are all parquet
>> files.
>> Any hints would be highly appreciated. Thanks!
>>
>>
>> Best,
>>
>> Renato M.
>>
>> [1] http://stackoverflow.com/questions/35413552/how-do-i-read-
>> in-parquet-files-using-ssc-filestream-and-what-is-the-nature
>>
>
>


Failed to get broadcast_1_piece0 of broadcast_1

2016-04-04 Thread Akhilesh Pathodia
Hi,

I am running Spark jobs on YARN in cluster mode. The job gets its messages
from a Kafka direct stream. I am using broadcast variables and checkpointing
every 30 seconds. When I start the job for the first time, it runs fine
without any issue. If I kill the job and restart it, it throws the exception
below in the executor upon receiving a message from Kafka:

java.io.IOException: org.apache.spark.SparkException: Failed to get
broadcast_1_piece0 of broadcast_1
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1178)
at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at 
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at 
net.juniper.spark.stream.LogDataStreamProcessor$2.call(LogDataStreamProcessor.java:177)
at 
net.juniper.spark.stream.LogDataStreamProcessor$2.call(LogDataStreamProcessor.java:1)
at 
org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$1$1.apply(JavaDStreamLike.scala:172)
at 
org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$fn$1$1.apply(JavaDStreamLike.scala:172)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at 
org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$28.apply(RDD.scala:1298)
at 
org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$28.apply(RDD.scala:1298)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Does anyone have an idea how to resolve this error?

Spark version: 1.5.0
CDH 5.5.1
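
For context, the setup is roughly the following (a minimal sketch only; the
class, topic, broadcast data and checkpoint path are illustrative, not the
actual job code):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object LogStreamSketch {
  def main(args: Array[String]): Unit = {
    val checkpointDir = "hdfs:///tmp/log-stream-checkpoint"   // illustrative path

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("LogStreamSketch")
      val ssc = new StreamingContext(conf, Seconds(30))
      ssc.checkpoint(checkpointDir)

      // Broadcast lookup data that is used inside the DStream operations.
      val lookup: Broadcast[Map[String, String]] =
        ssc.sparkContext.broadcast(Map("key" -> "value"))     // illustrative data

      val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
      val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, Set("logs"))

      // The broadcast value is read inside the closure. When the context is
      // recovered from the checkpoint after a restart, the recovered closure
      // still references the old broadcast, which may no longer exist on the executors.
      messages.map { case (_, line) => lookup.value.getOrElse(line, line) }.print()
      ssc
    }

    // On restart this recovers the context (and the old closures) from the checkpoint.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}

From what I can tell, broadcast variables are not recovered from Spark
Streaming checkpoints, so that interaction is what I suspect here.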

Thanks,
Akhilesh


Reading large set of files in Spark

2016-02-04 Thread Akhilesh Pathodia
Hi,

I am using Spark to read a large set of files from HDFS, applying some
formatting to each line and then saving each line as a record in Hive.
Spark is reading directory paths from Kafka. Each directory can have a large
number of files. I am reading one path from Kafka and then processing all
files of that directory in parallel. I have to delete the directory after all
files are processed. I have the following questions:

1. What is the optimal way to read a large set of files in Spark? I am not
using sc.textFile(); instead I am reading the file content using the
FileSystem API and creating a DStream of lines (see the sketch below).
2. How do I delete the directory/files from HDFS after the task is completed?
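
To make question 1 concrete, the per-directory read currently looks roughly
like this (a minimal sketch; the helper name and path handling are
illustrative, not the exact job code):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

// Read every file under a directory through the HDFS FileSystem API and
// collect all lines; the resulting lines are then fed into a DStream.
def readDirectory(dirPath: String): Seq[String] = {
  val fs = FileSystem.get(new Configuration())
  fs.listStatus(new Path(dirPath)).toSeq
    .filter(_.isFile)
    .flatMap { status =>
      val in = fs.open(status.getPath)
      try Source.fromInputStream(in).getLines().toList
      finally in.close()
    }
}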

Thanks,
Akhilesh


Undefined job output-path error in Spark on hive

2016-01-25 Thread Akhilesh Pathodia
Hi,

I am getting the following exception in Spark while writing to a Hive
partitioned table in parquet format:

16/01/25 03:56:40 ERROR executor.Executor: Exception in task 0.2 in
stage 1.0 (TID 3)
java.io.IOException: Undefined job output-path
at 
org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:232)
at 
org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.org$apache$spark$sql$hive$SparkHiveDynamicPartitionWriterContainer$$newWriter$1(hiveWriterContainers.scala:237)
at 
org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:250)
at 
org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:250)
at 
scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
at 
org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.getLocalFileWriter(hiveWriterContainers.scala:250)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:112)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:104)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:104)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:85)
at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:85)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

Spark version: 1.5.0
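
For context, the write that ends up in InsertIntoHiveTable and
SparkHiveDynamicPartitionWriterContainer is of roughly this shape (a hedged
sketch only; the table name, columns and sample data are placeholders, not
the actual job code):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("hive-partitioned-write"))
val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

// Placeholder data; the partition columns (year, month, day) come last so
// that insertInto matches them positionally against the Hive table definition.
val df = hiveContext.createDataFrame(Seq(("msg-1", 2016, 1, 25)))
  .toDF("message", "year", "month", "day")
df.write.insertInto("logs_parquet")   // an existing Hive table stored as parquet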

Please let me know if anybody has an idea about this error.

Thanks,

Akhilesh


Spark not saving data to Hive

2016-01-23 Thread Akhilesh Pathodia
Hi,

I am trying to write data from Spark to a Hive partitioned table:

DataFrame dataFrame = sqlContext.createDataFrame(rdd, schema);
dataFrame.write().partitionBy("YEAR","MONTH","DAY").saveAsTable(tableName);

The data is not being written to the Hive table (HDFS location:
/user/hive/warehouse//). Below are the logs from the Spark
executor. As shown in the logs, it is writing the data to
/tmp/spark-a3c7ed0f-76c6-4c3c-b80c-0734e33390a2/metastore/case_logs, but I
did not find this directory in HDFS.

16/01/23 02:15:03 INFO datasources.DynamicPartitionWriterContainer:
Sorting complete. Writing out partition files one at a time.
16/01/23 02:15:03 INFO compress.CodecPool: Got brand-new compressor [.gz]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/parquet-pig-bundle-1.5.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/parquet-hadoop-bundle-1.5.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/parquet-format-2.1.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/hive-exec-1.1.0-cdh5.5.1.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/hive-jdbc-1.1.0-cdh5.5.1-standalone.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type
[shaded.parquet.org.slf4j.helpers.NOPLoggerFactory]
16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:05 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:06 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:06 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:06 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:06 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:06 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:06 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:06 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:06 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:06 INFO compress.CodecPool: Got brand-new compressor [.gz]
16/01/23 02:15:06 INFO output.FileOutputCommitter: Saved output of
task 'attempt_201601230214_0023_m_00_0' to
file:/tmp/spark-a3c7ed0f-76c6-4c3c-b80c-0734e33390a2/metastore/case_logs
16/01/23 02:15:06 INFO mapred.SparkHadoopMapRedUtil:
attempt_201601230214_0023_m_00_0: Committed
16/01/23 02:15:06 INFO executor.Executor: Finished task 0.0 in stage
23.0 (TID 23). 2013 bytes result sent to driver


I am using CDH 5.5.1 and Spark 1.5.0. Does anybody have an idea what is
happening here?

Thanks,
Akhilesh


Spark not writing data in Hive format

2016-01-23 Thread Akhilesh Pathodia
Hi,

I am trying to write data from Spark to a Hive partitioned table. The job is
running without any error, but it is not writing the data to the correct
location.

job-executor-0] parquet.ParquetRelation (Logging.scala:logInfo(59)) -
Listing 
file:/yarn/nm/usercache/root/appcache/application_1453561680059_0005/container_e89_1453561680059_0005_01_01/tmp/spark-f252468d-61f0-44f2-8819-34e2c27c80c7/metastore/case_logs
on driver
2016-01-23 07:58:53,223 INFO  [streaming-job-executor-0]
parquet.ParquetRelation (Logging.scala:logInfo(59)) - Listing
file:/yarn/nm/usercache/root/appcache/application_1453561680059_0005/container_e89_1453561680059_0005_01_01/tmp/spark-f252468d-61f0-44f2-8819-34e2c27c80c7/metastore/case_logs
on driver
2016-01-23 07:58:53,276 WARN  [streaming-job-executor-0]
hive.HiveContext$$anon$1 (Logging.scala:logWarning(71)) - Persisting
partitioned data source relation `CASE_LOGS` into Hive metastore in
Spark SQL specific format, which is NOT compatible with Hive. Input
path(s):
file:/yarn/nm/usercache/root/appcache/application_1453561680059_0005/container_e89_1453561680059_0005_01_01/tmp/spark-f252468d-61f0-44f2-8819-34e2c27c80c7/metastore/case_logs
2016-01-23 07:58:53,454 INFO  [streaming-job-executor-0]
log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - 654 INFO
[JobScheduler] scheduler.JobScheduler (Logging.scala:logInfo(59)) -
Finished job streaming job 145356471 ms.0 from job set of time
145356471 ms


It is writing the data in the Spark SQL specific format instead of the Hive
format. Can anybody tell me how to get rid of this issue?

Spark version - 1.5.0
CDH 5.5.1

Thanks,
Akhilesh Pathodia


Re: Spark on hbase using Phoenix in secure cluster

2015-12-07 Thread Akhilesh Pathodia
Yes, it's a kerberized cluster, and the ticket was generated using the kinit
command before running the Spark job. That's why Spark on HBase worked, but
when Phoenix is used to get the connection to HBase, it does not pass the
authentication to all nodes. Probably this is not handled in Phoenix version
4.3, or Spark 1.3.1 does not provide integration with Phoenix for a
kerberized cluster.

Can anybody confirm whether Spark 1.3.1 supports Phoenix on a secured cluster
or not?

Thanks,
Akhilesh

On Tue, Dec 8, 2015 at 2:57 AM, Ruslan Dautkhanov <dautkha...@gmail.com>
wrote:

> That error is not directly related to spark nor hbase
>
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]
>
> Is this a kerberized cluster? You likely don't have a good (non-expired)
> kerberos ticket for authentication to pass.
>
>
> --
> Ruslan Dautkhanov
>
> On Mon, Dec 7, 2015 at 12:54 PM, Akhilesh Pathodia <
> pathodia.akhil...@gmail.com> wrote:
>
>> Hi,
>>
>> I am running a Spark job on YARN in cluster mode in a secured cluster. I am
>> trying to run Spark on HBase using Phoenix, but the Spark executors are
>> unable to get an HBase connection using Phoenix. I am running the kinit
>> command to get the ticket before starting the job, and the keytab file and
>> principal are correctly specified in the connection URL. But the Spark job
>> on each node still throws the error below:
>>
>> 15/12/01 03:23:15 ERROR ipc.AbstractRpcClient: SASL authentication
>> failed. The most likely cause is missing or invalid credentials. Consider
>> 'kinit'.
>> javax.security.sasl.SaslException: GSS initiate failed [Caused by
>> GSSException: No valid credentials provided (Mechanism level: Failed to
>> find any Kerberos tgt)]
>> at
>> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>>
>> I am using Spark 1.3.1, HBase 1.0.0 and Phoenix 4.3. I am able to run Spark
>> on HBase (without Phoenix) successfully in yarn-client mode as mentioned in
>> this link:
>>
>> https://github.com/cloudera-labs/SparkOnHBase#scan-that-works-on-kerberos
>>
>> Also, I found that there is a known issue with yarn-cluster mode in Spark
>> 1.3.1:
>>
>> https://issues.apache.org/jira/browse/SPARK-6918
>>
>> Has anybody been successful in running Spark on HBase using Phoenix in
>> YARN cluster or client mode?
>>
>> Thanks,
>> Akhilesh Pathodia
>>
>
>


Spark on hbase using Phoenix in secure cluster

2015-12-07 Thread Akhilesh Pathodia
Hi,

I am running a Spark job on YARN in cluster mode in a secured cluster. I am
trying to run Spark on HBase using Phoenix, but the Spark executors are unable
to get an HBase connection using Phoenix. I am running the kinit command to get
the ticket before starting the job, and the keytab file and principal are
correctly specified in the connection URL. But the Spark job on each node still
throws the error below:

15/12/01 03:23:15 ERROR ipc.AbstractRpcClient: SASL authentication failed.
The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)]
at
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
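
For reference, the executors obtain the connection roughly like this (a
minimal sketch; the ZooKeeper quorum, principal and keytab path are
placeholders, not the real cluster values):

import java.sql.DriverManager

// Phoenix JDBC URL of the form
//   jdbc:phoenix:<zk quorum>:<zk port>:<hbase znode>:<principal>:<keytab file>
Class.forName("org.apache.phoenix.jdbc.PhoenixDriver")
val url = "jdbc:phoenix:zk1,zk2,zk3:2181:/hbase:appuser@EXAMPLE.COM:/etc/security/keytabs/appuser.keytab"
val conn = DriverManager.getConnection(url)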

I am using Spark 1.3.1, HBase 1.0.0 and Phoenix 4.3. I am able to run Spark on
HBase (without Phoenix) successfully in yarn-client mode as mentioned in
this link:

https://github.com/cloudera-labs/SparkOnHBase#scan-that-works-on-kerberos

Also, I found that there is a known issue with yarn-cluster mode in Spark
1.3.1:

https://issues.apache.org/jira/browse/SPARK-6918

Has anybody been successful in running Spark on HBase using Phoenix in YARN
cluster or client mode?

Thanks,
Akhilesh Pathodia


Unable to get phoenix connection in spark job in secured cluster

2015-12-01 Thread Akhilesh Pathodia
)
at
org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:162)
at
org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:126)
at
org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:133)
at
org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38)
at
org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
at
org.apache.commons.dbcp.BasicDataSource.validateConnectionFactory(BasicDataSource.java:1556)



I am able to connect to HBase using Phoenix through a standalone
application on the secured server, which means all the configurations required
for getting the connection are correct. Do we need any additional
configuration to make Phoenix work in a Spark job running on YARN in
cluster mode in a secured cluster?

How do I make Phoenix work with Spark in a secured environment?

Thanks,
Akhilesh Pathodia


Re: Unable to get phoenix connection in spark job in secured cluster

2015-12-01 Thread Akhilesh Pathodia
Spark - 1.3.1
Hbase - 1.0.0
Phoenix - 4.3
Cloudera - 5.4

On Tue, Dec 1, 2015 at 9:35 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> What are the versions for Spark / HBase / Phoenix you're using ?
>
> Cheers
>
> On Tue, Dec 1, 2015 at 4:15 AM, Akhilesh Pathodia <
> pathodia.akhil...@gmail.com> wrote:
>
>> Hi,
>>
>> I am running a Spark job on YARN in cluster mode in a secured cluster. Spark
>> executors are unable to get an HBase connection using Phoenix. I am running
>> the kinit command to get the ticket before starting the job, and the keytab
>> file and principal are correctly specified in the connection URL. But the
>> Spark job on each node still throws the error below:
>>
>> 15/12/01 03:23:15 ERROR ipc.AbstractRpcClient: SASL authentication
>> failed. The most likely cause is missing or invalid credentials. Consider
>> 'kinit'.
>> javax.security.sasl.SaslException: GSS initiate failed [Caused by
>> GSSException: No valid credentials provided (Mechanism level: Failed to
>> find any Kerberos tgt)]
>> at
>> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>> at
>> org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
>> at
>> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:605)
>> at
>> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:154)
>> at
>> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:731)
>> at
>> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:728)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>> at
>> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:728)
>> at
>> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:881)
>> at
>> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:850)
>> at
>> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1174)
>> at
>> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
>> at
>> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300)
>> at
>> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:46470)
>> at
>> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1606)
>> at
>> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1544)
>> at
>> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1566)
>> at
>> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1595)
>> at
>> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1801)
>> at
>> org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
>> at
>> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
>> at
>> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3678)
>> at
>> org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:451)
>> at
>> org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:473)
>> at
>> org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:804)
>> at
>> org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1194)
>> at
>> org.apache.phoenix.query.DelegateConnectionQueryServices.createTable(DelegateConnectionQueryServices.java:111)
>> at
>> org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:1683)
>> at
>> org.apache.phoenix.schema.MetaDataClient.createTable(MetaData