Re: spark kafka consumer with kerberos

2017-03-31 Thread Saisai Shao
Hi Bill, Normally a Kerberos principal and keytab should be enough, because the keytab effectively represents the password. Did you configure SASL/GSSAPI or SASL/Plain for KafkaClient? http://kafka.apache.org/documentation.html#security_sasl Actually this is more like a Kafka question, and normally…
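For reference, a minimal sketch of the Kafka consumer properties involved on the Spark side, assuming SASL/GSSAPI (Kerberos); the broker address and group id are placeholders, and the matching KafkaClient JAAS entry must point at the same keytab and principal:

```scala
// Hypothetical consumer settings for a Kerberized Kafka cluster.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",   // placeholder broker
  "security.protocol" -> "SASL_PLAINTEXT", // or SASL_SSL if TLS is enabled
  "sasl.kerberos.service.name" -> "kafka", // must match the broker's service principal
  "key.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
  "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
  "group.id" -> "example-group"            // placeholder group
)
```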

Partitioning in spark while reading from RDBMS via JDBC

2017-03-31 Thread Devender Yadav
Hi All, I am running Spark in cluster mode and reading data from an RDBMS via JDBC. As per the Spark docs, these partitioning parameters describe how to partition the table when reading in parallel from multiple…
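For reference, a sketch of those partitioning parameters in Spark 2.x; the JDBC URL, table name, credentials, and bounds below are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-partitioned-read").getOrCreate()

// Reads the table in parallel: numPartitions tasks, each scanning a slice of
// partitionColumn between lowerBound and upperBound.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb") // placeholder URL
  .option("dbtable", "my_table")                       // placeholder table
  .option("user", "dbuser")
  .option("password", "dbpass")
  .option("partitionColumn", "id") // must be a numeric column in Spark 2.x
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "10")
  .load()
```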

Re: Looking at EMR Logs

2017-03-31 Thread Neil Jonkers
If you modify spark.eventLog.dir to point to an S3 path, you will encounter the following exception in the Spark history log at /var/log/spark/spark-history-server.out: Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found…

Re: Looking at EMR Logs

2017-03-31 Thread Vadim Semenov
You can provide your own log directory where the Spark event log will be saved, and which you can replay afterwards. Set this in your job: `spark.eventLog.dir=s3://bucket/some/directory` and run it. Note! The path `s3://bucket/some/directory` must exist before you run your job; it will not be created…
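A sketch of the two settings involved (the bucket path is a placeholder):

```scala
import org.apache.spark.SparkConf

// Event logs are written to the given directory; the directory must already
// exist, and a history server pointed at the same path can replay the logs.
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "s3://bucket/some/directory")
```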

Re: spark kafka consumer with kerberos

2017-03-31 Thread Bill Schwanitz
Saisai, Yeah, that seems to have helped. It looks like the Kerberos ticket from when I submit does not get passed to the executor? ... 3 more Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Unable to obtain password from user at…

Re: spark kafka consumer with kerberos

2017-03-31 Thread Saisai Shao
Hi Bill, The exception is from the executor side. From the gist you provided, it looks like the issue is that you only configured the Java options on the driver side; you should also configure them on the executor side. You could refer to here (…
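A hedged sketch of pointing both JVMs at a JAAS login config; the file names are hypothetical, and in practice these are usually passed as `--conf` flags to spark-submit (driver JVM options cannot be changed after the driver has started) — shown here as SparkConf keys for readability:

```scala
import org.apache.spark.SparkConf

// Ship the JAAS file (and the keytab it references) to every container, and
// point both the driver and executor JVMs at it.
val conf = new SparkConf()
  .set("spark.files", "kafka_client_jaas.conf,user.keytab")
  .set("spark.driver.extraJavaOptions",
       "-Djava.security.auth.login.config=./kafka_client_jaas.conf")
  .set("spark.executor.extraJavaOptions",
       "-Djava.security.auth.login.config=./kafka_client_jaas.conf")
```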

Research paper used in GraphX

2017-03-31 Thread Md. Rezaul Karim
Hi All, Could anyone please tell me which research paper(s) was/were used to implement metrics like strongly connected components, PageRank, triangle count, closeness centrality, clustering coefficient, etc. in Spark GraphX? Regards, Md. Rezaul Karim…
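For context, a sketch of how three of those operators are invoked in GraphX (the edge-list path and iteration count are placeholders); note that closeness centrality and clustering coefficient are not built-in GraphX algorithms, so only the built-ins are shown:

```scala
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("graphx-metrics").getOrCreate()
val sc = spark.sparkContext

// GraphLoader builds a Graph[Int, Int] from a whitespace-separated edge list.
val graph = GraphLoader.edgeListFile(sc, "hdfs:///path/edges.txt")

val ranks = graph.pageRank(tol = 0.0001).vertices          // PageRank scores
val scc   = graph.stronglyConnectedComponents(numIter = 5) // SCC labels per vertex
val tri   = graph.triangleCount().vertices                 // triangle counts per vertex
```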

Re: How to push down a Parquet filter on a Spark 2.0.1 dataframe

2017-03-31 Thread Hanumath Rao Maduri
Hello Rahul, Please try to use df.filter(df("id").isin(1,2)) Thanks, On Thu, Mar 30, 2017 at 10:45 PM, Rahul Nandi wrote: > Hi, > I have around 2 million records in a Parquet file in S3. The file structure is > somewhat like > id data > 1 abc > 2 cdf > 3 fas > Now I want…
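A minimal sketch of the suggestion; the bucket path is a placeholder, and whether the In filter itself reaches the Parquet reader depends on the Spark version, so inspecting the plan is the reliable check:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-filter").getOrCreate()

val df = spark.read.parquet("s3a://bucket/path/to/data") // placeholder path
val filtered = df.filter(df("id").isin(1, 2))

// Look for the PushedFilters entry in the physical plan to verify pushdown.
filtered.explain(true)
```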

Predicate not getting pushed down to PrunedFilteredScan

2017-03-31 Thread Hanumath Rao Maduri
Hello All, I am working on creating a new PrunedFilteredScan operator which has the ability to execute the predicates pushed to this operator. However, what I observed is that if a column deep in the hierarchy is used, the predicate is not getting pushed down. SELECT tom._id, tom.address.city from…
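For reference, a minimal sketch of where pushed-down predicates arrive in such a relation (class and schema names are hypothetical); in Spark 2.x, filters on nested fields such as address.city are generally not translated into source Filters, which matches the behavior described above:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types._

// Hypothetical relation illustrating which filters reach buildScan.
class MyRelation(override val sqlContext: SQLContext)
    extends BaseRelation with PrunedFilteredScan {

  override def schema: StructType = StructType(Seq(
    StructField("_id", StringType),
    StructField("address", StructType(Seq(StructField("city", StringType))))
  ))

  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] = {
    // Only predicates on top-level atomic columns appear in `filters`;
    // predicates on nested fields like address.city are evaluated by Spark
    // after the scan instead of being pushed here.
    sqlContext.sparkContext.emptyRDD[Row]
  }
}
```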

Re: dataframe filter, unable to bind variable

2017-03-31 Thread shyla deshpande
Works. Thanks, Hosur. On Thu, Mar 30, 2017 at 8:37 PM, hosur narahari wrote: > Try lit(fromDate) and lit(toDate). You have to import > org.apache.spark.sql.functions.lit > to use it > > On 31 Mar 2017 7:45 a.m., "shyla deshpande" > wrote: > > The…
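A sketch of the fix, assuming a DataFrame with a "date" column (the column name and bound values are placeholders):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// Hypothetical bounds; in the thread these were plain Scala variables.
val fromDate = "2017-01-01"
val toDate   = "2017-03-31"

// lit() wraps a Scala value as a Column literal so it can be compared
// against a column inside a filter expression.
def filterByRange(df: DataFrame): DataFrame =
  df.filter(df("date") >= lit(fromDate) && df("date") <= lit(toDate))
```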