Spark 2.2 withColumn usage

2019-06-07 Thread anbutech
Hi Sir, could you please advise how to fix the below issue with withColumn in Spark 2.2 / Scala 2.11 joins? def processing(spark: SparkSession, dataset1: Dataset[Reference], dataset2: Dataset[DataCore], dataset3: Dataset[ThirdPartyData], dataset4: Dataset[OtherData]
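
A minimal sketch (not the poster's actual code, which is truncated above) of the pattern being asked about: joining two DataFrames and then deriving a new column with withColumn. The column names ("id", "amount", "region", "amount_with_tax") are hypothetical.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.col

  val spark = SparkSession.builder().appName("WithColumnDemo").getOrCreate()
  import spark.implicits._

  val orders  = Seq((1, 100.0), (2, 250.0)).toDF("id", "amount")
  val regions = Seq((1, "EU"), (2, "US")).toDF("id", "region")

  // withColumn after the join adds a derived column to the joined result
  val joined = orders.join(regions, Seq("id"))
    .withColumn("amount_with_tax", col("amount") * 1.2)

  joined.show()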

Spark SQL in R?

2019-06-07 Thread ya
Dear Felix and Richikesh and list, thank you very much for your previous help. So far I have tried two ways to trigger Spark SQL: one is to use R with the sparklyr and SparkR libraries; the other is to use the SparkR shell from Spark. I am not connecting to a remote Spark cluster, but a local

Re: Kafka Topic to Parquet HDFS with Structured Streaming

2019-06-07 Thread Chetan Khatri
Also, does anyone have an idea how to resolve this issue - https://stackoverflow.com/questions/56390492/spark-metadata-0-doesnt-exist-while-compacting-batch-9-structured-streaming-er On Fri, Jun 7, 2019 at 5:59 PM Chetan Khatri wrote: > Hello Dear Spark Users, > > I am trying to write data from Kafka

Kafka Topic to Parquet HDFS with Structured Streaming

2019-06-07 Thread Chetan Khatri
Hello Dear Spark Users, I am trying to write data from a Kafka topic to Parquet on HDFS with Structured Streaming but am getting failures. Please do help. val spark: SparkSession = SparkSession.builder().appName("DemoSparkKafka").getOrCreate() import spark.implicits._ val dataFromTopicDF = spark
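
A minimal end-to-end sketch of the pipeline described above, assuming a hypothetical broker address, topic name, and HDFS paths; note that the checkpointLocation option is mandatory for file sinks and a common source of failures when missing:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("DemoSparkKafka").getOrCreate()
  import spark.implicits._

  val dataFromTopicDF = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  // assumed broker
    .option("subscribe", "demo-topic")                    // assumed topic
    .load()

  // Kafka key/value arrive as binary columns; cast them before writing
  val query = dataFromTopicDF
    .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    .writeStream
    .format("parquet")
    .option("path", "hdfs:///tmp/demo/parquet")                  // assumed output path
    .option("checkpointLocation", "hdfs:///tmp/demo/checkpoint") // required for file sinks
    .start()

  query.awaitTermination()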

Re: sparksql in sparkR?

2019-06-07 Thread Felix Cheung
This seems to be more a question about the spark-sql shell? May I suggest you change the email title to get more attention. From: ya Sent: Wednesday, June 5, 2019 11:48:17 PM To: user@spark.apache.org Subject: sparksql in sparkR? Dear list, I am trying to use sparksql

[SQL] Why casting string column to timestamp gives null?

2019-06-07 Thread Jacek Laskowski
Hi, Why is casting a string column to timestamp not giving the same results as going through casting to long in-between? I'm tempted to consider it a bug.

  scala> spark.version
  res4: String = 2.4.3

  scala> Seq("1", "2").toDF("ts").select($"ts" cast "timestamp").show
  +----+
  |  ts|
  +----+
  |null|
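
The null is expected rather than a bug: "1" is not a parseable timestamp literal, so the string-to-timestamp cast returns null. If the strings are meant to be epoch seconds, casting to long first gives the different result described above (sketch for spark-shell, where spark.implicits._ is in scope; output shown for a UTC session timezone):

  scala> Seq("1", "2").toDF("ts").select($"ts".cast("long").cast("timestamp") as "ts").show
  +-------------------+
  |                 ts|
  +-------------------+
  |1970-01-01 00:00:01|
  |1970-01-01 00:00:02|
  +-------------------+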

Getting driver logs in Standalone Cluster

2019-06-07 Thread tkrol
Hey Guys, I am wondering what the best way is to get logs for the driver in cluster mode on a standalone cluster? Normally I used to run in client mode, so I could capture logs from the console. Now I've started running jobs in cluster mode, and obviously the driver is running on a worker and I can't see the

Spark logging questions

2019-06-07 Thread test test
Hello, How can we dump the Spark driver and executor thread information in the Spark application logging? PS: submitting the Spark job using spark-submit. Regards Rohit
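
One way to get thread information into the application logs is a plain JVM thread dump, sketched below; it uses only java.lang.Thread, so the same helper can run on the driver directly or inside a task (e.g. via rdd.foreachPartition) for executor-side dumps. The Spark UI's Executors tab also offers on-demand thread dumps per executor.

  import scala.collection.JavaConverters._

  def dumpThreads(tag: String): Unit = {
    val dump = Thread.getAllStackTraces.asScala
      .map { case (t, frames) =>
        s"Thread ${t.getName} (${t.getState}):\n" +
          frames.map(f => s"  at $f").mkString("\n")
      }
      .mkString("\n\n")
    // println lands in the driver/executor stdout logs; swap in a log4j
    // logger if the application uses one
    println(s"=== Thread dump [$tag] ===\n$dump")
  }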

Re: adding a column to a groupBy (dataframe)

2019-06-07 Thread Marcelo Valle
Hi Bruno, that's really interesting... So, to use explode, I would have to do a groupBy on country with a collect_list on cities, then explode the cities, right? Am I understanding the idea correctly? I think this could produce the results I want. But what would be the behaviour under the hood?
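
A sketch of the approach under discussion with collect_list and explode; the data and column names are made up for illustration (spark-shell assumed, with spark.implicits._ in scope):

  import org.apache.spark.sql.functions.{collect_list, explode}

  val df = Seq(("UK", "London"), ("UK", "Leeds"), ("FR", "Paris"))
    .toDF("country", "city")

  // group by country, gathering the cities into an array column
  val grouped = df.groupBy("country")
    .agg(collect_list("city").as("cities"))

  // explode turns the array back into one row per (country, city) pair
  val exploded = grouped.select($"country", explode($"cities").as("city"))

  exploded.show()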