RE: Which kafka client to use with spark streaming

2017-12-26 Thread Serkan TAS
Kafka clients are blocking the Spark Streaming jobs, and after a while the streaming job queue keeps growing.

Standalone Cluster: ClassNotFound org.apache.kafka.common.serialization.ByteArrayDeserializer

2017-12-26 Thread Geoff Von Allmen
I am trying to deploy a standalone cluster but running into ClassNotFound errors. I have tried a myriad of approaches, ranging from packaging all dependencies into a single JAR to using the --packages and --driver-class-path options. I’ve got a master node started, a slave node
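A frequent cause of this particular error is that kafka-clients (which provides ByteArrayDeserializer) never reaches the executors. A minimal sketch, assuming sbt and a Spark 2.2.1 cluster on Scala 2.11 (both versions are assumptions, not from the thread), of a build that bundles the Kafka integration while leaving Spark itself provided:

    // build.sbt (sketch): ship the Kafka integration with the app jar,
    // but let the cluster supply Spark itself.
    scalaVersion := "2.11.8"
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql" % "2.2.1" % "provided",
      // pulls in kafka-clients (and ByteArrayDeserializer) transitively
      "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.2.1"
    )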

Re: Apache Spark - Structured Streaming graceful shutdown

2017-12-26 Thread M Singh
Thanks Diogo. My question is how to gracefully call the stop method while the streaming application is running in a cluster. On Monday, December 25, 2017 5:39 PM, Diogo Munaro Vieira wrote: Hi M Singh! Here I'm using query.stop() On Dec 25, 2017
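One way to trigger a graceful stop from outside the driver loop, sketched under the assumption that a marker file on a shared filesystem serves as the shutdown signal (the path and polling interval are illustrative):

    // Sketch: poll for a marker file, then stop the query gracefully.
    import java.nio.file.{Files, Paths}
    import org.apache.spark.sql.streaming.StreamingQuery

    def awaitShutdown(query: StreamingQuery): Unit = {
      val marker = Paths.get("/tmp/stop-streaming")  // hypothetical path
      while (query.isActive && !Files.exists(marker)) {
        Thread.sleep(5000)                           // check every 5 seconds
      }
      if (query.isActive) query.stop()               // stop the streaming query
    }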

Re: NASA CDF files in Spark

2017-12-26 Thread Renato Marroquín Mogrovejo
There is also this project, https://github.com/SciSpark/SciSpark, which might be of interest to you, Christopher. 2017-12-16 3:46 GMT-05:00 Jörn Franke: > Develop your own HadoopFileFormat and use https://spark.apache.org/docs/2.0.2/api/java/org/apache/spark/SparkContext. >
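To make Jörn's suggestion concrete, here is a sketch of wiring a custom Hadoop InputFormat into Spark; CDFInputFormat, CDFKey, and CDFRecord are hypothetical classes you would implement yourself against Hadoop's InputFormat/Writable contracts:

    // Sketch: read CDF files through a custom (hypothetical) InputFormat.
    val records = sc.newAPIHadoopFile(
      "/data/nasa/*.cdf",        // assumed input path
      classOf[CDFInputFormat],   // hypothetical InputFormat[CDFKey, CDFRecord]
      classOf[CDFKey],
      classOf[CDFRecord]
    )
    records.take(5).foreach(println)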

Re: Passing an array of more than 22 elements in a UDF

2017-12-26 Thread Vadim Semenov
Functions are still limited to 22 arguments: https://github.com/scala/scala/blob/2.13.x/src/library/scala/Function22.scala On Tue, Dec 26, 2017 at 2:19 PM, Felix Cheung wrote: > Generally the 22 limitation is from Scala 2.10. > > In Scala 2.11, the issue with case
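A common workaround, assuming the values share a type, is to pack them into a single array column so the UDF receives one Seq argument instead of 22+ separate ones (a sketch; df and the choice of columns are assumptions):

    // Sketch: collapse many numeric columns into one array column.
    import org.apache.spark.sql.functions.{array, col, udf}

    val combine   = udf((xs: Seq[Double]) => xs.sum)  // any logic over the values
    val inputCols = df.columns.map(col)               // 22+ columns is fine here
    val result    = df.withColumn("combined", combine(array(inputCols: _*)))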

Problem in Spark-Kafka Connector

2017-12-26 Thread Sitakant Mishra
Hi, I am trying to connect my Spark cluster to a single Kafka topic, which is running as a separate process on another machine. While submitting the Spark application, I am getting the following error: 17/12/25 16:56:57 ERROR TransportRequestHandler: Error sending result

Re: Passing an array of more than 22 elements in a UDF

2017-12-26 Thread Felix Cheung
Generally, the 22 limitation is from Scala 2.10. In Scala 2.11 the issue with case classes is fixed, but that said, I’m not sure whether other limitations might apply to UDFs in Java.
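For illustration, the Scala 2.11 change Felix refers to means a case class may now exceed 22 fields, even though FunctionN (and hence plain UDF signatures) still stops at 22; a sketch:

    // Compiles on Scala 2.11+; Scala 2.10 capped case classes at 22 fields.
    case class Wide(
      f1: Int,  f2: Int,  f3: Int,  f4: Int,  f5: Int,  f6: Int,
      f7: Int,  f8: Int,  f9: Int,  f10: Int, f11: Int, f12: Int,
      f13: Int, f14: Int, f15: Int, f16: Int, f17: Int, f18: Int,
      f19: Int, f20: Int, f21: Int, f22: Int, f23: Int
    )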

Re: Spark 2.2.1 worker invocation

2017-12-26 Thread Felix Cheung
I think you are looking for spark.executor.extraJavaOptions: https://spark.apache.org/docs/latest/configuration.html#runtime-environment
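For example, a sketch of setting it when the session is built (the library directory is an assumption):

    // Sketch: executor JVMs launch after this conf is set, so it takes effect.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("native-libs")
      .config("spark.executor.extraJavaOptions",
              "-Djava.library.path=/usr/local/lib")  // assumed directory
      .getOrCreate()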

Spark 2.2.1 worker invocation

2017-12-26 Thread Christopher Piggott
I need to set java.library.path to get access to some native code. Following directions, I made a spark-env.sh:

#!/usr/bin/env bash
export LD_LIBRARY_PATH="/usr/local/lib/libcdfNativeLibrary.so:/usr/local/lib/libcdf.so:${LD_LIBRARY_PATH}"
export

Re: Which kafka client to use with spark streaming

2017-12-26 Thread Cody Koeninger
Do not add a dependency on kafka-clients; the spark-streaming-kafka library has the appropriate transitive dependencies. Either version of the spark-streaming-kafka library should work with 1.0 brokers; what problems were you having? On Mon, Dec 25, 2017 at 7:58 PM, Diogo Munaro Vieira
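For context, a direct-stream setup that relies only on spark-streaming-kafka-0-10's transitive dependencies looks roughly like this (a sketch: ssc, the broker address, group id, and topic name are placeholders):

    // Sketch: no explicit kafka-clients dependency is declared anywhere;
    // the deserializer class arrives transitively.
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker:9092",               // placeholder
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "example-group"              // placeholder
    )
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("some-topic"), kafkaParams))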

[Structured Streaming] Reuse computation result

2017-12-26 Thread Shu Li Zheng
Hi all, I have a scenario like this:

val df = dataframe.map().filter()
// agg 1
val query1 = df.sum.writeStream.start
// agg 2
val query2 = df.count.writeStream.start

With Spark Streaming we can call persist() on an RDD to reuse the computation result; when we call persist() after filter()
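One workaround sometimes used for exactly this situation (an assumption, not something confirmed in the thread) is to fold both aggregates into a single streaming query so the upstream map/filter stage runs once; a sketch, assuming a numeric column named "value":

    // Sketch: one query computes both aggregates over the same input.
    import org.apache.spark.sql.functions.{count, sum}

    val both  = df.agg(sum("value").as("total"), count("*").as("rows"))
    val query = both.writeStream
      .outputMode("complete")   // streaming aggregations need complete/update
      .format("console")
      .start()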