Re: AWS credentials needed while trying to read a model from S3 in Spark

2018-05-09 Thread Srinath C
You could use IAM roles in AWS to access the data in S3 without credentials. See this link and this link for an

AWS credentials needed while trying to read a model from S3 in Spark

2018-05-09 Thread Mina Aslani
Hi, I am trying to load an ML model from AWS S3 in my Spark app running in a Docker container, however I need to pass the AWS credentials. My question is, why do I need to pass the credentials in the path? And what is the workaround? Best regards, Mina
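As the reply above suggests, one workaround is to keep credentials out of the S3 path entirely and configure them (or an IAM instance profile) through the s3a connector settings. A minimal sketch, assuming the hadoop-aws and AWS SDK jars are on the classpath; bucket and key values are placeholders:

```
# spark-defaults.conf -- pick ONE of the two approaches

# (a) On EC2/EMR with an IAM role attached to the instance: no keys needed
spark.hadoop.fs.s3a.aws.credentials.provider  com.amazonaws.auth.InstanceProfileCredentialsProvider

# (b) Explicit keys, kept out of the s3a:// path
spark.hadoop.fs.s3a.access.key  <YOUR_ACCESS_KEY>
spark.hadoop.fs.s3a.secret.key  <YOUR_SECRET_KEY>
```

With either approach the model can then be loaded with a plain path such as s3a://my-bucket/path/to/model, with no credentials embedded in the URL.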

Making spark streaming application single threaded

2018-05-09 Thread ravidspark
Hi All, Is there any property which makes my Spark streaming application single-threaded? I researched the property *spark.dynamicAllocation.maxExecutors=1*, but as far as I understand this launches a maximum of one container, not a single thread. In local mode, we can configure the
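As the question notes, maxExecutors caps containers, not threads. If the goal is one task running at a time, the relevant knobs are different; a hedged sketch (exact behavior depends on deploy mode, and the app/file names are placeholders):

```
# local mode: one worker thread total
spark-submit --master local[1] app.py

# YARN cluster mode: one executor with one core => one concurrent task
spark-submit --num-executors 1 --executor-cores 1 app.py
```

Note that the driver still runs its own threads (scheduler, receivers); these settings only serialize task execution.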

[Structured-Streaming][Beginner] Out of order messages with Spark kafka readstream from a specific partition

2018-05-09 Thread karthikjay
On the producer side, I make sure data for a specific user lands on the same partition. On the consumer side, I use a regular Spark Kafka readstream and read the data. I also use a console write stream to print out the Spark Kafka DataFrame. What I observe is, the data for a specific user (even
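Kafka guarantees order only within a partition, and even then the rows of a micro-batch may be printed in a nondeterministic order unless explicitly sorted. A minimal pure-Python sketch (no Spark, illustrative only) of restoring per-user order by sorting on the Kafka offset; the record shape is a stand-in for the readstream rows:

```python
# Each record mimics a row from the Kafka source: (user, partition, offset, value).
records = [
    ("alice", 0, 12, "click"),
    ("alice", 0, 10, "login"),   # arrived "late" within the micro-batch
    ("bob",   1, 7,  "login"),
    ("alice", 0, 11, "browse"),
]

def per_user_in_order(rows):
    """Group by user, then sort each group by (partition, offset) to
    recover the order in which Kafka stored the messages."""
    by_user = {}
    for user, part, off, value in rows:
        by_user.setdefault(user, []).append((part, off, value))
    return {u: [v for _, _, v in sorted(evs)] for u, evs in by_user.items()}

ordered = per_user_in_order(records)
print(ordered["alice"])  # ['login', 'browse', 'click']
```

In Spark the equivalent is sorting (or windowing) on the `offset` column the Kafka source exposes, since the console sink prints rows in task-completion order.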

Problem with Spark Master shutting down when zookeeper leader is shutdown

2018-05-09 Thread agateaaa
Dear Spark community, Just wanted to bring up this issue, which was filed for Spark 1.6.1 ( https://issues.apache.org/jira/browse/SPARK-15544) but also exists in Spark 2.3.0 (https://issues.apache.org/jira/browse/SPARK-23530). We have run into this in production, where the Spark Master shuts down if

Livy Failed error on Yarn with Spark

2018-05-09 Thread Chetan Khatri
All, I am running on Hortonworks HDP Hadoop with Livy and Spark 2.2.0. When I run the same Spark job using spark-submit it succeeds, with all transformations done. When I try to do a spark-submit through Livy, the Spark job is invoked and succeeds, but
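When a job behaves differently under Livy than under spark-submit, a common culprit is that the Livy batch request is missing conf, jars, or files that the spark-submit command line was passing. A hedged sketch of a Livy batches request body (POST to the /batches endpoint); field names follow the Livy REST API, and all paths and class names are placeholders:

```
{
  "file": "hdfs:///path/to/your-app.jar",
  "className": "com.example.YourMainClass",
  "args": ["arg1", "arg2"],
  "jars": ["hdfs:///path/to/dependency.jar"],
  "conf": { "spark.driver.memory": "2g" }
}
```

Comparing this payload field by field against the working spark-submit invocation is usually the quickest way to find the gap.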

Re: Spark UI Source Code

2018-05-09 Thread Marcelo Vanzin
(-dev) The KVStore API is private to Spark, it's not really meant to be used by others. You're free to try, and there's a lot of javadocs on the different interfaces, but it's not a general purpose database, so you'll need to figure out things like that by yourself. On Tue, May 8, 2018 at 9:53

Fwd: Array[Double] two time slower then DenseVector

2018-05-09 Thread David Ignjić
Hello all, I am currently looking into one Spark application to squeeze out a little performance, and in this code (attached in the email) I looked at the difference. In org.apache.spark.sql.catalyst.CatalystTypeConverters.ArrayConverter, even if the array is primitive, we still use the boxing and unboxing version, because in code

Invalid Spark URL: spark://HeartbeatReceiver@hostname

2018-05-09 Thread Serkan TAS
While trying to execute a Python script with PyCharm on Windows I am getting this error. Does anyone have an idea about the error? Spark version: 2.3.0 py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. :
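This error is often reported when the machine's hostname is something Spark cannot parse into a valid spark:// URL (for example, a Windows hostname containing underscores). A commonly suggested workaround, offered here as a hedged pointer rather than a confirmed fix, is to force a resolvable driver host before the context is created:

```
# environment variable (e.g. in PyCharm's run configuration)
SPARK_LOCAL_HOSTNAME=localhost

# or equivalently as a Spark conf
spark.driver.host  localhost
```

If the hostname is the cause, renaming the machine to avoid underscores also resolves it.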

Malformed URL Exception when connecting to Phoenix to Spark

2018-05-09 Thread Alchemist
Code: JavaSparkContext sc = new JavaSparkContext(sparkConf); SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc); Map map = new HashMap(); map.put("zkUrl", args[2]); map.put("table", args[1]); map.put("driver",
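A MalformedURLException from the Phoenix-Spark connector frequently traces back to the zkUrl option. As a hedged pointer (hostnames, port, and znode are placeholders and depend on the cluster), the connector expects a ZooKeeper quorum string rather than a jdbc: URL:

```
# zkUrl format expected by phoenix-spark (no jdbc:phoenix: prefix)
zkUrl = zk-host1,zk-host2,zk-host3:2181:/hbase
```

Checking what is actually arriving in args[2] against this format would be the first thing to verify.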

Spark 2.3.0 --files vs. addFile()

2018-05-09 Thread Marius
Hey, I am using Spark to distribute the execution of a binary tool and to do some further calculation downstream. I want to distribute the binary tool using either the --files or the addFile option from Spark to make it available on each worker node. However, although it tells me that
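For reference, the two mechanisms differ mainly in when the file is shipped and how it is located on the worker; a hedged sketch, with the tool and app names as placeholders:

```
# --files ships the file at submit time; it lands in each executor's
# working directory under its original name
spark-submit --files ./mytool my_app.py

# SparkContext.addFile("mytool") ships it at runtime; on the worker,
# resolve its local path with SparkFiles.get("mytool")
```

For a binary tool, also make sure the executable bit survives distribution, since the file may need an explicit chmod on the worker before it can be invoked.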

Spark 2.3.0 Structured Streaming Kafka Timestamp

2018-05-09 Thread Yuta Morisawa
Hi All, I'm trying to extract the Kafka timestamp from Kafka topics. The timestamp does not contain millisecond information, but it should, because the ConsumerRecord class of Kafka 0.10 supports millisecond timestamps. How can I get millisecond timestamps from Kafka topics? These are
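The Kafka source in Structured Streaming exposes the record timestamp as a `timestamp` column of Spark TimestampType, which carries microsecond precision, so the Kafka millisecond value should survive; if it looks truncated, the display or a cast is the more likely culprit. A pure-Python sketch (no Spark) of the round-trip from epoch milliseconds, the unit a Kafka 0.10 record carries:

```python
from datetime import datetime, timezone

kafka_ts_ms = 1525856461123  # epoch milliseconds, as stored in a Kafka 0.10 record

# Split into whole seconds and the millisecond remainder, then build the
# datetime the way a microsecond-precision TimestampType would hold it.
sec, ms = divmod(kafka_ts_ms, 1000)
dt = datetime.fromtimestamp(sec, tz=timezone.utc).replace(microsecond=ms * 1000)

print(dt.microsecond)  # 123000 -- the millisecond part survives as microseconds

# Recover the original epoch milliseconds without loss.
round_trip_ms = int(dt.timestamp()) * 1000 + dt.microsecond // 1000
print(round_trip_ms)   # 1525856461123
```

In Spark SQL, the analogous check is casting the `timestamp` column to double (or formatting it with fractional seconds) rather than to long, since a cast to long truncates to whole seconds.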