Re: Size exceeds Integer.MAX_VALUE issue with RandomForest

2017-09-18 Thread Pulluru Ranjith
Hi, here are the commands that are used:
> spark.default.parallelism=1000
> sparkR.session()
Java ref type org.apache.spark.sql.SparkSession id 1
> sql("use test")
SparkDataFrame[]
> mydata <- sql("select c1 ,p1 ,rt1 ,c2 ,p2 ,rt2 ,avt,avn from test_temp2 where vdr = 'TEST31X' ")
> >

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-18 Thread Cody Koeninger
Have you searched in JIRA, e.g. https://issues.apache.org/jira/browse/SPARK-19185 On Mon, Sep 18, 2017 at 1:56 AM, HARSH TAKKAR wrote: > Hi > > Changing the Spark version is my last resort; is there any other workaround for > this problem? > > > On Mon, Sep 18, 2017 at 11:43

Builder Pattern used by Spark source code architecture

2017-09-18 Thread Patrick
Hi, A lot of the Spark code base is built on the Builder pattern, so I was wondering what benefits the Builder pattern brings to Spark. Some things that come to mind: it is easy on garbage collection, and it gives user-friendly APIs. Are there any other advantages with code running on

Re: Configuration for unit testing and sql.shuffle.partitions

2017-09-18 Thread Vadim Semenov
You can create a superclass "FunSuiteWithSparkContext" that creates a Spark session, Spark context, and SQLContext with all the desired properties. Then you add the class to all the relevant test suites, and that's pretty much it. The other option is to pass it as a VM

Re: Chaining Spark Streaming Jobs

2017-09-18 Thread Michael Armbrust
You specify the schema when loading a dataframe by calling spark.read.schema(...)... On Tue, Sep 12, 2017 at 4:50 PM, Sunita Arvind wrote: > Hi Michael, > > I am wondering what I am doing wrong. I get error like: > > Exception in thread "main"

[Timer-0:WARN] Logging$class: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

2017-09-18 Thread Jean Georges Perrin
Hi, I am trying to connect to a new cluster I just set up. And I get... [Timer-0:WARN] Logging$class: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources I must have forgotten something really super obvious. My

Re: [Timer-0:WARN] Logging$class: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

2017-09-18 Thread Riccardo Ferrari
Hi Jean, What does the master UI say? http://10.0.100.81:8080 Do you have enough resources available, or is there a running context that is depleting all your resources? Are your workers registered and alive? How much memory does each have? How many cores? Best On Mon, Sep 18, 2017 at 11:24

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-18 Thread kant kodali
You should paste some code. ConcurrentModificationException normally happens when you modify a list or any other non-thread-safe data structure while you are iterating over it. On Sun, Sep 17, 2017 at 10:25 PM, HARSH TAKKAR wrote: > Hi, > > No we are not creating any thread for
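
As a minimal illustration of the general mechanism described in this reply (it is not the Kafka direct stream code from the thread, which was not posted), the Java sketch below modifies an ArrayList while a for-each loop is iterating over it, then shows one safe alternative using Iterator.remove(). The class and method names are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; not taken from the thread.
final class CmeDemo {

    // Removes elements through the list itself while a for-each loop is
    // iterating over it. The iterator's fail-fast check notices the
    // structural change on the following next() call and throws.
    static boolean removeDuringIteration() {
        List<Integer> values = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        try {
            for (Integer v : values) {
                if (v % 2 == 0) {
                    values.remove(v); // remove(Object), bypasses the iterator
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Same filtering, but removal goes through the iterator, which keeps
    // its own bookkeeping consistent with the list.
    static List<Integer> removeSafely() {
        List<Integer> values = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + removeDuringIteration());
        System.out.println("Safe result: " + removeSafely());
    }
}
```

The same exception can also appear when two threads share one non-thread-safe object, which is why seeing the actual code matters for diagnosing the case in this thread.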

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-18 Thread Anastasios Zouzias
Hi, I had a similar issue using 2.1.0 but not with Kafka. Updating to 2.1.1 solved my issue. Can you try with 2.1.1 as well and report back? Best, Anastasios On 17.09.2017 at 16:48, "HARSH TAKKAR" wrote: Hi I am using spark 2.1.0 with scala 2.11.8, and while iterating

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-18 Thread pandees waran
All, May I know what exactly changed in 2.1.1 which solved this problem? Sent from my iPhone > On Sep 17, 2017, at 11:08 PM, Anastasios Zouzias wrote: > > Hi, > > I had a similar issue using 2.1.0 but not with Kafka. Updating to 2.1.1 > solved my issue. Can you try with

Re: [SPARK-SQL] Does spark-sql have Authorization built in?

2017-09-18 Thread Arun Khetarpal
Ping. I did some digging around in the code base - I see that this is not present currently. Just looking for an acknowledgement. Regards, Arun > On 15-Sep-2017, at 8:43 PM, Arun Khetarpal wrote: > > Hi - > > Wanted to understand if spark sql has GRANT and

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-18 Thread HARSH TAKKAR
Hi, Changing the Spark version is my last resort; is there any other workaround for this problem? On Mon, Sep 18, 2017 at 11:43 AM pandees waran wrote: > All, May I know what exactly changed in 2.1.1 which solved this problem? > > Sent from my iPhone > > On Sep 17, 2017, at

Question on partitionColumn for a JDBC read using a timestamp from MySql

2017-09-18 Thread lucas.g...@gmail.com
I'm pretty sure you can use a timestamp as a partitionColumn; it's a Timestamp type in MySQL. It's at base a numeric type, and Spark requires a numeric type to be passed in. This doesn't work, because the bounds in the WHERE clause sent to MySQL become raw numerics, which won't query against the MySQL Timestamp.
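
One possible workaround, sketched here under stated assumptions rather than taken from the thread, is to skip partitionColumn and instead build one WHERE predicate per partition, with the bounds written as timestamp literals. Such an array of predicates can be passed to the DataFrameReader.jdbc(url, table, predicates, connectionProperties) overload. The helper class below, the column name created_at, and the date range are hypothetical; the code only builds the predicate strings and does not call Spark. It assumes whole-second bounds and a half-open range, so rows outside [start, end) are not covered.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Spark or of the original message.
final class TimestampPredicates {

    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Splits [start, end) into at most `partitions` contiguous,
    // non-overlapping ranges and renders each as a SQL predicate
    // with quoted timestamp literals.
    static String[] build(String column, LocalDateTime start,
                          LocalDateTime end, int partitions) {
        if (partitions < 1 || !start.isBefore(end)) {
            throw new IllegalArgumentException("need partitions >= 1 and start < end");
        }
        long totalSeconds = Duration.between(start, end).getSeconds();
        // Ceiling division so the ranges always reach `end`.
        long step = Math.max(1, (totalSeconds + partitions - 1) / partitions);

        List<String> predicates = new ArrayList<>();
        LocalDateTime lo = start;
        while (lo.isBefore(end)) {
            LocalDateTime hi = lo.plusSeconds(step);
            if (hi.isAfter(end)) {
                hi = end;
            }
            predicates.add(String.format("%s >= '%s' AND %s < '%s'",
                    column, FMT.format(lo), column, FMT.format(hi)));
            lo = hi;
        }
        return predicates.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] predicates = build("created_at",
                LocalDateTime.of(2017, 9, 18, 0, 0),
                LocalDateTime.of(2017, 9, 19, 0, 0), 4);
        for (String p : predicates) {
            System.out.println(p);
        }
    }
}
```

Each predicate becomes one partition of the resulting DataFrame, so the ranges must be disjoint and together cover every row that should be read.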