Re: Aggregator (Spark 2.0) skips aggregation if zero() returns null

2016-06-26 Thread Takeshi Yamamuro
Hi, this behaviour seems to be expected because you must ensure `b + zero() = b`. In your case, `b + null = null` breaks this rule. This is the same as in v1.6.1. See: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala#L57 //
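For reference, a minimal sketch of an Aggregator whose zero() is a real identity element rather than null, written against the Spark 2.0 API (names are illustrative):

    import org.apache.spark.sql.{Encoder, Encoders}
    import org.apache.spark.sql.expressions.Aggregator

    // A sum whose zero() is a true identity, so b + zero() == b always holds.
    object SafeSum extends Aggregator[Long, Long, Long] {
      def zero: Long = 0L                               // never null
      def reduce(b: Long, a: Long): Long = b + a
      def merge(b1: Long, b2: Long): Long = b1 + b2
      def finish(reduction: Long): Long = reduction
      def bufferEncoder: Encoder[Long] = Encoders.scalaLong
      def outputEncoder: Encoder[Long] = Encoders.scalaLong
    }

    // usage on a Dataset[Long]: ds.select(SafeSum.toColumn)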

add multiple columns

2016-06-26 Thread pseudo oduesp
Hi, how can I add multiple columns to a DataFrame? withColumn allows adding one column at a time, but when I have multiple, do I have to loop over each column? Thanks

Re: add multiple columns

2016-06-26 Thread ndjido
Hi guys! I'm afraid you have to loop... The update of the logical plan is getting faster in Spark. Cheers, Ardo. Sent from my iPhone > On 26 Jun 2016, at 14:20, pseudo oduesp wrote: > > Hi, how can I add multiple columns to a DataFrame > > withColumn allows adding one
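A minimal sketch of that loop, assuming an existing DataFrame `df` and an illustrative list of new columns with literal defaults:

    import org.apache.spark.sql.functions.lit

    // fold withColumn over the list of columns to add
    val newCols = Seq("col_a" -> 0, "col_b" -> 1)
    val result = newCols.foldLeft(df) { case (acc, (name, value)) =>
      acc.withColumn(name, lit(value))
    }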

alter table with hive context

2016-06-26 Thread pseudo oduesp
Hi, how can I alter a table by adding new columns to it in HiveContext?

Re: add multiple columns

2016-06-26 Thread ayan guha
Can you share an example? You may want to write a SQL statement to add the columns? On Sun, Jun 26, 2016 at 11:02 PM, wrote: > Hi guy! > > I'm afraid you have to loop...The update of the Logical Plan is getting > faster on Spark. > > Cheers, > Ardo. > > Sent from my iPhone > > >
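For the SQL-statement route, a sketch assuming a DataFrame `df` with hypothetical columns a and b (1.6-style API):

    // register the frame and add several derived columns in one statement
    df.registerTempTable("t")
    val withCols = sqlContext.sql("SELECT *, a + b AS sum_ab, a * b AS prod_ab FROM t")

    // equivalent without SQL text
    val withCols2 = df.selectExpr("*", "a + b AS sum_ab", "a * b AS prod_ab")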

Re: Aggregator (Spark 2.0) skips aggregation if zero() returns null

2016-06-26 Thread Takeshi Yamamuro
No, TypedAggregateExpression that uses Aggregator#zero is different between v2.0 and v1.6. v2.0: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala#L91 v1.6:

Running the Continuous Aggregation example

2016-06-26 Thread Chang Lim
Has anyone been able to run the code from The Future of Real-Time in Spark, slide 24: "Continuous Aggregation"? Specifically, the line: stream("jdbc:mysql//..."). Using the Spark 2.0 preview build, I am getting an error when

Re: Aggregator (Spark 2.0) skips aggregation if zero() returns null

2016-06-26 Thread Amit Sela
Not sure what the rule is in the case of `b + null = null`, but the same code works perfectly in 1.6.1; I just tried it. On Sun, Jun 26, 2016 at 1:24 PM Takeshi Yamamuro wrote: > Hi, > > This behaviour seems to be expected because you must ensure `b + zero() = > b` > The

Re: Aggregator (Spark 2.0) skips aggregation if zero() returns null

2016-06-26 Thread Takeshi Yamamuro
Whatever it is, this is expected; if an initial value is null, Spark codegen removes all the aggregates. See: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L199 // maropu On Sun, Jun 26, 2016 at 7:46 PM, Amit Sela

Re: Aggregator (Spark 2.0) skips aggregation if zero() returns null

2016-06-26 Thread Amit Sela
This "if (value == null)" condition you point to exists in 1.6 branch as well, so that's probably not the reason. On Sun, Jun 26, 2016 at 1:53 PM Takeshi Yamamuro wrote: > Whatever it is, this is expected; if an initial value is null, spark > codegen removes all the

RE: Logging trait in Spark 2.0

2016-06-26 Thread Paolo Patierno
Yes ... the same here ... I'd like to know the best way to add logging in a custom receiver for Spark Streaming 2.0. Paolo Patierno, Senior Software Engineer (IoT) @ Red Hat, Microsoft MVP on Windows Embedded & IoT, Microsoft Azure Advisor. Twitter: @ppatierno | LinkedIn: paolopatierno | Blog:
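Since the org.apache.spark.Logging trait is no longer public in 2.0, one possible workaround (a sketch, not an official recommendation) is to use SLF4J directly inside the receiver:

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver
    import org.slf4j.LoggerFactory

    class MyReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
      @transient lazy val log = LoggerFactory.getLogger(getClass)

      def onStart(): Unit = {
        log.info("Custom receiver starting")
        // start a background thread here that calls store(...)
      }

      def onStop(): Unit = log.info("Custom receiver stopping")
    }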

Aggregator (Spark 2.0) skips aggregation if zero() returns null

2016-06-26 Thread Amit Sela
Sometimes the BUF for the aggregator may depend on the actual input... and while this passes the responsibility for handling null in merge/reduce to the developer, that sounds fine to me if they are the one who put null in zero() anyway. Now, it seems that the aggregation is skipped entirely when zero() =

Re: Spark Task is not created

2016-06-26 Thread Ravindra
I have a lot of Spark tests, and the failure is not deterministic; it can happen at any action that I do. But the logs given below are common. I overcome it using repartitioning, coalescing, etc., so that I don't get the "Submitting 2 missing tasks from ShuffleMapStage" message, basically ensuring that
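For context, a sketch of that workaround, with hypothetical partition counts and an assumed existing RDD `rdd`:

    // repartition shuffles to exactly 200 partitions; coalesce merges down without a full shuffle
    val widened  = rdd.repartition(200)
    val narrowed = rdd.coalesce(10)
    widened.count()   // force evaluation before the action that used to fail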

Re: alter table with hive context

2016-06-26 Thread Mich Talebzadeh
-- create the HiveContext scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc) HiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@6387fb09 -- use the test database scala> HiveContext.sql("use test") res8: org.apache.spark.sql.DataFrame
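From there, adding columns is plain HiveQL passed through HiveContext.sql; a sketch with hypothetical table and column names:

    scala> HiveContext.sql("ALTER TABLE test.mytable ADD COLUMNS (new_col STRING, other_col INT)")
    scala> HiveContext.sql("DESCRIBE test.mytable").show()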

Running a Java-based Implementation of StreamingKMeans in Spark

2016-06-26 Thread Biplob Biswas
Hi, something is wrong with my Spark subscription so I can't see the responses properly on Nabble, so I subscribed from a different ID; hopefully that is solved, and I am putting my question here again. I implemented the StreamingKMeans example provided on the Spark website, but in Java. The full
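For comparison, a Scala sketch that loosely follows the MLlib StreamingKMeans example (paths and parameters are illustrative, and a SparkContext `sc` is assumed):

    import org.apache.spark.mllib.clustering.StreamingKMeans
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))
    val trainingData = ssc.textFileStream("/tmp/train").map(Vectors.parse)
    val testData = ssc.textFileStream("/tmp/test").map(LabeledPoint.parse)

    val model = new StreamingKMeans()
      .setK(3)
      .setDecayFactor(1.0)
      .setRandomCenters(2, 0.0)   // 2-dimensional points, zero initial weight

    model.trainOn(trainingData)
    model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

    ssc.start()
    ssc.awaitTermination()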

What is the explanation of "ConvertToUnsafe" in "Physical Plan"

2016-06-26 Thread Mich Talebzadeh
Hi, in Spark's physical plan, what is the explanation for ConvertToUnsafe? Example: scala> sorted.filter($"prod_id" === 13).explain == Physical Plan == Filter (prod_id#10L = 13) +- Sort [prod_id#10L ASC,cust_id#11L ASC,time_id#12 ASC,channel_id#13L ASC,promo_id#14L ASC], true, 0 +-

Re: Spark 1.6.1: Unexpected partition behavior?

2016-06-26 Thread Randy Gelhausen
Sorry, please ignore the above. I now see that I called coalesce on a different reference than the one I used to register the table. On Sun, Jun 26, 2016 at 6:34 PM, Randy Gelhausen wrote: > > val enriched_web_logs = sqlContext.sql(""" > select web_logs.datetime, web_logs.node as

Spark 1.6.1: Unexpected partition behavior?

2016-06-26 Thread Randy Gelhausen
val enriched_web_logs = sqlContext.sql(""" select web_logs.datetime, web_logs.node as app_host, source_ip, b.node as source_host, log from web_logs left outer join (select distinct node, address from nodes) b on source_ip = address """)

Re: problem running spark with yarn-client not using spark-submit

2016-06-26 Thread sychungd
Hi, thanks for the reply. It's a Java web service that resides in a JBoss container. HY Chung Best regards, S.Y. Chung 鍾學毅 F14MITD Taiwan Semiconductor Manufacturing Company, Ltd. Tel: 06-5056688 Ext: 734-6325

Re: problem running spark with yarn-client not using spark-submit

2016-06-26 Thread Saisai Shao
It means several jars are missing from the YARN container environment. If you want to submit your application some way other than spark-submit, you have to take care of all the environment setup yourself. Since we don't know the implementation of your Java web service, it is hard to provide

How to convert a Random Forest model built in R to a similar model in Spark

2016-06-26 Thread Neha Mehta
Hi all, requesting help with the problem mentioned in the mail below. I have an existing random forest model in R which needs to be deployed on Spark. I am trying to recreate the model in Spark but am facing the problem mentioned below. Thanks, Neha On Jun 24, 2016 5:10 PM, wrote: > > Hi Sun, > > I am

Unsubscribe

2016-06-26 Thread kchen
Unsubscribe

Difference between DataFrame and RDD Persisting

2016-06-26 Thread Brandon White
What is the difference between persisting a DataFrame and an RDD? When I persist my RDD, the UI says it takes 50G or more of memory. When I persist my DataFrame, the UI says it takes 9G or less of memory. Does the DataFrame not persist the actual content? Is it better / faster to persist an RDD
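One way to compare the two (a sketch assuming an existing DataFrame `df`; the sizes then show up on the Storage tab of the UI):

    import org.apache.spark.storage.StorageLevel

    df.persist(StorageLevel.MEMORY_ONLY)    // cached in Spark SQL's compact columnar format
    df.count()                              // force materialization

    val asRdd = df.rdd                      // RDD of Row objects stored as plain JVM objects
    asRdd.persist(StorageLevel.MEMORY_ONLY)
    asRdd.count()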

Re: Spark 2.0 Streaming and Event Time

2016-06-26 Thread Chang Lim
Here is an update to my question: ===== Tathagata Das wrote (Jun 9): Event time is part of the windowed aggregation API. See my slides - https://www.slideshare.net/mobile/databricks/a-deep-dive-into-structured-streaming Let me know if that helps you find it.
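A sketch of what that looks like in the 2.0 API, assuming a SparkSession `spark` and a streaming DataFrame `events` with (timestamp, word) columns:

    import org.apache.spark.sql.functions.window
    import spark.implicits._

    // event-time window of 10 minutes, sliding every 5 minutes
    val counts = events
      .groupBy(window($"timestamp", "10 minutes", "5 minutes"), $"word")
      .count()

    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()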