Re: HDP 2.5 - Python - Spark-On-Hbase

2017-06-26 Thread ayan guha
Thanks all, I have found the correct version of the package. Probably the HDP documentation is a little behind. Best Ayan On Mon, 26 Jun 2017 at 2:16 pm, Mahesh Sawaiker < mahesh_sawai...@persistent.com> wrote: > Ayan, > > The location of the logging class was moved from Spark 1.6 to Spark 2.0. > > Looks

Re: HDP 2.5 - Python - Spark-On-Hbase

2017-06-26 Thread Weiqing Yang
For SHC documentation, please refer to the README in the SHC GitHub repo, which is kept up to date. On Mon, Jun 26, 2017 at 5:46 AM, ayan guha wrote: > Thanks all, I have found the correct version of the package. Probably the HDP > documentation is a little behind. > > Best > Ayan > > On Mon, 26

ZeroMQ Streaming in Spark2.x

2017-06-26 Thread Aashish Chaudhary
Hi there, I am a beginner when it comes to Spark streaming. I was looking for some examples related to ZeroMQ and Spark and realized that ZeroMQUtils is no longer present in Spark 2.x. I would appreciate it if someone could shed some light on the history and what I could do to use ZeroMQ with Spark

Re: ZeroMQ Streaming in Spark2.x

2017-06-26 Thread Shixiong(Ryan) Zhu
It's moved to http://bahir.apache.org/ You can find the documentation there. On Mon, Jun 26, 2017 at 11:58 AM, Aashish Chaudhary < aashish.chaudh...@kitware.com> wrote: > Hi there, > > I am a beginner when it comes to Spark streaming. I was looking for some > examples related to ZeroMQ and Spark and

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve

2017-06-26 Thread mckunkel
First Spark project. I have a Java method that returns a Dataset<Row>. I want to convert this to a Dataset<StatusChangeDB>, where the object is named StatusChangeDB. I have created a POJO StatusChangeDB.java and coded it with all the query objects found in the MySQL table. I then create an Encoder and convert the

Re: Exception which using ReduceByKeyAndWindow in Spark Streaming.

2017-06-26 Thread N B
Hi Swetha, We dealt with this issue a couple of years ago and solved it. The key insight here was that adding to a HashSet and removing from a HashSet are actually not inverse operations of each other. For example, if you added a key K1 in batch1 and then again added that same key K1
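The non-inverse behavior described above can be demonstrated without Spark at all. A minimal plain-Scala sketch (key names hypothetical), contrasting set-based state with count-based (multiset) state, which does have a true inverse:

```scala
import scala.collection.mutable

// Naive window state as a HashSet: add on batch entry, remove on batch exit.
// "K1" arrives in batch 1 AND batch 2, but a set records it only once.
val naive = mutable.HashSet.empty[String]
naive += "K1"            // batch 1 enters the window
naive += "K1"            // batch 2 enters the window (no-op for a set)
naive -= "K1"            // batch 1 leaves the window
val naiveHasK1 = naive.contains("K1")   // false -- but batch 2 is still in the window!

// Correct inverse bookkeeping: keep a count per key (a multiset).
val counts = mutable.Map.empty[String, Int].withDefaultValue(0)
counts("K1") += 1        // batch 1
counts("K1") += 1        // batch 2
counts("K1") -= 1        // batch 1 leaves
val countedHasK1 = counts("K1") > 0     // true -- as it should be

println(s"$naiveHasK1 $countedHasK1")   // prints "false true"
```

With counts, removal is the exact inverse of addition, so the key survives as long as any in-window batch contributed it.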

Re: Spark Streaming reduceByKeyAndWindow with inverse function seems to iterate over all the keys in the window even though they are not present in the current batch

2017-06-26 Thread Tathagata Das
Unfortunately, the way reduceByKeyAndWindow is implemented, it does iterate through all the counts. To have something more efficient, you may have to implement your own windowing logic using mapWithState. Something like eventDStream.flatMap { event => // find the windows each event maps to, and
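A plain-Scala sketch of the "find the windows each event maps to" step, assuming a hypothetical 60-second window sliding every 15 seconds (both durations are illustrative, not from the thread):

```scala
// Each window is identified by its end time, a multiple of `slide`.
// An event at timestamp `ts` falls into windowLen/slide consecutive windows.
val windowLen = 60L   // hypothetical window length in seconds
val slide = 15L       // hypothetical slide interval in seconds

def windowsFor(ts: Long): Seq[Long] = {
  val firstEnd = (ts / slide) * slide + slide   // earliest window end covering ts
  (0 until (windowLen / slide).toInt).map(i => firstEnd + i * slide)
}

// In a DStream, eventDStream.flatMap(e => windowsFor(e.ts).map(w => ((w, e.key), 1)))
// followed by mapWithState keyed on (window, key) updates only the keys that
// actually appear in the current batch.
println(windowsFor(0L))   // Vector(15, 30, 45, 60)
```

Because state is keyed per (window, key), a batch with no occurrences of a key simply leaves that key's state untouched, avoiding the full-window iteration.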

Spark Streaming reduceByKeyAndWindow with inverse function seems to iterate over all the keys in the window even though they are not present in the current batch

2017-06-26 Thread SRK
Hi, We have reduceByKeyAndWindow with inverse function feature in our Streaming job to calculate rolling counts for the past hour and for the past 24 hours. It seems that the functionality is iterating over all the keys in the window even though they are not present in the current batch causing

Re: Spark Streaming reduceByKeyAndWindow with inverse function seems toiterate over all the keys in the window even though they are not presentin the current batch

2017-06-26 Thread Fei Shao
Hi SRK, what are the slide duration and parent duration in your code? You can search for "issue about the windows slice of stream" in the mailing list. Perhaps they are related. ---Original--- From: "SRK" Date: 2017/6/27 03:53:22 To: "user";

Question about Parallel Stages in Spark

2017-06-26 Thread satishl
For the below code, since rdd1 and rdd2 don't depend on each other, I was expecting that the first and second printlns would be interwoven. However, the Spark job runs all "first" statements first and then all "second" statements next in serial fashion. I have set spark.scheduler.mode = FAIR.

Re: issue about the windows slice of stream

2017-06-26 Thread Fei Shao
Hi Owen, would you help me check this issue? Is it a potential bug or not? Thanks, Fei Shao ---Original--- From: "Fei Shao"<1427357...@qq.com> Date: 2017/6/25 21:44:41 To: "user";"dev"; Subject: Re: issue about the

Re: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-26 Thread Fei Shao
Hi Kodali, I am puzzled by the statement "Kafka Streaming can indeed do map, reduce, join and window operations". Do you mean Kafka has an API like map, or that Kafka doesn't have such an API but can still do it? As I recall, Kafka does not have APIs like map. ---Original--- From: "kant

Re: how to mention others in JIRA comment please?

2017-06-26 Thread Fei Shao
thank you ---Original--- From: "Ted Yu" Date: 2017/6/27 10:18:18 To: "Fei Shao"<1427357...@qq.com>; Cc: "user";"dev"; Subject: Re: how to mention others in JIRA comment please? You can find the JIRA handle of the person

Re: Question about Parallel Stages in Spark

2017-06-26 Thread Fei Shao
I think the Spark cluster receives two submits, A and B, and FAIR is used to schedule A and B. I am not sure about this. ---Original--- From: "Bryan Jeffrey" Date: 2017/6/27 08:55:42 To: "satishl"; Cc: "user"; Subject:

Re: Question about Parallel Stages in Spark

2017-06-26 Thread Pralabh Kumar
Hi, I don't think Spark will receive two submits. It will execute one submit and then the next one. If the application is multithreaded and two threads call submit at one time, then they will run in parallel, provided the scheduler is FAIR and task slots are available.

Re: Question about Parallel Stages in Spark

2017-06-26 Thread Fei Shao
My words caused a misunderstanding. Step 1: A is submitted to Spark. Step 2: B is submitted to Spark. Spark gets two independent jobs. FAIR is used to schedule A and B. Jeffrey's code did not cause two submits. ---Original--- From: "Pralabh Kumar" Date: 2017/6/27

Re: Question about Parallel Stages in Spark

2017-06-26 Thread Pralabh Kumar
I think my words were also misunderstood. My point is that they will not be submitted together, since they are part of one thread. val spark = SparkSession.builder() .appName("practice") .config("spark.scheduler.mode","FAIR") .enableHiveSupport().getOrCreate() val sc = spark.sparkContext

how to mention others in JIRA comment please?

2017-06-26 Thread Fei Shao
Hi all, how do I mention others in a JIRA comment? I added @ before other members' names, but it didn't work. Would you help me? Thanks, Fei Shao

Re: how to mention others in JIRA comment please?

2017-06-26 Thread Ted Yu
You can find the JIRA handle of the person you want to mention by going to a JIRA where that person has commented. e.g. you want to find the handle for Joseph. You can go to: https://issues.apache.org/jira/browse/SPARK-6635 and click on his name in comment:

Re: Question about Parallel Stages in Spark

2017-06-26 Thread Bryan Jeffrey
Hello. The driver is running the individual operations in series, but each operation is parallelized internally. If you want them to run in parallel, you need to provide the driver a mechanism to thread the job scheduling out: val rdd1 = sc.parallelize(1 to 10) val rdd2 = sc.parallelize(1 to
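The threading mechanism Bryan describes can be sketched with plain Scala Futures. The two functions below are stand-ins for the independent RDD actions (names hypothetical); in a real driver each Future body would invoke a Spark action such as rdd1.collect():

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Stand-ins for two independent jobs (e.g. actions on rdd1 and rdd2).
def firstJob(): Seq[String] = (1 to 3).map(i => s"first $i")
def secondJob(): Seq[String] = (1 to 3).map(i => s"second $i")

// Wrapping each action in a Future gives the driver one thread per job,
// so the two resulting Spark jobs can be scheduled concurrently
// (and interleaved under spark.scheduler.mode=FAIR).
val fa = Future(firstJob())
val fb = Future(secondJob())
val Seq(a, b) = Await.result(Future.sequence(Seq(fa, fb)), 30.seconds)
println(a.size + b.size)   // 6
```

Without this, the driver thread calls the first action, blocks until it finishes, then calls the second, which is exactly the serial behavior observed in the original question.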

Re: ZeroMQ Streaming in Spark2.x

2017-06-26 Thread Aashish Chaudhary
Thanks. I saw it earlier but did not know whether this was the official way of using Spark with ZeroMQ. Thanks, I will have a look. - Aashish On Mon, Jun 26, 2017 at 3:01 PM Shixiong(Ryan) Zhu wrote: > It's moved to http://bahir.apache.org/ > > You can find the documentation there. >

Saving RDD as Kryo (broken in 2.1)

2017-06-26 Thread Александр Крашенинников
Hi, all! I have code that serializes an RDD with Kryo and saves it as a sequence file. It works fine in 1.5.1 but, after switching to 2.1.1, it does not work. I am trying to serialize an RDD of Tuple2<> (obtained from a PairRDD). 1. RDD consists of different heterogeneous objects (aggregates, like HLL,

Fwd: Saving RDD as Kryo (broken in 2.1)

2017-06-26 Thread Alexander Krasheninnikov
Hi, all! I have code that serializes an RDD with Kryo and saves it as a sequence file. It works fine in 1.5.1 but, after switching to 2.1.1, it does not work. I am trying to serialize an RDD of Tuple2<> (obtained from a PairRDD). 1. RDD consists of different heterogeneous objects (aggregates, like HLL,

Re: Question on Spark code

2017-06-26 Thread Steve Loughran
On 25 Jun 2017, at 20:57, kant kodali wrote: impressive! I need to learn more about Scala. What I mean by stripping away the conditional check in Java is this: static final boolean isLogInfoEnabled = false; public void logMessage(String message) {
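For context, Spark's Logging trait sidesteps the explicit flag check at call sites by taking the message as a by-name parameter (msg: => String), so the message expression is never evaluated when the level is disabled. A minimal sketch of that mechanism:

```scala
// By-name parameter: `msg` is an unevaluated expression, not a String value.
// The expensive string construction only happens if the level is enabled.
var evaluations = 0
val infoEnabled = false
def logInfo(msg: => String): Unit = if (infoEnabled) println(msg)

logInfo { evaluations += 1; "expensive " + "message " + evaluations }
println(evaluations)   // 0 -- the message block never ran
```

This gives the same effect as the Java static-final-flag pattern quoted above, but without requiring every call site to repeat the conditional check.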