issue about the windows slice of stream

2017-06-24 Thread ??????????
Hi all, I found an issue with the window slice of a DStream. My code is: val ssc = new StreamingContext(conf, Seconds(1)); val content = ssc.socketTextStream("ip", port); content.countByValueAndWindow(Seconds(2), Seconds(8)).print() The key is that the slide interval is greater than the window length.

Fwd: Can we access files on Cluster mode

2017-06-24 Thread sudhir k
I am new to Spark and I need some guidance on how to fetch files passed with the --files option of spark-submit. I read on some forums that we can fetch the files with Spark.getFiles(fileName), use them in our code, and have all nodes read them. But I am facing an issue. Below is the command I am

Re: Can we access files on Cluster mode

2017-06-24 Thread varma dantuluri
Hi Sudhir, I believe you have to use a shared file system that is accessible by all nodes. > On Jun 24, 2017, at 1:30 PM, sudhir k wrote: > > > I am new to Spark and I need some guidance on how to fetch files from --files > option on Spark-Submit. > > I read on some

Re: How does HashPartitioner distribute data in Spark?

2017-06-24 Thread Russell Spitzer
Neither of your code examples invokes a repartitioning. Add a repartition command. On Sat, Jun 24, 2017, 11:53 AM Vikash Pareek wrote: > Hi Vadim, > > Thank you for your response. > > I would like to know how the partitioner chooses the key. If we look at my >

Re: Can we access files on Cluster mode

2017-06-24 Thread Holden Karau
addFile is not supposed to depend on a shared FS, unless the semantics have changed recently. On Sat, Jun 24, 2017 at 11:55 AM varma dantuluri wrote: > Hi Sudhir, > > I believe you have to use a shared file system that is accessible by all > nodes. > > > On Jun 24, 2017, at

Re: Is there an api in Dataset/Dataframe that does repartitionAndSortWithinPartitions?

2017-06-24 Thread Koert Kuipers
Dataset/DataFrame has repartition (which can be used to partition by key) and sortWithinPartitions. See, for example, the usage here: https://github.com/tresata/spark-sorted/blob/master/src/main/scala/com/tresata/spark/sorted/sql/GroupSortedDataset.scala#L18 On Fri, Jun 23, 2017 at 5:43 PM, Keith
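
The combination mentioned in this reply can be sketched roughly as follows. This is only an illustration, not code from the thread: it assumes Spark SQL 2.x on the classpath, and the column names `key` and `value`, the object name, and the sample rows are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object RepartitionAndSortSketch {
  // Hash-partition rows by the (hypothetical) "key" column, then sort the
  // rows inside each resulting partition by key and value.
  def repartitionAndSort(df: DataFrame, numPartitions: Int): DataFrame =
    df.repartition(numPartitions, col("key"))
      .sortWithinPartitions(col("key"), col("value"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("repartition-and-sort-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val df = Seq(("b", 2), ("a", 3), ("b", 1), ("a", 1)).toDF("key", "value")
    val result = repartitionAndSort(df, numPartitions = 2)

    // Print the rows of each partition to see the per-partition ordering.
    result.rdd.glom().collect().zipWithIndex.foreach { case (rows, i) =>
      println(s"partition $i: ${rows.mkString(", ")}")
    }
    spark.stop()
  }
}
```

Rows with the same key end up in the same partition, and each partition is sorted independently; no global ordering across partitions is implied.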

Unsubscribe

2017-06-24 Thread Anita Tailor
Unsubscribe

Re: Is there an api in Dataset/Dataframe that does repartitionAndSortWithinPartitions?

2017-06-24 Thread Keith Chapman
Thanks for the pointer, Saliya. I'm looking for an equivalent API in Dataset/DataFrame for repartitionAndSortWithinPartitions; I've already converted most of the RDDs to DataFrames. Regards, Keith. http://keith-chapman.com On Sat, Jun 24, 2017 at 3:48 AM, Saliya Ekanayake

Re: Is there an api in Dataset/Dataframe that does repartitionAndSortWithinPartitions?

2017-06-24 Thread Keith Chapman
Hi Nguyen, This looks promising and seems like I could achieve it using cluster by. Thanks for the pointer. Regards, Keith. http://keith-chapman.com On Sat, Jun 24, 2017 at 5:27 AM, nguyen duc Tuan wrote: > Hi Chapman, > You can use "cluster by" to do what you want. >

Re: How does HashPartitioner distribute data in Spark?

2017-06-24 Thread Vikash Pareek
Hi Vadim, Thank you for your response. I would like to know how the partitioner chooses the key. If we look at my example, the following questions arise: 1. In the case of rdd1, hash partitioning should calculate the hash code of the key (i.e. *"aa"* in this case), so *all records should go to single partition*
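
For reference, the rule behind the question can be sketched in plain Scala without a cluster. Spark's HashPartitioner places a record in partition `key.hashCode` modulo the number of partitions, adjusted to be non-negative, with null keys going to partition 0. The object name, helper names, and sample records below are hypothetical and are not taken from the poster's example.

```scala
object HashPartitionSketch {
  // Modulo that always lands in [0, mod), even when x is negative.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Partition index for a key: null keys go to partition 0, everything
  // else is placed by its hashCode.
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0 else nonNegativeMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    // Made-up pair records; only the first element (the key) is hashed.
    val records = Seq(("aa", 1), ("aa", 2), ("bb", 3), ("cc", 4))
    records.foreach { case (k, v) =>
      println(s"($k, $v) -> partition ${partitionFor(k, 3)}")
    }
  }
}
```

Records that share a key always map to the same partition index, but this placement only takes effect once an operation that actually uses the partitioner (such as a repartition or partitionBy) is invoked.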

Re: Is there an api in Dataset/Dataframe that does repartitionAndSortWithinPartitions?

2017-06-24 Thread Saliya Ekanayake
I haven't worked with datasets but would this help https://stackoverflow.com/questions/37513667/how-to-create-a-spark-dataset-from-an-rdd ? On Jun 23, 2017 5:43 PM, "Keith Chapman" wrote: > Hi, > > I have code that does the following using RDDs, > > val

Re: Is there an api in Dataset/Dataframe that does repartitionAndSortWithinPartitions?

2017-06-24 Thread nguyen duc Tuan
Hi Chapman, You can use "cluster by" to do what you want. https://deepsense.io/optimize-spark-with-distribute-by-and-cluster-by/ 2017-06-24 17:48 GMT+07:00 Saliya Ekanayake : > I haven't worked with datasets but would this help https://stackoverflow. >