Hi all,
I found an issue with the window slide of DStream.
My code is:
val ssc = new StreamingContext(conf, Seconds(1))
val content = ssc.socketTextStream("ip", port) // host: String, port: Int
content.countByValueAndWindow(Seconds(2), Seconds(8)).foreachRDD(rdd => rdd.foreach(println))
The key point is that the slide duration (Seconds(8)) is greater than the window duration (Seconds(2)).
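For reference, the first argument of countByValueAndWindow is the window duration and the second is the slide duration, and both must be multiples of the batch interval. A minimal sketch of a typical configuration where the slide does not exceed the window ("localhost" and 9999 are placeholder values, not from the original report):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("window-sketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(1)) // 1s batch interval

    // Placeholder host and port for illustration only.
    val content = ssc.socketTextStream("localhost", 9999)

    // Window of 8s of data, recomputed every 2s: slide <= window,
    // and both are multiples of the 1s batch interval.
    content.countByValueAndWindow(Seconds(8), Seconds(2)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

This sketch needs a Spark runtime and a socket source to actually run, so it is meant for illustration of the argument order only.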
I am new to Spark and I need some guidance on how to fetch files passed via the
--files option on spark-submit.
I read on some forums that we can fetch the files with
SparkFiles.get(fileName) and use them in our code, and all nodes should
be able to read them.
But I am facing some issues.
Below is the command I am
Hi Sudhir,
I believe you have to use a shared file system that is accessible by all nodes.
> On Jun 24, 2017, at 1:30 PM, sudhir k wrote:
>
>
> I am new to Spark and i need some guidance on how to fetch files from --files
> option on Spark-Submit.
>
> I read on some
Neither of your code examples invokes a repartitioning. Add a repartition
call.
On Sat, Jun 24, 2017, 11:53 AM Vikash Pareek
wrote:
> Hi Vadim,
>
> Thank you for your response.
>
> I would like to know how the partitioner chooses the key. If we look at my
>
addFile is not supposed to depend on a shared FS, unless the semantics have
changed recently.
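For files shipped with --files (or SparkContext.addFile), a minimal sketch of resolving the local copy on an executor with SparkFiles.get — the file name "config.txt" is a placeholder assumption, not from the original thread:

```scala
import org.apache.spark.SparkFiles
import scala.io.Source

// Submitted with: spark-submit --files config.txt ...
// SparkFiles.get resolves the path of the local copy that Spark
// distributed to the node this code runs on.
val path = SparkFiles.get("config.txt") // placeholder file name

// Read the distributed file like any local file.
val lines = Source.fromFile(path).getLines().toList
```

This runs inside a Spark application (driver or executor), so it is a sketch of the API usage rather than a standalone program.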
On Sat, Jun 24, 2017 at 11:55 AM varma dantuluri
wrote:
> Hi Sudhir,
>
> I believe you have to use a shared file system that is accessible by all
> nodes.
>
>
> On Jun 24, 2017, at
Dataset/DataFrame has repartition (which can be used to partition by key)
and sortWithinPartitions.
see for example usage here:
https://github.com/tresata/spark-sorted/blob/master/src/main/scala/com/tresata/spark/sorted/sql/GroupSortedDataset.scala#L18
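As a sketch of the Dataset-side near-equivalent of repartitionAndSortWithinPartitions (the column names "key" and "value" and the sample data are assumptions for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("sort-within-partitions-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = Seq(("b", 2), ("a", 1), ("a", 3)).toDF("key", "value")

// Equivalent in spirit to repartitionAndSortWithinPartitions on RDDs:
// hash-partition rows by "key", then sort within each partition.
val sorted = df
  .repartition(col("key"))
  .sortWithinPartitions(col("key"), col("value"))
```

The repartition produces a hash partitioning on the key, and sortWithinPartitions orders rows per partition without a global sort.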
On Fri, Jun 23, 2017 at 5:43 PM, Keith
Unsubscribe
Sent from my iPhone
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Thanks for the pointer Saliya. I'm looking for an equivalent API in
Dataset/DataFrame for repartitionAndSortWithinPartitions; I've already
converted most of the RDDs to DataFrames.
Regards,
Keith.
http://keith-chapman.com
On Sat, Jun 24, 2017 at 3:48 AM, Saliya Ekanayake
Hi Nguyen,
This looks promising and seems like I could achieve it using cluster by.
Thanks for the pointer.
Regards,
Keith.
http://keith-chapman.com
On Sat, Jun 24, 2017 at 5:27 AM, nguyen duc Tuan
wrote:
> Hi Chapman,
> You can use "cluster by" to do what you want.
>
Hi Vadim,
Thank you for your response.
I would like to know how the partitioner chooses the key. If we look at my
example, the following question arises:
1. In the case of rdd1, hash partitioning should calculate the hashcode of the
key (i.e. *"aa"* in this case), so *all records should go to a single partition*
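The single-key case described above can be sketched directly (a minimal example, assuming every record carries the key "aa"):

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("hash-partition-sketch").setMaster("local[2]"))

// Every record shares the key "aa", so hashCode("aa") maps them all to
// the same partition, no matter how many partitions we request.
val rdd1 = sc.parallelize(Seq(("aa", 1), ("aa", 2), ("aa", 3)))
  .partitionBy(new HashPartitioner(4))

// Count records per partition: only one partition should be non-empty.
val sizes = rdd1.mapPartitions(it => Iterator(it.size)).collect()
```

This is why repartitioning on a single distinct key cannot spread the data: the hash partitioner has only one hash value to work with.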
I haven't worked with datasets but would this help
https://stackoverflow.com/questions/37513667/how-to-create-a-spark-dataset-from-an-rdd
?
On Jun 23, 2017 5:43 PM, "Keith Chapman" wrote:
> Hi,
>
> I have code that does the following using RDDs,
>
> val
Hi Chapman,
You can use "cluster by" to do what you want.
https://deepsense.io/optimize-spark-with-distribute-by-and-cluster-by/
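A minimal sketch of "cluster by" in Spark SQL (the table and column names are placeholders): CLUSTER BY key is shorthand for DISTRIBUTE BY key SORT BY key, i.e. repartition on the key and sort within each partition.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cluster-by-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Placeholder data and view name for illustration.
Seq(("b", 2), ("a", 1), ("a", 3)).toDF("key", "value")
  .createOrReplaceTempView("t")

// CLUSTER BY key == DISTRIBUTE BY key SORT BY key:
// shuffle rows by "key", then sort rows inside each partition by "key".
val clustered = spark.sql("SELECT key, value FROM t CLUSTER BY key")
```

This gives the same shuffle-then-local-sort shape as repartition plus sortWithinPartitions, expressed in SQL.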
2017-06-24 17:48 GMT+07:00 Saliya Ekanayake :
> I haven't worked with datasets but would this help https://stackoverflow.
>