Hi:
I am working with DataSets so that I can use mapGroupsWithState for business
logic and then use dropDuplicates over a set of fields. I would like to use
the withWatermark so that I can restrict how much state is stored.
From the API it looks like withWatermark takes a string -
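A minimal sketch of how the two string arguments to withWatermark fit together with dropDuplicates, assuming a hypothetical event schema (the column and field names here are illustrative, not from the original question):

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

// Hypothetical event type; eventTime must be a timestamp column for watermarking.
case class Event(eventTime: Timestamp, userId: String, payload: String)

object WatermarkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("watermark-sketch").getOrCreate()
    import spark.implicits._

    // Assumed streaming source; any source producing Dataset[Event] works.
    val events = spark.readStream
      .format("rate").load()
      .selectExpr("timestamp as eventTime", "cast(value as string) as userId",
                  "'x' as payload")
      .as[Event]

    val deduped = events
      // First string: event-time column name; second: how late data may arrive.
      .withWatermark("eventTime", "10 minutes")
      // State for duplicates older than the watermark can then be dropped.
      .dropDuplicates("userId", "eventTime")

    deduped.writeStream.format("console").start().awaitTermination()
  }
}
```

The watermark bounds how long Spark keeps per-key state for dropDuplicates, which is what limits memory growth.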
Thanks Eyal - it appears that these are the same patterns used for spark
DStreams.
On Wednesday, December 27, 2017 1:15 AM, Eyal Zituny
wrote:
Hi, if you're interested in stopping your spark application externally, you
will probably need a way to communicate
I have been searching for examples, but not finding exactly what I need.
I am looking for the paradigm for using spark 2.2 to convert a bunch of
binary files into a bunch of different binary files. I'm starting with:
val files = spark.sparkContext.binaryFiles("hdfs://1.2.3.4/input")
then
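One possible continuation of that snippet, as a hedged sketch: binaryFiles yields (path, PortableDataStream) pairs, and the conversion step below (transform) is a hypothetical stand-in for whatever per-file logic is needed:

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val files = spark.sparkContext.binaryFiles("hdfs://1.2.3.4/input")

// Hypothetical conversion logic; replace with the real transformation.
def transform(bytes: Array[Byte]): Array[Byte] = bytes.reverse

files.foreach { case (path, stream) =>
  val bytes = stream.toArray()          // materialize the file's contents
  val converted = transform(bytes)

  // Write each converted file back to HDFS via the Hadoop FileSystem API.
  val outPath = new Path("hdfs://1.2.3.4/output/" + new Path(path).getName)
  val fs = FileSystem.get(new URI("hdfs://1.2.3.4/"), new Configuration())
  val out = fs.create(outPath)
  try out.write(converted) finally out.close()
}
```

Note that binaryFiles loads each file whole, so this pattern suits many small-to-medium files rather than a few very large ones.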
Hi,
Please try to use the Spark UI in the way that AWS EMR recommends; it
should be available from the resource manager. I have never had any problem
working with it - it has always been my primary and sole source of
debugging.
Sadly, I cannot be of much help unless we go for a screen share
Thanks. You are right on both counts
1. Doing max(instnc_id) over () works. I thought that Spark would
automatically treat max(instnc_id) as max(instnc_id) over ()
2. Spark tries to compute the max in a single task, and that task runs out of
memory
I'll revert to the join approach. Thanks again
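For reference, a sketch of the two approaches being compared, assuming a hypothetical DataFrame df with an instnc_id column. The empty window form pulls every row into one partition (hence the OOM); the join form lets the aggregation run in parallel:

```scala
import org.apache.spark.sql.functions.max
import org.apache.spark.sql.expressions.Window

// Approach 1: window with an empty over() clause. Window.partitionBy() with
// no columns puts all rows in a single partition, so one task holds everything.
val w = Window.partitionBy()
val viaWindow = df.withColumn("max_instnc_id", max("instnc_id").over(w))

// Approach 2: aggregate to a one-row DataFrame, then cross-join it back.
// The max is computed in a distributed aggregation instead of one task.
val maxDf = df.agg(max("instnc_id").as("max_instnc_id"))
val viaJoin = df.crossJoin(maxDf)
```

Both produce the same result column; the join variant scales better because the aggregation is not forced through a single partition.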