Hi Spark gurus,
I was surprised to read here:
https://stackoverflow.com/questions/50129411/why-is-predicate-pushdown-not-used-in-typed-dataset-api-vs-untyped-dataframe-ap
that filters expressed as typed lambdas are not pushed down in the typed
Dataset API, and that one should therefore stick to DataFrames.
But writing code for
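For anyone who wants to see the difference themselves, here is a minimal sketch (assuming Spark 2.x in local mode; the object and paths are made up for illustration): the untyped column-expression filter appears in the Parquet scan's PushedFilters, while the typed lambda filter is opaque to Catalyst and does not.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object PushdownSketch {
  case class Record(id: Long, value: String)

  // Returns (untypedCount, typedCount); also prints both physical plans so
  // the PushedFilters difference is visible in the console.
  def demo(spark: SparkSession): (Long, Long) = {
    import spark.implicits._

    // Write a small Parquet file so the scan has something to push filters into.
    val path = Files.createTempDirectory("pushdown").resolve("data.parquet").toString
    (1L to 1000L).map(i => Record(i, s"v$i")).toDF().write.parquet(path)

    // Untyped: Catalyst sees the predicate; the plan shows it under PushedFilters.
    val untyped = spark.read.parquet(path).filter($"id" > 100)
    untyped.explain()

    // Typed: the lambda is a black box to Catalyst, so no pushdown for `id`.
    val typed = spark.read.parquet(path).as[Record].filter(_.id > 100)
    typed.explain()

    (untyped.count(), typed.count())
  }
}
```

Both filters return the same rows, of course; the difference is only in how much data the Parquet reader has to surface first.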
Dear Spark gurus,
*Question*: which approach would you recommend for shaping a library of
custom transformations for DataFrames/Datasets?
*Details*: e.g., suppose we need several custom transformations over
Dataset/DataFrame instances: injecting columns, applying specially tuned row
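One common shape for such a library (a sketch, not the only option; all names here are hypothetical): make each transformation an ordinary function from DataFrame to DataFrame, curried over its parameters, and let callers chain them with Dataset.transform.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}

object MyTransforms {
  // Inject a constant column (an example of a column-injecting transformation).
  def withSource(source: String)(df: DataFrame): DataFrame =
    df.withColumn("source", lit(source))

  // Drop rows whose key column is null.
  def dropNullKeys(key: String)(df: DataFrame): DataFrame =
    df.filter(col(key).isNotNull)
}
```

Because each transformation is a plain function, they compose without nesting:
`raw.transform(MyTransforms.withSource("ingest")).transform(MyTransforms.dropNullKeys("id"))`.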
Hi all,
A short example before the long story:
var accumulatedDataFrame = ... // initialize
for (i <- 1 to 100) {
  val myTinyNewData = ... // my slowly calculated new data portion, in tiny amounts
  accumulatedDataFrame = accumulatedDataFrame.union(myTinyNewData)
  // how to stick
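A sketch of how such a loop can be kept from slowing down (assumes Spark 2.3+; the batch producer here is a hypothetical stand-in for the slowly calculated data): each union() adds a node to the logical plan, so after many iterations planning gets expensive and the driver can even hit a StackOverflowError. Periodically calling localCheckpoint() materializes the data and truncates the lineage.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionAccumulator {
  def accumulate(spark: SparkSession, iterations: Int): DataFrame = {
    import spark.implicits._

    // Hypothetical stand-in for the slowly calculated tiny data portion.
    def tinyBatch(i: Int): DataFrame = Seq((i, s"batch-$i")).toDF("i", "payload")

    var accumulated = tinyBatch(0)
    for (i <- 1 to iterations) {
      accumulated = accumulated.union(tinyBatch(i))
      if (i % 10 == 0)
        accumulated = accumulated.localCheckpoint() // truncate the growing lineage
    }
    accumulated
  }
}
```

With a reliable checkpoint directory configured, checkpoint() can be used instead of localCheckpoint() to survive executor loss.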
Hi all
I am seeing a strange thing: when Spark 2.3.0 calculations started from
Zeppelin 0.7.3 finish, the "VCores Used" value in the YARN ResourceManager
stays at its maximum, although nothing should be running anymore.
How come?
If relevant, I have been seeing this issue since AWS EMR 5.13.0.
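One common cause on YARN (an assumption about the setup, not a confirmed diagnosis): Zeppelin keeps its SparkContext, and therefore its YARN containers, alive between paragraphs, so the ResourceManager keeps reporting those cores as used even when no job is running. With dynamic allocation enabled, idle executors are released after a timeout, for example in spark-defaults:

```
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true
spark.dynamicAllocation.executorIdleTimeout  60s
```

Restarting the Zeppelin Spark interpreter should also free the containers immediately, which can help confirm this is the cause.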