Hey guys, looking for a bit of help on logging.
I'm trying to get Spark to write log4j logs per job within a Spark cluster.
So for example, I'd like:
$SPARK_HOME/logs/job1.log.x
$SPARK_HOME/logs/job2.log.x
And I want this on the driver and on the executor.
I'm trying to accomplish this by using
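A common way to get there (sketched here with hypothetical paths and job
names, not from the original setup) is to ship a separate log4j.properties
per job and point both the driver and the executors at it:

    spark-submit \
      --files /path/to/job1-log4j.properties \
      --driver-java-options "-Dlog4j.configuration=file:/path/to/job1-log4j.properties" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:job1-log4j.properties" \
      --class com.example.Job1 job1.jar

The --files flag puts the properties file in each executor's working
directory, which is why the executor option can reference it by bare name.
A log4j 1.x RollingFileAppender then gives the job1.log.x naming:

    # job1-log4j.properties (paths are examples)
    log4j.rootLogger=INFO, rolling
    log4j.appender.rolling=org.apache.log4j.RollingFileAppender
    log4j.appender.rolling.File=/opt/spark/logs/job1.log
    log4j.appender.rolling.MaxFileSize=10MB
    log4j.appender.rolling.MaxBackupIndex=10
    log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
    log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n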
Hi all,
I'm trying to lock down ALL Spark ports and have tried setting them both in
spark-defaults.conf and via the SparkContext. (The example below was run in
local[*] mode, but running in local mode or via spark-submit.sh on the
cluster with a jar produces the same results.)
My goal is to define all
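For reference, a sketch of what pinning the well-known ports looks like in
spark-defaults.conf. Which of these properties exist depends on your Spark
version, and the port numbers below are arbitrary examples:

    spark.driver.port        7001
    spark.fileserver.port    7002
    spark.broadcast.port     7003
    spark.blockManager.port  7005
    spark.ui.port            4040

The same keys can be set programmatically on the SparkConf before the
context is created, e.g. conf.set("spark.driver.port", "7001").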
I wanted to post for validation, to understand whether there is a more
efficient way to achieve my goal. I'm currently performing this flow for two
distinct calculations executing in parallel:
1) Sum key/value pairs using a simple witnessed count (emit a 1 for each
record via mapToPair(), then groupByKey())
2)
Yes, thanks, I did in fact mean reduceByKey(), thus allowing the convenience
method to process the summation by key.
Thanks for your feedback!
DH
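For anyone who lands on this thread later, a minimal runnable sketch of that
pattern in the Java API (class name and input data are hypothetical, not
from the original code):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SumByKey {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SumByKey").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Emit a witnessed count of 1 per record, then sum per key.
        JavaPairRDD<String, Integer> counts = sc
            .parallelize(Arrays.asList("a", "b", "a", "c", "a"))
            .mapToPair(k -> new Tuple2<>(k, 1))
            .reduceByKey((x, y) -> x + y);

        // reduceByKey combines values map-side before the shuffle,
        // so it moves far less data than groupByKey plus a manual sum.
        counts.collect().forEach(System.out::println);
        sc.stop();
      }
    }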
Xichen_tju,
I recently evaluated Storm over a period of months (three 2U servers, each
with a 2.4GHz CPU and 24GB RAM) and was not able to achieve a realistic
scale for my
business domain needs. Storm is really only a framework, which allows you to
put in code to do whatever it is you need for a
Hi All,
I was able to resolve this matter with a simple fix. It seems that in order
to process the reduceByKey and the flatMap operations at the same time, the
only way to resolve it was to increase the number of local threads to more
than one.
Since I'm developing on my personal machine for speed, I simply updated
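For reference, the usual shape of that fix, assuming a Spark Streaming app:
give the local master at least two threads, since the receiver occupies one
and leaves nothing for the processing under local[1]. A sketch, with a
hypothetical app name and batch interval:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Duration;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    // "local[2]" or higher: one thread for the receiver,
    // at least one for the batch processing.
    SparkConf conf = new SparkConf()
        .setAppName("StreamingJob")   // hypothetical app name
        .setMaster("local[2]");
    JavaStreamingContext jssc =
        new JavaStreamingContext(conf, new Duration(1000));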
Hi all, I recently picked up Spark and am trying to work through a coding
issue that involves the reduceByKey method. After various debugging efforts,
it seems that the reduceByKey method never gets called.
Here's my workflow, which is followed by my code and results:
My parsed data
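One thing worth checking in a case like this, since it bites most
newcomers: transformations such as reduceByKey are lazy, so the reduce
function only ever runs once an action is invoked on the result. A sketch,
with hypothetical variable names:

    // Lazy: building this does NOT execute the reduce function.
    JavaPairRDD<String, Integer> counts = pairs.reduceByKey((a, b) -> a + b);

    // Only an action (count, collect, saveAsTextFile, ...) triggers it.
    System.out.println(counts.count());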