Antonio Murgia created SPARK-11350:
--------------------------------------
Summary: There is no best practice to handle warnings or messages
produced by Executors in a distributed manner
Key: SPARK-11350
URL: https://issues.apache.org/jira/browse/SPARK-11350
Project: Spark
Issue Type: Wish
Components: Spark Core
Reporter: Antonio Murgia
I looked around on the web and couldn’t find any way to deal, in a
distributed way, with malformed/faulty records during computation. All I was
able to find was the flatMap/Some/None technique plus logging.
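For readers unfamiliar with it, the flatMap/Some/None technique can be sketched roughly as follows. This is only an illustration on a plain Scala List so that it runs without a cluster; the record format and the parseAge helper are hypothetical and not taken from any real job. RDD.flatMap accepts a function of the same shape.

```scala
import scala.util.Try

object FlatMapSketch {
  // Hypothetical parser: Some(age) for a well-formed "name,age" record,
  // None when the record is malformed or the age is not a number.
  def parseAge(record: String): Option[Int] =
    record.split(",") match {
      case Array(_, age) => Try(age.trim.toInt).toOption
      case _             => None
    }

  // Made-up sample data for illustration only.
  val records: List[String] =
    List("alice,30", "bob,notanumber", "malformed", "carol,41")

  // flatMap flattens each Option, so faulty records silently disappear.
  val ages: List[Int] = records.flatMap(r => parseAge(r))
}
```

The drawback is visible here: the two faulty records vanish from the output, and nothing records why.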
I’m facing this problem because I have a processing algorithm that extracts
more than one value from each record, but can fail to extract one of those
multiple values, and I want to keep track of such failures. Logging is not
feasible because this “warning” happens so frequently that the logs would
become overwhelming and impossible to read.
Since my processing has 3 different possible outcomes, I modeled it with
this class hierarchy:
http://i.imgur.com/NIesYUm.png?1
Each Result holds a result and/or warnings. Since Result implements
Traversable, it can be used in a flatMap, discarding all warnings and failure
results. On the other hand, if we want to keep track of warnings, we can
process them and output them as needed.
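Since the hierarchy is only available as an image, here is a rough sketch of what such a design might look like. All names (Ok, OkWithWarnings, Failed, Person, process) are assumptions made for illustration, not the actual classes from the diagram. The sketch extends Iterable rather than Traversable, because Traversable is deprecated as of Scala 2.13, and it uses a plain List in place of an RDD so that it runs standalone.

```scala
import scala.util.Try

object ResultSketch {
  // Hypothetical hierarchy with three outcomes; names are assumptions.
  sealed trait Result[+A] extends Iterable[A] {
    def warnings: List[String]
  }
  // Everything was extracted.
  final case class Ok[A](value: A) extends Result[A] {
    def warnings: List[String] = Nil
    def iterator: Iterator[A] = Iterator.single(value)
  }
  // A usable value was produced, but part of the extraction failed.
  final case class OkWithWarnings[A](value: A, warnings: List[String])
      extends Result[A] {
    def iterator: Iterator[A] = Iterator.single(value)
  }
  // Nothing usable could be extracted.
  final case class Failed(warnings: List[String]) extends Result[Nothing] {
    def iterator: Iterator[Nothing] = Iterator.empty
  }

  // Hypothetical record type: the age field is allowed to be missing.
  final case class Person(name: String, age: Option[Int])

  // Hypothetical processing step for "name,age" records.
  def process(record: String): Result[Person] =
    record.split(",").map(_.trim) match {
      case Array(name, age) if name.nonEmpty =>
        Try(age.toInt).toOption match {
          case Some(a) => Ok(Person(name, Some(a)))
          case None =>
            OkWithWarnings(Person(name, None),
              List(s"unparseable age in: $record"))
        }
      case _ => Failed(List(s"malformed record: $record"))
    }

  // Made-up sample data for illustration only.
  val records: List[String] =
    List("alice,30", "bob,notanumber", "malformed", "carol,41")
  val results: List[Result[Person]] = records.map(process)

  // Result is an Iterable, so flatMap keeps values and drops failures.
  val people: List[Person] = results.flatMap(r => r)
  // Warnings can be gathered separately from the same results.
  val warnings: List[String] = results.flatMap(_.warnings)
}
```

The same two flatMap calls should carry over to an RDD of results, so the warnings could be counted, sampled, or saved as a dataset of their own instead of being written to the executor logs.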
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)