[
https://issues.apache.org/jira/browse/PIG-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929689#comment-15929689
]
Adam Szita commented on PIG-5186:
---------------------------------
Aggregate warnings are not yet supported in Spark mode (hence the e2e Warning
test case failures). I aim to enable this now.
In MR/Tez we use counters, and in Spark we rely on Accumulators (Spark's
mechanism for distributed counters).
Pig has some builtin warning enums in PigWarning, and it also supports custom
warnings for user defined functions.
The latter is problematic with Spark because you cannot register new
accumulators on the backend and then read their values later in the driver.
A workaround has been implemented in my patch [^PIG-5186.0.patch] whereby we
define Map-type Accumulators (besides the Long type we already use): one for
the builtin warnings and one for the custom ones. These are passed from the
driver to the backend, where the executors can create entries in the maps or
increment preexisting values.
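To illustrate, a minimal sketch of the map-accumulator merge semantics described above (plain Java, not Pig's actual classes; names and signatures here are illustrative assumptions, loosely modeled on Spark's AccumulatorV2 add/merge/value contract):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: executors increment warning counts in a local map,
// and the driver merges the per-executor partial maps into one result.
// This mirrors the idea of a Map-type accumulator; it is not Pig's API.
public class WarningMapAccumulator {
    private final Map<String, Long> counts = new HashMap<>();

    // Executor side: create the entry or increment a preexisting value.
    public void add(String warning, long n) {
        counts.merge(warning, n, Long::sum);
    }

    // Driver side: merge another executor's partial map into this one.
    public void merge(WarningMapAccumulator other) {
        other.counts.forEach((k, v) -> counts.merge(k, v, Long::sum));
    }

    public Map<String, Long> value() {
        return counts;
    }
}
```

With this shape, custom UDF warnings need no up-front registration: an executor simply adds a new key to the map, and the driver sees it after the merge.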
[~kellyzly], [~nkollar] please take a look and let me know what you think.
> Support aggregate warnings with Spark engine
> --------------------------------------------
>
> Key: PIG-5186
> URL: https://issues.apache.org/jira/browse/PIG-5186
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Adam Szita
> Assignee: Adam Szita
> Fix For: spark-branch
>
> Attachments: PIG-5186.0.patch
>
>
> Looks like we don't get aggregate warning stats when using Spark as exec
> engine:
> {code}
> ./test_harness.pl::TestDriverPig::compareScript INFO Check failed: regex
> match of <Encountered Warning DIVIDE_BY_ZERO 2387 time.*> expected in stderr
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)