[ https://issues.apache.org/jira/browse/PIG-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929689#comment-15929689 ]

Adam Szita commented on PIG-5186:
---------------------------------

Aggregate warnings were not yet supported in Spark mode (hence the e2e Warning 
test case failures). I aim to enable them now.

In MR/Tez we use counters, and in Spark we rely on Accumulators (a means of 
supporting distributed counters).
Pig has some builtin warning enums in PigWarning, and it also supports custom 
warnings for user defined functions.
The latter is problematic with Spark because you cannot register new 
accumulators on the backend and then read their values in the driver.

A workaround has been implemented in my patch [^PIG-5186.0.patch] whereby we 
define Map-type Accumulators (besides the Long type we already use): one for 
the builtin warnings, one for the custom ones. These are passed from driver to 
backend, where the executors can create entries in the maps or increment 
preexisting values.
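The key requirement for such a map-valued accumulator is that per-executor partial maps merge by summing counts per key. A minimal sketch of that merge semantics (class and method names are hypothetical, not taken from the patch, and the Spark plumbing is omitted):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the map-accumulator semantics PIG-5186 relies on:
// each executor builds a partial Map<warningName, count>, and the driver
// merges the partials by summing counts per key.
public class WarningMapAccumulator {
    private final Map<String, Long> counts = new HashMap<>();

    // Called on an executor: record one occurrence of a warning,
    // creating the entry if it does not exist yet.
    public void add(String warning) {
        counts.merge(warning, 1L, Long::sum);
    }

    // Called on the driver: fold in a partial result from an executor.
    public void merge(WarningMapAccumulator other) {
        other.counts.forEach((k, v) -> counts.merge(k, v, Long::sum));
    }

    // Final aggregate, read on the driver after the job completes.
    public Map<String, Long> value() {
        return counts;
    }
}
```

Because new keys can be added at `add` time on the executors, custom UDF warnings need no up-front registration in the driver, which is exactly the limitation the map type works around.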

[~kellyzly], [~nkollar] please take a look and let me know what you think.

> Support aggregate warnings with Spark engine
> --------------------------------------------
>
>                 Key: PIG-5186
>                 URL: https://issues.apache.org/jira/browse/PIG-5186
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>             Fix For: spark-branch
>
>         Attachments: PIG-5186.0.patch
>
>
> Looks like we don't get aggregate warning stats when using Spark as exec 
> engine:
> {code}
> ./test_harness.pl::TestDriverPig::compareScript INFO Check failed: regex 
> match of <Encountered Warning DIVIDE_BY_ZERO 2387 time.*> expected in stderr
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
