[
https://issues.apache.org/jira/browse/PIG-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478141#comment-15478141
]
Rohini Palaniswamy commented on PIG-5019:
-----------------------------------------
[~murshyd],
Thanks for reporting this. We recently encountered this as well.
{code}
    // log at least once
    if (msgMap.get(o) == null ||
            !msgMap.get(o).equals(displayMessage)) {
        log.warn(displayMessage);
        msgMap.put(o, displayMessage);
    }
{code}
Can you remove the whole "log at least once" block above, instead of only
logging the class name and warning name? The block is not useful, since the
same information is already visible in the counters and in the Pig client log
message. Doing a map.get() and an equals() check for every message will hurt
performance when there is a huge number of records.
The documented behavior has always been to turn off aggregation with
-w/-warning to see the actual warnings, and that was the case with older
releases. So we lose nothing by removing the code.
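A minimal sketch of what the logger path could look like once that block is gone, to make the suggestion concrete. This is not the actual PigHadoopLogger code: the class CountingWarner, the enum WarningKind, and the aggregate flag are hypothetical names used only for illustration. With aggregation on, a warning only bumps a counter; with aggregation off (as with -w/-warning), every message is logged.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for the real logger; only the warning name
// UDF_WARNING_1 is taken from the thread.
class CountingWarner {
    enum WarningKind { UDF_WARNING_1, UDF_WARNING_2 }

    private final Map<WarningKind, Long> counts = new EnumMap<>(WarningKind.class);
    private final boolean aggregate; // assumed flag: true = aggregation enabled
    private int linesLogged = 0;

    CountingWarner(boolean aggregate) {
        this.aggregate = aggregate;
    }

    void warn(Object source, String msg, WarningKind kind) {
        if (aggregate) {
            // Aggregated mode: no per-record map lookup on the message,
            // no string comparison, no log line. Just count.
            counts.merge(kind, 1L, Long::sum);
        } else {
            // Non-aggregated mode: log every warning as it happens.
            System.err.println(source.getClass().getName() + "(" + kind + "): " + msg);
            linesLogged++;
        }
    }

    long count(WarningKind kind) {
        return counts.getOrDefault(kind, 0L);
    }

    int linesLogged() {
        return linesLogged;
    }

    public static void main(String[] args) {
        CountingWarner warner = new CountingWarner(true);
        for (int i = 0; i < 6; i++) {
            warner.warn("udf", "Cannot extract group for input /v1=" + i,
                    WarningKind.UDF_WARNING_1);
        }
        // One summary line at the end, in the style of the client log message.
        System.out.println("Encountered Warning " + WarningKind.UDF_WARNING_1 + " "
                + warner.count(WarningKind.UDF_WARNING_1) + " time(s).");
    }
}
```

In this sketch the per-record cost in aggregated mode is a single counter update, independent of the message text.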
> Pig generates tons of warnings for udf with enabled warnings aggregation
> ------------------------------------------------------------------------
>
> Key: PIG-5019
> URL: https://issues.apache.org/jira/browse/PIG-5019
> Project: Pig
> Issue Type: Bug
> Components: internal-udfs
> Affects Versions: 0.14.0
> Reporter: Murshid Chalaev
> Assignee: Murshid Chalaev
> Attachments: PIG-5019.patch, input_example.gz, test_pig14_udf.pig
>
>
> For data set containing 9 lines the aggregated warning message is displayed
> {code}
> 2016-09-01 19:40:33,664 [main] WARN
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Encountered Warning UDF_WARNING_1 6 time(s).
> {code}
> but in the container logs I see a separate log message "Cannot
> extract group for input" for every non-matching value
> {code}
> 2016-09-01 19:40:28,115 INFO [main]
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map:
> Aliases being processed per job phase (AliasName[line,offset]):
> M: b[10,4],b[-1,-1],extract_fields[17,17] C: R:
> 2016-09-01 19:40:28,122 WARN [main]
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger:
> org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot
> extract group for input /v1=1&v3=9
> 2016-09-01 19:40:28,124 WARN [main]
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger:
> org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot
> extract group for input /v2=3&v3=7
> 2016-09-01 19:40:28,124 WARN [main]
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger:
> org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot
> extract group for input /v1=4&v3=6
> 2016-09-01 19:40:28,125 WARN [main]
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger:
> org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot
> extract group for input /v2=5&v3=5
> 2016-09-01 19:40:28,125 WARN [main]
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger:
> org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot
> extract group for input /v1=8&v3=2
> 2016-09-01 19:40:28,125 WARN [main]
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger:
> org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot
> extract group for input /v3=9&v2=1
> {code}
> It does not log the warning messages in the task logs.
> The patch for PIG-2207 was committed to
> Pig 0.13+
> In 0.12 we had a single counter for all UDF warnings, but in 0.13+ we have
> separate counter and message for every unique warning log line.
> The two lines below are unique:
> /v2=3&v3=7
> /v1=4&v3=6
> That's why Pig prints both of them to the console.
> Printing a separate log message for every data line slows down the overall
> performance as well.
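The effect described in the quoted report can be reproduced with a small sketch built around the same condition as the "log at least once" snippet. The class LogOnceDemo and its in-memory list of logged lines are hypothetical and used only for illustration; the input values are the ones from the report. Because each message embeds the input value, the equals() check almost never matches the previously stored message, so nearly every record is logged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; only the if-condition mirrors the quoted snippet.
class LogOnceDemo {
    private final Map<Object, String> msgMap = new HashMap<>();
    private final List<String> logged = new ArrayList<>(); // stands in for log.warn()

    void warn(Object o, String displayMessage) {
        // log at least once: fires whenever the message differs from the
        // last one stored for this UDF object
        if (msgMap.get(o) == null || !msgMap.get(o).equals(displayMessage)) {
            logged.add(displayMessage);
            msgMap.put(o, displayMessage);
        }
    }

    List<String> logged() {
        return logged;
    }

    public static void main(String[] args) {
        Object udf = new Object(); // one UDF instance, as in a single map task
        String[] inputs = {
            "/v1=1&v3=9", "/v2=3&v3=7", "/v1=4&v3=6",
            "/v2=5&v3=5", "/v1=8&v3=2", "/v3=9&v2=1"
        };
        LogOnceDemo demo = new LogOnceDemo();
        for (String in : inputs) {
            demo.warn(udf, "RegexExtract : Cannot extract group for input " + in);
        }
        // Every input is distinct, so every warning passes the check.
        System.out.println(demo.logged().size() + " of " + inputs.length + " warnings logged");
    }
}
```

Only a message that repeats the previous one verbatim for the same object is suppressed, which is rare when the message includes the record content.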