[ 
https://issues.apache.org/jira/browse/PIG-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3956:
-------------------------------

    Fix Version/s: 0.14.0
         Assignee: Cheolsoo Park
           Status: Patch Available  (was: Open)

> UDF profile is often misleading
> -------------------------------
>
>                 Key: PIG-3956
>                 URL: https://issues.apache.org/jira/browse/PIG-3956
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-3956-1.patch
>
>
> Pig provides UDF profiling via the {{pig.udf.profile}} perperty. How it works 
> is that it samples every 100th UDF invocation and increments a counter by 100 
> x elapsed time.
> {code}
> private final static int TIMING_FREQ = 100;
> ...
> PigStatusReporter.getInstance().incrCounter(counterGroup, TIMING_COUNTER,
>     Math.round((System.nanoTime() - startNanos) / 1000) * TIMING_FREQ);
> {code}
> The problem is that this assumes every invocation takes about the same time, 
> which doesn't hold true oftentimes.
> Here is an example. TIMING_FREQ = 100.
> {code}
> set pig.udf.profile true;
> set opt.fetch false;
> l = LOAD 'foo' AS (value:chararray);
> f = FOREACH l GENERATE LOWER(value);
> DUMP f;
> {code}
> This job gives me the following counters-
> {code}
> org.apache.pig.builtin.LOWER - approx_invocations: 6,144,600
> org.apache.pig.builtin.LOWER - approx_microsecs: 130,913,800
> {code}
> However, if I change TIMING_FREQ to 1, I get the followings-
> {code}
> org.apache.pig.builtin.LOWER - approx_invocations: 6,144,248
> org.apache.pig.builtin.LOWER - approx_microsecs: 19,711,842
> {code}
> As can be seen, approx_microsecs is 6x larger when sampling every 100th 
> invocation as compared to when sampling every invocation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to