Agreed. Feel free to file a JIRA for 2).

On Tue, Jun 10, 2014 at 9:09 AM, Abhishek Agarwal <[email protected]>
wrote:

> 2) is the approach to go for if there is only one filter on the map side.
> However, if the map has other operations, such as flatten or additional
> filters, you cannot attribute the difference between map input and output
> records to any particular filter operation.
>
>
> On Tue, Jun 10, 2014 at 8:30 PM, Cheolsoo Park <[email protected]>
> wrote:
>
> > 1) Number of invocations of a UDF: You can use pig.udf.profile
> > <http://pig.apache.org/docs/r0.12.0/perf.html#profiling>. Note that it is
> > an approximation and can be misleading. You can make it 100% accurate by
> > configuring pig.udf.profile.frequency
> > <https://issues.apache.org/jira/browse/PIG-3956>. The latter is only in
> > trunk.
> >
> > 2) Number of records getting filtered: We don't have a counter
> > specifically for this, but you can estimate it by comparing the map/reduce
> > input/output records before and after the filter-by. If you use a
> > visualization tool such as Lipstick, the input/output records of each MR
> > job are displayed in the DAG.
> >
> > On Tue, Jun 10, 2014 at 7:49 AM, Abhishek Agarwal <[email protected]>
> > wrote:
> >
> > > I was wondering if Pig has built-in support for counting the number of
> > > invocations of a UDF and the number of records filtered out by the
> > > FILTER operator.
> > >
> > > This feature could be very useful, especially for filters where you
> > > can't hook in your own counters.
> > >
> > > --
> > > Regards,
> > > Abhishek Agarwal
> > >
> >
>
>
>
> --
> Regards,
> Abhishek Agarwal
>
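
To illustrate approach 2) and the limitation raised in the thread, here is a small sketch of the arithmetic. The counter values are made up for illustration and do not come from a real job, and the function name estimate_filtered is hypothetical. It assumes you have already read the map input and output record counts from the job's counters.

```python
def estimate_filtered(map_input_records, map_output_records):
    """Estimate records dropped on the map side as input minus output.

    Only meaningful when a single FILTER is the only map-side operation
    that changes the record count.
    """
    return map_input_records - map_output_records


# Case 1: a single filter on the map side.
# 1000 records in, 700 out, so the filter dropped 300.
single = estimate_filtered(1000, 700)

# Case 2 (hypothetical): the same filter followed by a FLATTEN that
# emits 2 records per surviving record.
after_filter = 1000 - 300          # 700 records survive the filter
after_flatten = after_filter * 2   # 1400 records leave the map
mixed = estimate_filtered(1000, after_flatten)

print(single)  # 300
print(mixed)   # -400, which says nothing about the filter itself
```

In the second case the difference is negative even though the filter still dropped 300 records, which is why a dedicated per-filter counter would be needed once other operations share the map.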
