2) is the way to go if there is only one filter on the map side. However, if the map side also runs other operations, such as FLATTEN or additional filters, you cannot attribute the difference between map input and output records to any particular filter operation.
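To illustrate the point, here is a toy simulation in plain Python (not Pig itself, and not how Pig implements its operators) of a map-side pipeline that runs a FLATTEN followed by a FILTER. The function name, the counter names, and the sample data are all made up for illustration. It only shows why the job-level input/output counts do not reveal how many records the filter dropped.

```python
# Toy simulation of a map-side pipeline: FLATTEN followed by FILTER.
# All names here (run_map_pipeline, the counter keys) are hypothetical;
# Pig exposes only the job-level map input/output record counts.

def run_map_pipeline(records, keep):
    counters = {
        "map_input": 0,       # records read by the map task
        "flatten_output": 0,  # records produced by FLATTEN
        "filter_dropped": 0,  # records rejected by FILTER
        "map_output": 0,      # records emitted by the map task
    }
    output = []
    for bag in records:
        counters["map_input"] += 1
        for item in bag:  # FLATTEN: one input bag becomes len(bag) records
            counters["flatten_output"] += 1
            if keep(item):  # FILTER: keep only records matching the predicate
                counters["map_output"] += 1
                output.append(item)
            else:
                counters["filter_dropped"] += 1
    return output, counters


records = [[1, 2, 3], [4], [], [5, 6]]
output, counters = run_map_pipeline(records, lambda x: x % 2 == 0)

# What the job-level counters alone would suggest was "filtered".
apparent = counters["map_input"] - counters["map_output"]
print(counters)
print("apparent drop:", apparent, "actual filter drop:", counters["filter_dropped"])
```

In this run the map task reads 4 records and emits 3, so the job-level counters suggest 1 record was removed, while the filter actually dropped 3. The FLATTEN changed the record count in between, which is why the difference cannot be pinned on the filter.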

On Tue, Jun 10, 2014 at 8:30 PM, Cheolsoo Park <[email protected]> wrote:

> 1) Number of invocations of a UDF: You can use pig.udf.profile
> <http://pig.apache.org/docs/r0.12.0/perf.html#profiling>. Note that it is
> approximation and can be misleading. In fact, you can make it 100% accurate
> by configuring pig.udf.profile.frequency
> <https://issues.apache.org/jira/browse/PIG-3956>. The latter is only in
> trunk.
>
> 2) Number of records getting filtered: We don't have a counter
> specifically for this, but you can guess it by looking at map/reduce
> input/output records before/after the filter-by. If you use a
> visualization tool such as Lipstick, the input/output records of each MR
> job is displayed in the DAG.
>
> On Tue, Jun 10, 2014 at 7:49 AM, Abhishek Agarwal <[email protected]>
> wrote:
>
> > I was wondering if pig has in-built support for counting, the number of
> > invocations of a UDF and the number of records getting filtered through
> > FILTER operator.
> >
> > This feature could be very useful especially for filters where you can't
> > hook your own counters.
> >
> > --
> > Regards,
> > Abhishek Agarwal
> >

--
Regards,
Abhishek Agarwal
