Agreed. Feel free to file a JIRA for 2).

On Tue, Jun 10, 2014 at 9:09 AM, Abhishek Agarwal <[email protected]> wrote:

> 2) is the approach to go for if there is only one filter on the map side.
> However, if you have other operations on the map side, such as flatten or
> additional filters, you cannot attribute the difference between map input
> and output records to a particular filter operation.
>
> On Tue, Jun 10, 2014 at 8:30 PM, Cheolsoo Park <[email protected]>
> wrote:
>
> > 1) Number of invocations of a UDF: You can use pig.udf.profile
> > <http://pig.apache.org/docs/r0.12.0/perf.html#profiling>. Note that it
> > is an approximation and can be misleading. In fact, you can make it 100%
> > accurate by configuring pig.udf.profile.frequency
> > <https://issues.apache.org/jira/browse/PIG-3956>. The latter is only in
> > trunk.
> >
> > 2) Number of records getting filtered: We don't have a counter
> > specifically for this, but you can estimate it by comparing the
> > map/reduce input/output records before and after the filter-by. If you
> > use a visualization tool such as Lipstick, the input/output records of
> > each MR job are displayed in the DAG.
> >
> > On Tue, Jun 10, 2014 at 7:49 AM, Abhishek Agarwal <[email protected]>
> > wrote:
> >
> > > I was wondering if Pig has built-in support for counting the number
> > > of invocations of a UDF and the number of records filtered out by the
> > > FILTER operator.
> > >
> > > This feature could be very useful, especially for filters where you
> > > can't hook in your own counters.
> > >
> > > --
> > > Regards,
> > > Abhishek Agarwal
>
> --
> Regards,
> Abhishek Agarwal
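
For reference, the two suggestions in the thread could look roughly like the Pig script below. This is only a sketch: the property names come from the messages above, while the value 1 for pig.udf.profile.frequency (assumed here to mean "sample every invocation"), the relation names, the schema, and the paths are illustrative assumptions, not something the thread confirms.

```
-- Enable the UDF profiling counters mentioned in 1).
set pig.udf.profile true;
-- Assumption: a frequency of 1 samples every invocation, making the count exact.
-- Per the thread, this property is only available in trunk (PIG-3956).
set pig.udf.profile.frequency 1;

-- Hypothetical relation, schema, and paths, for illustration only.
records = LOAD 'input' AS (id:int, value:chararray);
kept    = FILTER records BY id > 0;
STORE kept INTO 'output';
```

With a single FILTER and no other map-side operators, as in this sketch, the difference between the job's map input records and map output records approximates the number of records filtered out, which is the case described in 2). Once a flatten or a second filter runs in the same map, that difference can no longer be attributed to one operator.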
