Harish has done some good work for the popular use case of windowing in https://issues.apache.org/jira/browse/HIVE-7062, which is available from 0.14 onwards. Would that be useful in your scenario? Or are you targeting non-windowing PTFs?

Thanks,
Ashutosh

On Thu, May 7, 2015 at 6:43 AM, Sivaramakrishnan Narayanan <tarb...@gmail.com> wrote:

> Hi,
>
> I was reading through the PTFOperator and related code and was wondering if
> there is an opportunity to optimize this function in
> WindowingTableFunction.java
>
> public void execute(PTFPartitionIterator<Object> pItr, PTFPartition outP)
> throws HiveException {
>
> This guy iterates over the input partition once to compute outputColumns.
> This causes a full read of input partition.
>
> It then iterates over input partition again to append newly computed
> values. This causes another read of input partition and a write to output
> partition.
>
> I was wondering if it may be more efficient to append to the output
> partition as soon as window expressions have been computed. This will avoid
> one scan of the input partition.
>
> FYI - I've been looking at hive 0.13 code mostly but a glance at trunk
> suggests this logic is the same there.
>
> Thanks,
>
> Siva
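
For reference, the two approaches described in the quoted message can be sketched roughly as below. This is a toy illustration only, not Hive code: the class WindowPassSketch, its methods, and the inputReads counter are hypothetical names, a plain list stands in for a partition, and the window expression is assumed to be a running sum whose value for a row is known as soon as that row is read. Windows that look at following rows would need buffering and are not covered here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a List<Long> stands in for the input partition and
// a List<long[]> of {inputValue, runningSum} stands in for the output partition.
public class WindowPassSketch {
    // Counts reads of input rows so the scan cost of each approach is visible.
    static int inputReads = 0;

    // Two passes: compute all window values first, then re-read the input to append them.
    static List<long[]> twoPass(List<Long> input) {
        inputReads = 0;
        List<Long> computed = new ArrayList<>();
        long sum = 0;
        for (int i = 0; i < input.size(); i++) {      // first full read of the input
            inputReads++;
            sum += input.get(i);
            computed.add(sum);
        }
        List<long[]> output = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {      // second full read of the input
            inputReads++;
            output.add(new long[] {input.get(i), computed.get(i)});
        }
        return output;
    }

    // One pass: append each output row as soon as its window value is known.
    static List<long[]> onePass(List<Long> input) {
        inputReads = 0;
        List<long[]> output = new ArrayList<>();
        long sum = 0;
        for (int i = 0; i < input.size(); i++) {      // single read of the input
            inputReads++;
            long value = input.get(i);
            sum += value;
            output.add(new long[] {value, sum});
        }
        return output;
    }

    public static void main(String[] args) {
        List<Long> partition = List.of(1L, 2L, 3L);
        twoPass(partition);
        System.out.println("two-pass input reads: " + inputReads);
        onePass(partition);
        System.out.println("one-pass input reads: " + inputReads);
    }
}
```

Both methods produce the same output rows; the single-pass version just reads each input row once, which is the saving the quoted message is asking about.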