pig-user  

Optimization question

Vincent Barat
Thu, 18 Mar 2010 15:23:51 -0700

Hi,

I wonder whether it is faster to first extract only the fields of interest from a bag of tuples before performing other operations on it, or whether the optimizer handles this automatically.

For example, is:

ssessions = FOREACH sessions GENERATE imei;
imei_sessions = GROUP ssessions BY imei;
imei_session_count = FOREACH imei_sessions GENERATE group, COUNT(ssessions);

faster than:

imei_sessions = GROUP sessions BY imei;
imei_session_count = FOREACH imei_sessions GENERATE group, COUNT(sessions);

Thanks for your help.