[ 
https://issues.apache.org/jira/browse/PIG-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456845#comment-13456845
 ] 

Prasanth J commented on PIG-2831:
---------------------------------

Updated the patch with the following changes:
1) TupleFieldFilter now implements the Guava generic Function interface. It 
also accepts a mask that determines which tuple fields are filtered when 
getNext() is called.
2) Fixes issues with a Filter followed by Cube in the mrcube case. 
3) The basic version of mrcube supports only a filter operation ahead of the 
Cube operator. If any other operator (including a blocking operator) is used, 
it falls back to naive cubing instead of mr-cubing. This can be enhanced 
once this patch has stabilized. One more optimization to explore when blocking 
operators precede the cube operator: the forced loading of input data for 
statistics gathering can be removed if the blocking operator uses 
RichInterStorage instead of InterStorage. 
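
To illustrate item 1, here is a minimal sketch of a mask-driven field filter. 
It is not the code from the patch: the class name MaskedFieldFilter is 
hypothetical, a List<Object> stands in for a Pig Tuple, and 
java.util.function.Function stands in for Guava's Function so that the sketch 
has no external dependencies. Dropping fields that lie beyond the end of the 
mask is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for TupleFieldFilter; List<Object> plays the role of a Tuple.
class MaskedFieldFilter implements Function<List<Object>, List<Object>> {
    private final boolean[] mask; // mask[i] == true keeps field i

    MaskedFieldFilter(boolean... mask) {
        this.mask = mask.clone();
    }

    @Override
    public List<Object> apply(List<Object> tuple) {
        List<Object> kept = new ArrayList<>();
        for (int i = 0; i < tuple.size(); i++) {
            // Assumption of this sketch: fields past the end of the mask are dropped.
            if (i < mask.length && mask[i]) {
                kept.add(tuple.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        MaskedFieldFilter filter = new MaskedFieldFilter(true, false, true);
        // Keeps the first and third fields of the input.
        System.out.println(filter.apply(Arrays.<Object>asList("us", "2012", 42)));
    }
}
```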
                
> MR-Cube implementation (Distributed cubing for holistic measures)
> -----------------------------------------------------------------
>
>                 Key: PIG-2831
>                 URL: https://issues.apache.org/jira/browse/PIG-2831
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>         Attachments: PIG-2831.1.git.patch, PIG-2831.2.git.patch, 
> PIG-2831.3.git.patch, PIG-2831.4.git.patch, PIG-2831.5.git.patch, 
> PIG-2831.6.git.patch, PIG-2831.7.git.patch, PIG-2831.8.git.patch
>
>
> Implementing distributed cube materialization for holistic measures based on 
> the MR-Cube approach described in http://arnab.org/files/mrcube.pdf. 
> Primary steps involved:
> 1) Identify if the measure is holistic or not
> 2) Determine the algebraic attribute (it can be detected automatically in a 
> few cases; if automatic detection fails, the user should hint the algebraic 
> attribute)
> 3) Modify MRPlan to insert a sampling job which executes the naive cube 
> algorithm and generates an annotated cube lattice (contains large group 
> partitioning information)
> 4) Modify the plan to distribute the annotated cube lattice to all mappers 
> using the distributed cache
> 5) Execute the actual cube materialization on the full dataset
> 6) Modify MRPlan to insert a post-processing job for combining the results of 
> the actual cube materialization job
> 7) OOM exception handling
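
For readers unfamiliar with the naive cube step mentioned above, the following 
is a rough sketch of naive cubing in the general sense: each input row with n 
dimensions is expanded into 2^n group keys, one per region of the cube 
lattice. It is not taken from the patch. The class name NaiveCubeRegions and 
the use of null to mark a rolled-up dimension are assumptions made for 
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: enumerates the lattice regions that one row belongs to.
class NaiveCubeRegions {
    // Returns the 2^n group keys for one row of n dimension values.
    // A null entry marks a dimension that has been rolled up ("all values").
    static List<List<String>> regions(List<String> dims) {
        int n = dims.size();
        List<List<String>> out = new ArrayList<>();
        for (int bits = 0; bits < (1 << n); bits++) {
            List<String> key = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                // Bit i set: keep dimension i; bit i clear: roll it up.
                key.add((bits & (1 << i)) != 0 ? dims.get(i) : null);
            }
            out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two dimensions give four group keys, from fully rolled up to fully specified.
        for (List<String> key : regions(Arrays.asList("us", "2012"))) {
            System.out.println(key);
        }
    }
}
```

Because every row is emitted 2^n times, the coarsest regions receive the most 
rows. That skew is why the description above attaches large-group partitioning 
information to the lattice before the full run.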

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
