[jira] [Updated] (PIG-4671) Use MR Native Execution Optimally

argho chatterjee (JIRA) Tue, 08 Sep 2015 05:18:56 -0700

     [ https://issues.apache.org/jira/browse/PIG-4671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


argho chatterjee updated PIG-4671:
----------------------------------
    Priority: Critical  (was: Major)

> Use MR Native Execution Optimally
> ---------------------------------
>
>                 Key: PIG-4671
>                 URL: https://issues.apache.org/jira/browse/PIG-4671
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: argho chatterjee
>            Priority: Critical
>
> Hi Team,
> Regarding the feature developed in
> https://issues.apache.org/jira/browse/PIG-506
> I tried that approach, but Pig forces me to store the data in a directory
> and then load it again from that directory.
> E.g.:
> A = LOAD ....
> B = MAPREDUCE 'SomeJar.jar' STORE A INTO 'input' LOAD 'output' AS ...
> Here we load the data and simply store it back again, just so that the
> MapReduce job can run.
> I do not think this is optimal. Suppose I have implemented my own smart
> Pig readers and want to load data into my MapReduce job using such a
> reader. Pig will load the data with this reader and then store it in some
> directory, from which the MR job takes it as input. That is a lot of
> unnecessary I/O when I could feed the data directly from my custom Pig
> loader to my MapReduce job.
> Can there be a way for the data loaded by A to be fed directly to the
> MapReduce job?
> That is, if required, we would implement some Pig readers in our MR jobs
> and then use them to read data into MR.
> We have implemented some smart Pig readers, and I want to use them in my
> MapReduce job instead of the native MR readers.
> Please have a look at this scenario.
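
For reference, a fuller form of the statement quoted in the description
might look like the sketch below. It follows the MAPREDUCE operator syntax
from the Pig Latin documentation. Only 'SomeJar.jar' comes from the report;
the paths, the schema, the job class com.example.MyJob and the loader
com.example.SmartLoader are hypothetical placeholders.

```
-- Hypothetical names throughout; only 'SomeJar.jar' is taken from the report.
A = LOAD 'raw_input' USING com.example.SmartLoader();

-- Pig first writes A to 'mr_input' (the extra store the report objects to),
-- then runs the native job, then loads the job's results from 'mr_output'.
B = MAPREDUCE 'SomeJar.jar'
    STORE A INTO 'mr_input'
    LOAD 'mr_output' AS (key:chararray, value:int)
    `com.example.MyJob mr_input mr_output`;
```

In this form the custom loader is applied once, before the data is written
to 'mr_input'. The native job then reads that directory with whatever input
format it configures itself, which is the intermediate step the report asks
to avoid.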



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
