[jira] [Created] (PIG-4671) Use MR Native Execution Optimally

argho chatterjee (JIRA) Mon, 07 Sep 2015 08:27:10 -0700

argho chatterjee created PIG-4671:
-------------------------------------

             Summary: Use MR Native Execution Optimally
                 Key: PIG-4671
                 URL: https://issues.apache.org/jira/browse/PIG-4671
             Project: Pig
          Issue Type: Improvement
            Reporter: argho chatterjee



Hi Team,
This is regarding the feature developed in
https://issues.apache.org/jira/browse/PIG-506

I tried the above approach, but Pig forces me to store the data in a directory 
and then load the data back from that directory. 
E.g.:
A = Load ....
B = MAPREDUCE 'SomeJar.jar' Store A into input Load Output as ...

Here we are loading the data and simply storing it back again so that the 
MapReduce job can work.
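
For reference, a complete statement of this form might look like the sketch 
below. The loader class, jar name, paths, schema, and driver class are all 
hypothetical placeholders, not taken from my actual job:

```pig
-- Hypothetical names throughout; only the shape of the statement matters.
-- com.example.SmartLoader stands in for a custom Pig loader.
A = LOAD 'input_data' USING com.example.SmartLoader();

-- STORE writes A to 'mr_input' so that the native job can read it;
-- LOAD then reads whatever the job wrote to 'mr_output'.
B = MAPREDUCE 'SomeJar.jar'
        STORE A INTO 'mr_input'
        LOAD 'mr_output' AS (key:chararray, value:int)
        `com.example.SomeDriver mr_input mr_output`;
```

The STORE step is the extra write I am referring to: the records read by the 
loader are materialized in 'mr_input' before the MapReduce job ever sees them.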

I do not think this is an optimal approach. Suppose I have implemented my 
own smart Pig readers and now want to load data into my MapReduce job using 
one of them. Pig will load the data with this reader and then store it in some 
directory, from which the MR job takes it as input. That is a lot of 
unnecessary IO, when the data could be fed directly from my custom Pig loader 
to my MapReduce job.

Could there be a way for the data loaded by A to be fed directly to the 
MapReduce job?

That is, if required, we would implement some Pig readers in our MR jobs and 
then use them to read data into MR.
We have implemented some smart Pig readers, and I want to use them in my 
MapReduce job instead of the native MR readers.

Please have a look at this scenario.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
