argho chatterjee created PIG-4671:
-------------------------------------
Summary: Use MR Native Execution Optimally
Key: PIG-4671
URL: https://issues.apache.org/jira/browse/PIG-4671
Project: Pig
Issue Type: Improvement
Reporter: argho chatterjee
Hi Team,
Regarding the feature developed in
https://issues.apache.org/jira/browse/PIG-506
I tried that approach, but Pig forces me to store the data in a directory
and then load it back from that directory.
E.g.:
A = LOAD ....
B = MAPREDUCE 'SomeJar.jar' STORE A INTO 'input' LOAD 'output' AS ...
Here the data is loaded and then simply stored back again, just so that the
MapReduce job can run.
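
To make the round trip concrete, here is a minimal sketch of the pattern
described above, written out as a complete script. The loader class
com.example.SmartLoader, the driver class com.example.MyJob, the schema, and
all paths are hypothetical placeholders, not names from this issue; only
'SomeJar.jar' comes from the example above.

```pig
-- Hypothetical names: com.example.SmartLoader, com.example.MyJob, the schema,
-- and every path below are placeholders for illustration only.
A = LOAD 'raw_data' USING com.example.SmartLoader();

-- Pig reads 'raw_data' through SmartLoader and writes relation A to
-- 'mr_input'. The native job then re-reads 'mr_input' with its own
-- InputFormat, so the data is written and read once more than necessary.
B = MAPREDUCE 'SomeJar.jar'
        STORE A INTO 'mr_input' USING PigStorage('\t')
        LOAD 'mr_output' USING PigStorage('\t') AS (key:chararray, value:long)
        `com.example.MyJob mr_input mr_output`;

STORE B INTO 'final_output';
```

The intermediate 'mr_input' directory is the extra I/O in question: the custom
loader's output never reaches the native job directly.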
I do not think this is optimal. Suppose I have implemented my own smart Pig
reader and want to load data into my MapReduce job using it. Pig will load the
data with this reader and then store it in some directory, from which the MR
job takes it as input. This is a lot of unnecessary I/O, when I could feed the
data directly from my custom Pig loader to my MapReduce job.
Can there be a way for the data loaded by A to be fed directly to the
MapReduce job?
That is, if required, we would implement some Pig readers in our MR jobs and
then use them to read data into MR.
We have implemented some smart Pig readers, and I want to use them in my
MapReduce jobs instead of the native MR readers.
Please have a look at this use case.