Provide a way to open and read a side file using an existing InputFormat
------------------------------------------------------------------------
Key: MAPREDUCE-1130
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1130
Project: Hadoop Map/Reduce
Issue Type: New Feature
Reporter: Pradeep Kamath
In the Pig subproject there is a need to open a side file to implement map-side
joins. In some cases the entire file must be read as a side file; in others, the
file must be read starting from a particular split through the last split. To do
this with existing InputFormats, the Pig code would need to mimic Hadoop: call
InputFormat.getSplits(), then for each split call
InputFormat.createRecordReader() and RecordReader.initialize(), and call
RecordReader.nextKeyValue() repeatedly until it reaches the end of that split
before moving on to the next one. It would be good if Hadoop provided utility
methods for this, so that a file can be read either partially (from a given
split to the end) or entirely.
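The read loop described above can be sketched as a small utility. This is a minimal, Hadoop-free sketch: to keep it runnable with only the JDK, it uses hypothetical stand-in interfaces (Format, Reader) and a toy InMemoryFormat whose method names mirror Hadoop's org.apache.hadoop.mapreduce API (getSplits, createRecordReader, initialize, nextKeyValue, getCurrentKey, getCurrentValue, close). The helper names readFrom and readAll are likewise hypothetical, not an existing Hadoop API. A real version would also have to construct the JobContext and TaskAttemptContext that the Hadoop methods take, and how those are built differs between Hadoop versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class SideFileReader {

    // Hypothetical stand-in for Hadoop's RecordReader (the TaskAttemptContext is omitted).
    public interface Reader<S, K, V> extends AutoCloseable {
        void initialize(S split);
        boolean nextKeyValue();
        K getCurrentKey();
        V getCurrentValue();
        @Override
        void close(); // narrowed: throws no checked exception
    }

    // Hypothetical stand-in for Hadoop's InputFormat (the JobContext is omitted).
    public interface Format<S, K, V> {
        List<S> getSplits();
        Reader<S, K, V> createRecordReader(S split);
    }

    /** Reads every record from split number startSplit through the last split. */
    public static <S, K, V> void readFrom(Format<S, K, V> format, int startSplit,
                                          BiConsumer<K, V> sink) {
        List<S> splits = format.getSplits();
        for (int i = startSplit; i < splits.size(); i++) {
            S split = splits.get(i);
            // try-with-resources closes each reader even if the sink throws.
            try (Reader<S, K, V> reader = format.createRecordReader(split)) {
                reader.initialize(split);
                while (reader.nextKeyValue()) { // false once the split is exhausted
                    sink.accept(reader.getCurrentKey(), reader.getCurrentValue());
                }
            }
        }
    }

    /** Reads the whole file: all splits, starting from the first. */
    public static <S, K, V> void readAll(Format<S, K, V> format, BiConsumer<K, V> sink) {
        readFrom(format, 0, sink);
    }

    // Toy format: each split is a list of lines; the key is the line's index within its split.
    public static final class InMemoryFormat implements Format<List<String>, Integer, String> {
        private final List<List<String>> splits;

        public InMemoryFormat(List<List<String>> splits) { this.splits = splits; }

        public List<List<String>> getSplits() { return splits; }

        public Reader<List<String>, Integer, String> createRecordReader(List<String> split) {
            return new Reader<List<String>, Integer, String>() {
                private List<String> lines;
                private int pos = -1;
                public void initialize(List<String> s) { lines = s; pos = -1; }
                public boolean nextKeyValue() { return ++pos < lines.size(); }
                public Integer getCurrentKey() { return pos; }
                public String getCurrentValue() { return lines.get(pos); }
                public void close() { }
            };
        }
    }

    public static void main(String[] args) {
        Format<List<String>, Integer, String> fmt = new InMemoryFormat(
            List.of(List.of("a", "b"), List.of("c"), List.of("d", "e")));
        List<String> out = new ArrayList<>();
        readFrom(fmt, 1, (k, v) -> out.add(k + "=" + v)); // skip split 0
        System.out.println(out); // prints [0=c, 0=d, 1=e]
    }
}
```

Real Hadoop readers such as LineRecordReader typically reuse their key and value objects between calls, so a sink that keeps records (for example, to build a join table) should copy them rather than store the references.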
--
This message is automatically generated by JIRA.