[
https://issues.apache.org/jira/browse/HADOOP-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jens Rabe updated HADOOP-11561:
-------------------------------
Labels: composite (was: mapfile)
Status: Patch Available (was: Open)
See [^HADOOP-11561.patch]. I added an inner class {{Reader}} to
{{CompositeInputFormat}} that a client application can use to read and join
multiple files on the fly.
The same constraints as for {{CompositeInputFormat}} apply, because this reader
uses the format internally. To use the reader, do the following:
# Make sure all input files have the same key and value classes
# Make sure the records in every input file are sorted by the same criterion
# Prepare a {{Configuration}} object with at least {{mapreduce.join.expr}} and
{{mapreduce.join.keycomparator}} set. Consult {{CompositeInputFormat}} for details.
# Use the constructor {{CompositeInputFormat.Reader(Configuration)}} to
instantiate the reader
# Call its {{nextKeyValue}} method to read a record into the writables you
supply, just as with the {{SequenceFile}} and {{MapFile}} readers.
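
To illustrate why the inputs must share key classes and a sort order, here is a
minimal sketch of an outer sort-merge join over already-sorted sources, written
in plain Java. It is not the code from the patch and does not use any Hadoop
API: the names {{SortedJoinSketch}}, {{Joined}}, {{Cursor}} and {{outerJoin}}
are hypothetical, each input file is modelled as a {{SortedMap}}, and keys are
assumed to be unique within a source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.SortedMap;

// Hypothetical illustration only: not part of Hadoop or of the attached patch.
final class SortedJoinSketch {

    /** One joined record: a key plus one value slot per source (null if absent). */
    static final class Joined<K, V> {
        final K key;
        final List<V> values;

        Joined(K key, List<V> values) {
            this.key = key;
            this.values = values;
        }
    }

    /** Read position in one sorted source; head is null once it is exhausted. */
    private static final class Cursor<K, V> {
        final int index;
        final Iterator<Map.Entry<K, V>> it;
        Map.Entry<K, V> head;

        Cursor(int index, Iterator<Map.Entry<K, V>> it) {
            this.index = index;
            this.it = it;
            advance();
        }

        void advance() {
            head = it.hasNext() ? it.next() : null;
        }
    }

    /**
     * Joins the sources in ascending key order. Every source must already be
     * sorted by cmp, which is why all inputs need the same key class and the
     * same sorting criterion.
     */
    static <K, V> List<Joined<K, V>> outerJoin(
            List<? extends SortedMap<K, V>> sources, Comparator<? super K> cmp) {
        // The heap always exposes the cursor with the smallest current key.
        Comparator<Cursor<K, V>> byHeadKey =
                (a, b) -> cmp.compare(a.head.getKey(), b.head.getKey());
        PriorityQueue<Cursor<K, V>> heap = new PriorityQueue<>(byHeadKey);
        for (int i = 0; i < sources.size(); i++) {
            Cursor<K, V> c = new Cursor<>(i, sources.get(i).entrySet().iterator());
            if (c.head != null) {
                heap.add(c);
            }
        }

        List<Joined<K, V>> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            K key = heap.peek().head.getKey();
            List<V> slots =
                    new ArrayList<V>(Collections.<V>nCopies(sources.size(), null));
            // Collect the value from every source whose current key equals key.
            while (!heap.isEmpty()
                    && cmp.compare(heap.peek().head.getKey(), key) == 0) {
                Cursor<K, V> c = heap.poll();
                slots.set(c.index, c.head.getValue());
                c.advance();
                if (c.head != null) {
                    heap.add(c);
                }
            }
            out.add(new Joined<>(key, slots));
        }
        return out;
    }
}
```

Because each source is only ever read forward, a reader built this way can
stream records without loading whole files, but it silently produces wrong
results if any input violates the shared sort order.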
> It should be possible to chain-load multiple MapFiles on the fly and read the
> records in an ascending order
> -----------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-11561
> URL: https://issues.apache.org/jira/browse/HADOOP-11561
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Jens Rabe
> Assignee: Jens Rabe
> Priority: Minor
> Labels: composite
> Attachments: HADOOP-11561.patch
>
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> In a scenario where there are many MapFiles that all share the same
> key/value types, e.g., when dealing with measured data from sensors, it
> should be possible to chain-load multiple MapFiles. That is, there should
> be a reader that can be supplied with one or more directories containing
> MapFiles and that reads the records of all files in order.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)