[ 
https://issues.apache.org/jira/browse/HADOOP-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jens Rabe updated HADOOP-11561:
-------------------------------
    Labels: composite  (was: mapfile)
    Status: Patch Available  (was: Open)

See [^HADOOP-11561.patch]. I added an inner class {{Reader}} to 
{{CompositeInputFormat}} that can be used to read and join multiple files on 
the fly from a client application.

The same constraints as for {{CompositeInputFormat}} apply, because the reader 
uses that format internally. To use the reader, do the following:
# Make sure all input files have the same key and value classes
# Make sure all records in the input files are sorted by the same sorting 
criterion
# Prepare a {{Configuration}} object with at least {{mapreduce.join.expr}} and 
{{mapreduce.join.comparator}} set. Consult {{CompositeInputFormat}} for details.
# Use the constructor {{CompositeInputFormat.Reader(Configuration)}} to 
instantiate the reader
# Call its {{nextKeyValue}} method to read a record into the writables you 
supply, just as with the {{SequenceFile}} and {{MapFile}} readers.
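
As a rough illustration of why the inputs must share key types and a common sort order, the following standalone sketch merges several pre-sorted record sources and hands out records in ascending key order. It is not taken from the patch and does not use Hadoop: the class {{SortedMergeSketch}} and its {{next}} method are hypothetical names, and {{Long}} keys with {{String}} values stand in for the real writable key and value classes.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration only; this is not the Reader class from the patch.
class SortedMergeSketch {

    // A pending record together with the source it was read from.
    private static final class Head {
        final Map.Entry<Long, String> record;
        final Iterator<Map.Entry<Long, String>> source;

        Head(Map.Entry<Long, String> record, Iterator<Map.Entry<Long, String>> source) {
            this.record = record;
            this.source = source;
        }
    }

    // Min-heap ordered by key, holding at most one pending record per source.
    private final PriorityQueue<Head> heads =
            new PriorityQueue<>((a, b) -> Long.compare(a.record.getKey(), b.record.getKey()));

    // Assumption: every input list is already sorted by key in ascending order.
    SortedMergeSketch(List<List<Map.Entry<Long, String>>> sortedInputs) {
        for (List<Map.Entry<Long, String>> input : sortedInputs) {
            Iterator<Map.Entry<Long, String>> it = input.iterator();
            if (it.hasNext()) {
                heads.add(new Head(it.next(), it));
            }
        }
    }

    // Returns the next record in ascending key order, or null when all sources are exhausted.
    Map.Entry<Long, String> next() {
        Head head = heads.poll();
        if (head == null) {
            return null;
        }
        // Refill the heap from the source that just yielded a record.
        if (head.source.hasNext()) {
            heads.add(new Head(head.source.next(), head.source));
        }
        return head.record;
    }
}
```

If one input were sorted by a different criterion, the heap would still return the smallest pending key at each step, but the overall output would no longer be ordered, which is why the sorting constraint in the list above matters.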

> It should be possible to chain-load multiple MapFiles on the fly and read the 
> records in an ascending order
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11561
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11561
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Jens Rabe
>            Assignee: Jens Rabe
>            Priority: Minor
>              Labels: composite
>         Attachments: HADOOP-11561.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> In a scenario where there are many MapFiles which all share the same 
> key/value types, e.g., when dealing with measured data from sensors, it 
> should be possible to chain-load multiple MapFiles. That means, there should 
> be a reader which can be supplied with one or more directories containing 
> MapFiles, and it should be possible to read the records of all files in order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
