[
https://issues.apache.org/jira/browse/AVRO-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019825#comment-14019825
]
Steven Willis commented on AVRO-1130:
-------------------------------------
I'd very much like this as well. How do you imagine the output would be
structured? A normal {{SortedKeyValueFile}} is a single directory containing
exactly two files, {{data}} and {{index}}. With a MapReduce job that has
multiple reducers, I wonder how this should look.
Maybe:
{noformat}
output_path/data-part-00000
output_path/data-part-00001
output_path/data-part-00002
output_path/index-part-00000
output_path/index-part-00001
output_path/index-part-00002
{noformat}
But then if you wanted to treat {{output_path}} as a {{SortedKeyValueFile}},
you'd have to modify the code to allow for multiple data and index files.
Perhaps any directory containing exactly the same number of {{data*}} and
{{index*}} files could be treated as a {{SortedKeyValueFile}} ({{SKVF}}), as
long as the trailing portion of each {{data}} filename matches the trailing
portion of an {{index}} filename.
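To make that matching rule concrete, here is a rough sketch of the check, assuming it is enough to look at the file names in the directory. The class and method names ({{MultiPartSkvfCheck}}, {{looksLikeMultiPartSkvf}}) are made up for illustration and are not part of Avro. Files that start with neither {{data}} nor {{index}} are simply ignored, which is also an assumption.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Avro: checks the "matching suffix" rule
// for a directory listing given as plain file names.
public class MultiPartSkvfCheck {
    static final String DATA_PREFIX = "data";
    static final String INDEX_PREFIX = "index";

    /**
     * Returns true if the listing has at least one data* file and the set of
     * suffixes after "data" equals the set of suffixes after "index".
     * Equal suffix sets imply equal counts, since names in one directory are unique.
     */
    public static boolean looksLikeMultiPartSkvf(Collection<String> fileNames) {
        Set<String> dataSuffixes = new HashSet<>();
        Set<String> indexSuffixes = new HashSet<>();
        for (String name : fileNames) {
            if (name.startsWith(DATA_PREFIX)) {
                dataSuffixes.add(name.substring(DATA_PREFIX.length()));
            } else if (name.startsWith(INDEX_PREFIX)) {
                indexSuffixes.add(name.substring(INDEX_PREFIX.length()));
            }
            // Anything else is ignored (assumption).
        }
        return !dataSuffixes.isEmpty() && dataSuffixes.equals(indexSuffixes);
    }

    public static void main(String[] args) {
        // Every data part has an index part with the same suffix.
        System.out.println(looksLikeMultiPartSkvf(Arrays.asList(
            "data-part-00000", "data-part-00001",
            "index-part-00000", "index-part-00001")));
        // data-part-00001 has no matching index file.
        System.out.println(looksLikeMultiPartSkvf(Arrays.asList(
            "data-part-00000", "data-part-00001", "index-part-00000")));
    }
}
```

Note that a plain {{data}}/{{index}} pair also passes this check (both suffixes are empty), so an existing single-part directory would still qualify under this rule.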
Or would something like this be better:
{noformat}
output_path/part-00000/data
output_path/part-00000/index
output_path/part-00001/data
output_path/part-00001/index
output_path/part-00002/data
output_path/part-00002/index
{noformat}
That way, each part is an {{SKVF}} and works with the existing code. But then
you wouldn't be able to treat {{output_path}} itself as an {{SKVF}}. Maybe the
new {{SKVFInputFormat}} could allow the input path to be either an {{SKVF}}
directory or a directory containing {{SKVF}} directories.
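A rough sketch of how that "either/or" resolution might work, assuming the input format can get a listing of file paths relative to the input directory. The class and method names ({{SkvfInputResolver}}, {{resolveSkvfDirs}}) are hypothetical, and this works on plain strings rather than any real Hadoop or Avro API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Avro: decides which directories under an
// input path should be read as SortedKeyValueFiles.
public class SkvfInputResolver {
    /**
     * @param relativePaths file paths relative to the input directory,
     *                      using '/' as the separator (assumption)
     * @return "." if the input directory itself holds data and index;
     *         otherwise the sorted names of the immediate child directories
     *         that each hold both a data and an index file
     */
    public static List<String> resolveSkvfDirs(Collection<String> relativePaths) {
        Set<String> paths = new HashSet<>(relativePaths);
        if (paths.contains("data") && paths.contains("index")) {
            return Arrays.asList(".");
        }
        Set<String> dirs = new TreeSet<>();
        for (String path : paths) {
            if (!path.endsWith("/data")) {
                continue;
            }
            String dir = path.substring(0, path.length() - "/data".length());
            // Only immediate children count, and each needs its own index file.
            if (!dir.isEmpty() && !dir.contains("/") && paths.contains(dir + "/index")) {
                dirs.add(dir);
            }
        }
        return new ArrayList<>(dirs);
    }

    public static void main(String[] args) {
        System.out.println(resolveSkvfDirs(Arrays.asList(
            "part-00000/data", "part-00000/index",
            "part-00001/data", "part-00001/index")));
    }
}
```

A child directory that is missing its {{index}} file is skipped here rather than treated as an error; whether that is the right behaviour would need to be decided.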
I think I'd lean towards the first approach myself.
> MapReduce Jobs can output write SortedKeyValueFiles directly
> ------------------------------------------------------------
>
> Key: AVRO-1130
> URL: https://issues.apache.org/jira/browse/AVRO-1130
> Project: Avro
> Issue Type: New Feature
> Components: java
> Affects Versions: 1.7.1
> Reporter: Jeremy Lewi
> Assignee: Harsh J
> Priority: Minor
>
> It would be nice if MapReduce jobs could write directly to
> SortedKeyValueFiles.
> See harsh@'s response on this thread http://goo.gl/OT1rN for some more
> information on what needs to be done.