[
https://issues.apache.org/jira/browse/BLUR-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475477#comment-13475477
]
Aaron McCurry commented on BLUR-18:
-----------------------------------
I haven't had time to create a formal document, but I will lay out my thoughts
here.
For reference, check out the branch "new-api-prototype" and take a look at the
"blur-new-api-prototype" project. The Thrift API in this project, which is
defined in service.thrift, provides a much simpler API than the current
Blur.thrift definition.
I see 3 overall components for the MapReduce project.
1. The InputFormat (both old and new Hadoop API) should read from the
service.thrift API. This will allow for bulk pulls of data as well as executing
queries and iterating over the entire result set. As for the InputSplits, they
should operate per shard of the table. The shard server definition is still in
its infancy (see the BlurShard service in the service.thrift definition).
2. Bulk OutputFormat (both old and new Hadoop API) should write the contents
of the indexes directly to HDFS, bypassing the Thrift server.
3. Standard OutputFormat (both old and new Hadoop API) should write mutates to
the shard servers.
I am going to create 3 sub-tasks so that we can discuss each of the MR
components separately.
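To make the per-shard InputSplit idea in item 1 concrete, here is a minimal
sketch in plain Java. It is only an illustration under stated assumptions:
ShardSplit and ShardSplitPlanner are hypothetical names, not classes from the
prototype branch, and the list of shard names is passed in directly rather
than fetched through service.thrift. In a real InputFormat this logic would
sit in getSplits() and return Hadoop InputSplit objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a Hadoop InputSplit: one unit of map work that
// covers exactly one shard of one table.
final class ShardSplit {
    final String table;
    final String shard;

    ShardSplit(String table, String shard) {
        this.table = table;
        this.shard = shard;
    }

    @Override
    public String toString() {
        return table + "/" + shard;
    }
}

// Hypothetical planner: the shard names are assumed to be known already.
// How they are discovered (e.g. through the BlurShard service) is not shown.
final class ShardSplitPlanner {
    private ShardSplitPlanner() {}

    // Produce one split per shard, so each map task reads a single shard.
    static List<ShardSplit> planSplits(String table, List<String> shardNames) {
        List<ShardSplit> splits = new ArrayList<>();
        for (String shard : shardNames) {
            splits.add(new ShardSplit(table, shard));
        }
        return Collections.unmodifiableList(splits);
    }
}
```

With this shape, a table with N shards yields N splits, so the number of map
tasks follows the shard count of the table.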
> Rework the MapReduce Library to implement Input/OutputFormats
> -------------------------------------------------------------
>
> Key: BLUR-18
> URL: https://issues.apache.org/jira/browse/BLUR-18
> Project: Apache Blur
> Issue Type: Improvement
> Reporter: Aaron McCurry
>
> Currently the only way to implement indexing is to use the BlurReducer. A
> better way to implement this would be to support Hadoop InputFormats and
> OutputFormats in both the new and old APIs. This would allow easier
> integration with other Hadoop projects such as Hive and Pig.