[jira] [Commented] (BLUR-18) Rework the MapReduce Library to implement Input/OutputFormats

Aaron McCurry (JIRA) Fri, 12 Oct 2012 18:03:05 -0700

    [ 
https://issues.apache.org/jira/browse/BLUR-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475477#comment-13475477
 ]


Aaron McCurry commented on BLUR-18:
-----------------------------------

I haven't had time to create a formal document, but I will lay out my thoughts 
here.

For reference, check out the branch "new-api-prototype" and take a look at the 
"blur-new-api-prototype" project.  The Thrift API in this project, which is 
defined in service.thrift, provides a much simpler API than the current 
Blur.thrift definition.

I see 3 overall components for the MapReduce project.

1.  The InputFormat (both old and new Hadoop API) should read from the 
service.thrift API.  This will allow for bulk pulls of data as well as executing 
queries and iterating over the entire result set.  The InputSplits should 
operate per shard of the table; the shard server definition is still in its 
infancy (see the BlurShard service in the service.thrift definition).

2.  Bulk OutputFormat (both old and new Hadoop API) should write the contents 
of the indexes directly to HDFS and bypass the Thrift server.

3.  Standard OutputFormat (both old and new Hadoop API) should write mutations 
to the shard servers.
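
To make the "one InputSplit per shard" idea in item 1 concrete, here is a 
rough sketch in plain Java.  It does not use Hadoop or the service.thrift 
API; ShardSplitPlanner, ShardSplit, planSplits, and the shard and server 
names are all hypothetical, and the shard-to-server layout is passed in as a 
map because the shard server definition is not settled yet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Blur or service.thrift.
final class ShardSplitPlanner {

    /** Stand-in for a Hadoop InputSplit that covers exactly one shard. */
    static final class ShardSplit {
        final String table;
        final String shard;
        final String server; // would only serve as a locality hint

        ShardSplit(String table, String shard, String server) {
            this.table = table;
            this.shard = shard;
            this.server = server;
        }

        @Override
        public String toString() {
            return table + "/" + shard + "@" + server;
        }
    }

    /**
     * Builds one split per shard, in the iteration order of the map.
     * Assumption: a real InputFormat would obtain the shard layout from
     * the shard service instead of taking it as a parameter.
     */
    static List<ShardSplit> planSplits(String table, Map<String, String> shardToServer) {
        List<ShardSplit> splits = new ArrayList<>();
        for (Map.Entry<String, String> e : shardToServer.entrySet()) {
            splits.add(new ShardSplit(table, e.getKey(), e.getValue()));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up layout: two shards hosted on two servers.
        Map<String, String> layout = new LinkedHashMap<>();
        layout.put("shard-0", "node1");
        layout.put("shard-1", "node2");
        for (ShardSplit split : planSplits("table1", layout)) {
            System.out.println(split);
        }
    }
}
```

Each map task would then read only its own shard, so the number of map 
tasks follows the number of shards in the table.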

I am going to create 3 sub-tasks so that we can discuss each of the MR 
components separately.


                
> Rework the MapReduce Library to implement Input/OutputFormats
> -------------------------------------------------------------
>
>                 Key: BLUR-18
>                 URL: https://issues.apache.org/jira/browse/BLUR-18
>             Project: Apache Blur
>          Issue Type: Improvement
>            Reporter: Aaron McCurry
>
> Currently the only way to implement indexing is to use the BlurReducer.  A 
> better way to implement this would be to support Hadoop Input/OutputFormats 
> in both the new and old APIs.  This would allow easier integration with 
> other Hadoop projects such as Hive and Pig.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
