[jira] [Commented] (HBASE-6800) Build a Document Store on HBase for Better Query Processing

Andrew Purtell (JIRA) Mon, 17 Sep 2012 12:21:11 -0700

    [ https://issues.apache.org/jira/browse/HBASE-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457242#comment-13457242 ]


Andrew Purtell commented on HBASE-6800:
---------------------------------------

Thank you for your interest in contributing to the HBase project. I have two 
initial comments/suggestions:

1) From the attached document, it appears that the existing coprocessor 
framework was sufficient for the implementation of the DOT system on top, which 
is great to see. There has been some discussion in the HBase PMC, documented in 
the archives of the [email protected] mailing list, that coprocessor based 
applications should begin as independent code contributions, perhaps hosted in 
a GitHub repository. In your announcement on general@ I see you have sort of 
done this already at https://github.com/intel-hadoop/hbase-0.94-panthera, 
except that this is a full fork of the HBase source tree with the history of 
individual changes lost (a single commit of a source drop). It would be helpful 
if only your changes on top of stock HBase code appeared there. Otherwise, you 
have in effect forked the HBase project, which is not conducive to 
contribution.

2) From the design document: "The co-processor framework needs to be extended 
to provide observers for the filter operations, similar to the observers of the 
data access operations." We would be delighted to work with you on the 
necessary coprocessor framework extensions. I'd recommend a separate JIRA 
specifically for this. Let's discuss what Coprocessor API extensions or 
additions are necessary. Do you have a proposal?

                
> Build a Document Store on HBase for Better Query Processing
> -----------------------------------------------------------
>
>                 Key: HBASE-6800
>                 URL: https://issues.apache.org/jira/browse/HBASE-6800
>             Project: HBase
>          Issue Type: New Feature
>          Components: coprocessors, performance
>    Affects Versions: 0.96.0
>            Reporter: Jason Dai
>         Attachments: dot-deisgn.pdf
>
>
> In the last couple of years, more and more people have begun to stream data 
> into HBase in near real time and 
> to use high-level queries (e.g., Hive) to analyze the data in HBase directly. 
> While HBase already has very effective MapReduce integration and good 
> scanning performance, query processing using MapReduce on HBase still lags 
> significantly behind HDFS: by our measurements, ~3x space overhead and 3-5x 
> performance overhead.
> We propose to implement a document store on HBase, which can greatly improve 
> query processing on HBase (by leveraging the relational model and read-mostly 
> access patterns). In our prototype, it reduces space usage by up to ~3x and 
> speeds up query processing by up to ~1.8x.
