[
https://issues.apache.org/jira/browse/HBASE-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457527#comment-13457527
]
Ted Yu commented on HBASE-6800:
-------------------------------
@Jason:
You raised some interesting questions.
I think you may be aware of the modularization effort in trunk. Matt Corgan is
submitting his contribution as a separate module.
This model may be the answer to some of your questions.
> Build a Document Store on HBase for Better Query Processing
> -----------------------------------------------------------
>
> Key: HBASE-6800
> URL: https://issues.apache.org/jira/browse/HBASE-6800
> Project: HBase
> Issue Type: New Feature
> Components: coprocessors, performance
> Affects Versions: 0.96.0
> Reporter: Jason Dai
> Attachments: dot-deisgn.pdf
>
>
> In the last couple of years, more and more people have begun streaming data
> into HBase in near real time and using high-level queries (e.g., Hive) to
> analyze the data in HBase directly.
> While HBase already has very effective MapReduce integration and good
> scanning performance, query processing using MapReduce on HBase still has
> significant gaps compared to HDFS: ~3x space overhead and 3-5x performance
> overhead, according to our measurements.
> We propose to implement a document store on HBase, which can greatly improve
> query processing on HBase (by leveraging the relational model and read-mostly
> access patterns). According to our prototype, it can reduce space usage by
> up to ~3x and speed up query processing by up to ~1.8x.