[ https://issues.apache.org/jira/browse/HBASE-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Dai updated HBASE-6800:
-----------------------------
Description:
Over the last couple of years, more and more people have begun to stream data
into HBase in near real time and to use high-level queries (e.g., Hive) to
analyze the data in HBase directly.
While HBase already has very effective MapReduce integration and good
scanning performance, query processing using MapReduce on HBase still has
significant gaps compared to HDFS: ~3x space overhead and 3x to 5x performance
overhead, according to our measurements.
We propose to implement a document store on HBase, which can greatly improve
query processing on HBase (by leveraging the relational model and read-mostly
access patterns). According to our prototype, it can reduce space usage by
up to ~3x and speed up query processing by up to ~1.8x.
was:
Over the last couple of years, more and more people have begun to stream data
into HBase in near real time and to use high-level queries (e.g., Hive) to
analyze the data in HBase directly.
While HBase already has very effective MapReduce integration and good
scanning performance, query processing using MapReduce on HBase still has
significant gaps compared to HDFS: ~3x space overhead and 3x to 5x performance
overhead, according to our measurements.
We propose to implement a document store on HBase, which can greatly improve
query processing on HBase (by leveraging the relational model and read-mostly
access patterns). According to our prototype, it can reduce space usage by
up to ~3x and speed up query processing by up to ~2x.
> Build a Document Store on HBase for Better Query Processing
> -----------------------------------------------------------
>
> Key: HBASE-6800
> URL: https://issues.apache.org/jira/browse/HBASE-6800
> Project: HBase
> Issue Type: New Feature
> Components: coprocessors, performance
> Affects Versions: 0.96.0
> Reporter: Jason Dai
> Attachments: dot-deisgn.pdf
>
>
> Over the last couple of years, more and more people have begun to stream data
> into HBase in near real time and to use high-level queries (e.g., Hive) to
> analyze the data in HBase directly.
> While HBase already has very effective MapReduce integration and good
> scanning performance, query processing using MapReduce on HBase still has
> significant gaps compared to HDFS: ~3x space overhead and 3x to 5x performance
> overhead, according to our measurements.
> We propose to implement a document store on HBase, which can greatly improve
> query processing on HBase (by leveraging the relational model and read-mostly
> access patterns). According to our prototype, it can reduce space usage by
> up to ~3x and speed up query processing by up to ~1.8x.