[
https://issues.apache.org/jira/browse/HBASE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack resolved HBASE-2037.
--------------------------
Resolution: Fixed
Committed to branch. Opened another issue to apply to TRUNK.
> Alternate indexed hbase implementation; speeds scans by adding indexes to
> regions rather than secondary tables
> ---------------------------------------------------------------------------------------------------------
>
> Key: HBASE-2037
> URL: https://issues.apache.org/jira/browse/HBASE-2037
> Project: Hadoop HBase
> Issue Type: New Feature
> Reporter: stack
> Fix For: 0.20.3
>
> Attachments: idx-hbase2.patch, idx-hbase3.patch, index.html
>
>
> Purpose
> The goal of the indexed HBase contrib is to speed up scans by indexing HBase
> columns. Indexed HBase (IHBase) is different from the indexed tables in
> transactional HBase (ITHBase): while the indexes in ITHBase are, in fact,
> HBase tables using the indexed column's values as row keys, IHBase creates
> indexes at the region level. The differences are summarized below.
> + global ordering
> ITHBase: yes
> IHBase: no
> Comment: IHBase has an index for each region. The flip side of not having
> global ordering is compatibility with the good old HRegion: results come
> back in row order (and not value order as in ITHBase)
> + Full table scan?
> ITHBase: no
> IHBase: no
> Comment: ITHBase does a partial scan on the index table. IHBase supports
> specifying start/end rows to limit the number of scanned regions
> + Multiple Index Usage
> ITHBase: no
> IHBase: yes
> Comment: IHBase can take advantage of multiple indexes in the same scan.
> The IHBase IdxScan object accepts an Expression which allows the
> intersection/union of several indexed column criteria
> + Extra disk storage
> ITHBase: yes
> IHBase: no
> Comment: IHBase indexes are created when the region starts/flushes and do not
> require any extra disk storage
> + Extra RAM
> ITHBase: yes
> IHBase: yes
> Comment: IHBase indexes are in memory and hence increase the memory overhead.
> ITHBase indexes increase the number of regions each region server has to
> support, thus costing memory too
> + Parallel scanning support
> ITHBase: no
> IHBase: yes
> Comment: In ITHBase the index table needs to be consulted and then GETs are
> issued for each matching row. The behavior of IHBase (as perceived by the
> client) is no different than a regular scan and hence supports parallel
> scanning seamlessly. Parallel GETs could be implemented to speed up ITHBase
> scans
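To make the "multiple index usage" and "global ordering" rows more concrete, here is a minimal sketch of a per-region multi-index lookup. It is a toy model only: `RegionIndexSketch`, `Expr`, `range`, `and`, and `or` are hypothetical names invented for this illustration and are not the IdxScan/Expression API from the patch. It assumes each indexed column maps values to the row keys that hold them, and shows that an intersection of two range criteria comes back in row order rather than value order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of a per-region multi-index scan; none of these types are IHBase classes. */
class RegionIndexSketch {

    /** A criterion that resolves to the set of matching row keys within one region. */
    interface Expr {
        NavigableSet<String> eval(Map<String, TreeMap<Long, TreeSet<String>>> indexes);
    }

    /** Range criterion on one indexed column: min <= value <= max. */
    static Expr range(String column, long min, long max) {
        return indexes -> {
            NavigableSet<String> rows = new TreeSet<>();
            TreeMap<Long, TreeSet<String>> index = indexes.get(column);
            if (index != null) {
                for (TreeSet<String> hit : index.subMap(min, true, max, true).values()) {
                    rows.addAll(hit);
                }
            }
            return rows; // a fresh set, sorted by row key
        };
    }

    /** Intersection of two criteria. */
    static Expr and(Expr left, Expr right) {
        return indexes -> {
            NavigableSet<String> rows = left.eval(indexes);
            rows.retainAll(right.eval(indexes));
            return rows;
        };
    }

    /** Union of two criteria. */
    static Expr or(Expr left, Expr right) {
        return indexes -> {
            NavigableSet<String> rows = left.eval(indexes);
            rows.addAll(right.eval(indexes));
            return rows;
        };
    }

    /** Records that the given row holds the given value in the given column. */
    static void put(Map<String, TreeMap<Long, TreeSet<String>>> indexes,
                    String column, long value, String row) {
        indexes.computeIfAbsent(column, c -> new TreeMap<>())
               .computeIfAbsent(value, v -> new TreeSet<>())
               .add(row);
    }

    /** Made-up sample data: four rows with an "age" and a "score" column. */
    static Map<String, TreeMap<Long, TreeSet<String>>> sampleIndexes() {
        Map<String, TreeMap<Long, TreeSet<String>>> indexes = new HashMap<>();
        put(indexes, "age", 30, "row1"); put(indexes, "score", 70, "row1");
        put(indexes, "age", 25, "row2"); put(indexes, "score", 90, "row2");
        put(indexes, "age", 41, "row3"); put(indexes, "score", 85, "row3");
        put(indexes, "age", 35, "row4"); put(indexes, "score", 95, "row4");
        return indexes;
    }

    public static void main(String[] args) {
        Expr query = and(range("age", 30, 45), range("score", 80, 100));
        // Ordered by age the matches would be row4 (35) then row3 (41);
        // the region-level lookup returns them in row order instead.
        System.out.println(query.eval(sampleIndexes())); // prints [row3, row4]
    }
}
```

Because every region answers only for its own rows, the client sees an ordinary row-ordered scan, which is what keeps this approach compatible with parallel scanning.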
> Why IHBase should outperform ITHBase
> 1. More flexible:
>    a. Supports range queries and multi-index queries
>    b. Supports different types - not only byte arrays
> 2. Less overhead: ITHBase pays at least two 'table roundtrips' - one for the
> index table and the other for the main table
> 3. Quicker index expression evaluation: IHBase uses dedicated index data
> structures while ITHBase uses the regular HRegion scan facilities
> Implementation notes
> • Only store files are indexed. Every index scan performs a full memstore
> scan. Indexing the memstore will be implemented only if scanning the memstore
> proves to be a performance bottleneck
> • Index expression evaluation is performed using bit sets. There are two types
> of bit sets: compressed and expanded. An index will typically store a
> compressed bit set while an expression evaluator will most probably use an
> expanded bit set
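The two implementation notes can be sketched together as follows. This is an illustration under stated assumptions, not the code from the patch: `BitSetIndexSketch`, `Row`, `expand`, `intersect`, and `scan` are hypothetical names, the "compressed" form is modeled simply as a sorted array of set positions, and `java.util.BitSet` stands in for the expanded form. Store file rows are selected through the bit sets, while every memstore row is checked against the criteria directly.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Toy model of bit set based index evaluation; not taken from the IHBase patch. */
class BitSetIndexSketch {

    /** A row with two numeric columns, used only for this illustration. */
    static final class Row {
        final String key; final long age; final long score;
        Row(String key, long age, long score) { this.key = key; this.age = age; this.score = score; }
    }

    /** Expands a "compressed" set (assumed here: sorted set positions) into a BitSet. */
    static BitSet expand(int[] positions) {
        BitSet bits = new BitSet();
        for (int p : positions) {
            bits.set(p);
        }
        return bits;
    }

    /** Intersects several compressed sets of matching store file row positions. */
    static BitSet intersect(int[]... compressed) {
        if (compressed.length == 0) {
            return new BitSet();
        }
        BitSet result = expand(compressed[0]);
        for (int i = 1; i < compressed.length; i++) {
            result.and(expand(compressed[i]));
        }
        return result;
    }

    /** Picks store file rows by bit position and filters all memstore rows one by one. */
    static List<String> scan(List<Row> storeFileRows, BitSet matches,
                             List<Row> memstoreRows, Predicate<Row> criteria) {
        TreeSet<String> keys = new TreeSet<>(); // keeps the result in row order
        for (int i = matches.nextSetBit(0); i >= 0; i = matches.nextSetBit(i + 1)) {
            keys.add(storeFileRows.get(i).key);
        }
        for (Row row : memstoreRows) { // the memstore is not indexed: full scan
            if (criteria.test(row)) {
                keys.add(row.key);
            }
        }
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        List<Row> storeFile = Arrays.asList(
            new Row("rowA", 30, 70), new Row("rowB", 25, 90),
            new Row("rowC", 41, 85), new Row("rowD", 35, 95));
        List<Row> memstore = Arrays.asList(new Row("rowE", 33, 88), new Row("rowF", 50, 99));

        // What the two column indexes would report for the store file positions 0..3.
        int[] ageBetween30And45 = {0, 2, 3};
        int[] scoreAtLeast80 = {1, 2, 3};

        BitSet matches = intersect(ageBetween30And45, scoreAtLeast80);
        Predicate<Row> criteria = r -> r.age >= 30 && r.age <= 45 && r.score >= 80;
        System.out.println(scan(storeFile, matches, memstore, criteria)); // prints [rowC, rowD, rowE]
    }
}
```

The cost of the memstore loop grows with the number of unflushed rows, which is the trade-off the first note accepts until it shows up as a bottleneck.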
> + TODO
> This patch changes some of the HBase core so that a region implementation
> other than the default HRegion can be instantiated. It fixes bugs in filter
> too.
> I would like to add this as a contrib package on the 0.20 branch in time for
> 0.20.3, if possible.