[
https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063920#comment-16063920
]
Hadoop QA commented on HBASE-18161:
-----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 1m 3s {color} | {color:red} HBASE-18161 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874574/MultiHFileOutputFormatSupport_HBASE_18161_v11.patch |
| JIRA Issue | HBASE-18161 |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7342/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |
This message was automatically generated.
> Incremental Load support for Multiple-Table HFileOutputFormat
> -------------------------------------------------------------
>
> Key: HBASE-18161
> URL: https://issues.apache.org/jira/browse/HBASE-18161
> Project: HBase
> Issue Type: New Feature
> Reporter: Densel Santhmayor
> Priority: Minor
> Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch,
> MultiHFileOutputFormatSupport_HBASE_18161_v10.patch,
> MultiHFileOutputFormatSupport_HBASE_18161_v11.patch,
> MultiHFileOutputFormatSupport_HBASE_18161_v11.patch,
> MultiHFileOutputFormatSupport_HBASE_18161_v2.patch,
> MultiHFileOutputFormatSupport_HBASE_18161_v3.patch,
> MultiHFileOutputFormatSupport_HBASE_18161_v4.patch,
> MultiHFileOutputFormatSupport_HBASE_18161_v5.patch,
> MultiHFileOutputFormatSupport_HBASE_18161_v6.patch,
> MultiHFileOutputFormatSupport_HBASE_18161_v7.patch,
> MultiHFileOutputFormatSupport_HBASE_18161_v8.patch,
> MultiHFileOutputFormatSupport_HBASE_18161_v9.patch
>
>
> h2. Introduction
> MapReduce currently supports writing HBase records in bulk to HFiles for a
> single table. The file(s) can then be uploaded to the relevant RegionServers
> with reasonable latency. This feature is useful for making a large set of
> data available for queries all at once, and it provides a way to efficiently
> load very large input into HBase without affecting query latencies.
> There is, however, no support for writing variations of the same record key
> to HFiles belonging to multiple HBase tables from within the same MapReduce
> job.
>
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to
> HFiles for different tables within the same MapReduce job while keeping
> single-table HFile features backwards-compatible.
> For our use case, we needed to write a record key to a smaller HBase table
> for quicker access, and the same record key with a date appended to a larger
> table for longer term storage with chronological access. Each of these tables
> would have different TTL and other settings to support their respective
> access patterns. We also needed to be able to bulk write records to multiple
> tables with different subsets of very large input as efficiently as possible.
> Rather than run the MapReduce job multiple times (one for each table or
> record structure), it would be useful to be able to parse the input a single
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing
> heavily-used HFileOutputFormat2 interface so that benefits such as locality
> sensitivity (which was introduced long after we implemented support for
> multiple tables) apply to both single-table and multi-table HFile writes.
> h2. Proposal
> * Backwards compatibility for existing single table support in
> HFileOutputFormat2 will be maintained; in this case, mappers will need to
> emit the table rowkey as before. However, a new class,
> MultiHFileOutputFormat, will provide a helper function that generates a
> rowkey for mappers by prefixing the desired tablename to the existing rowkey,
> and will also provide configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and
> region locator pairs, analogous to the single pair currently accepted by
> HFileOutputFormat2.
> ** Compression, Block Size, Bloom Type and Datablock settings per column
> family that are set in the Configuration object are now indexed and retrieved
> by tablename AND column family.
> ** getRegionStartKeys will now support multiple regionlocators and calculate
> split points, and therefore partitions, collectively for all tables.
> Similarly, the eventual number of Reducers will now be equal to the total
> number of partitions across all tables.
> ** The RecordWriter class will be able to process rowkeys either with or
> without the tablename prepended, depending on whether configureIncrementalLoad
> was called through MultiHFileOutputFormat or HFileOutputFormat2.
> * MultiHFileOutputFormat will write its output into HFiles that match the
> output format of HFileOutputFormat2. However, while the default use case
> will keep the existing directory structure, with the column family name as
> the directory and HFiles within that directory, MultiHFileOutputFormat will
> write HFiles in the output directory with the following relative paths:
> {noformat}
> --table1
>    --family1
>      --HFiles
> --table2
>    --family1
>    --family2
>      --HFiles
> {noformat}
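The table-prefixed rowkey and the resulting relative output path described above can be illustrated with a minimal sketch. This is not the code from the attached patch: the class name `TablePrefixedKey`, its method names, and the `;` separator are all hypothetical, chosen only to show how a mapper-side helper might compose the key and how a writer might split it again to pick the `table/family` directory.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not the API introduced by the patch.
final class TablePrefixedKey {
    // Assumed separator byte; assumed not to occur in table names.
    static final byte SEPARATOR = (byte) ';';

    // Mapper side: prefix the target table name to the existing rowkey.
    static byte[] compose(String tableName, byte[] rowKey) {
        byte[] table = tableName.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[table.length + 1 + rowKey.length];
        System.arraycopy(table, 0, out, 0, table.length);
        out[table.length] = SEPARATOR;
        System.arraycopy(rowKey, 0, out, table.length + 1, rowKey.length);
        return out;
    }

    // Writer side: recover the table name from a composite key.
    static String tableOf(byte[] composite) {
        return new String(composite, 0, separatorIndex(composite), StandardCharsets.UTF_8);
    }

    // Writer side: recover the original rowkey from a composite key.
    static byte[] rowKeyOf(byte[] composite) {
        return Arrays.copyOfRange(composite, separatorIndex(composite) + 1, composite.length);
    }

    // Relative directory under the job output path: <table>/<family>.
    static String relativeHFileDir(byte[] composite, String family) {
        return tableOf(composite) + "/" + family;
    }

    // The first separator ends the table name; later ones belong to the rowkey.
    private static int separatorIndex(byte[] composite) {
        for (int i = 0; i < composite.length; i++) {
            if (composite[i] == SEPARATOR) {
                return i;
            }
        }
        throw new IllegalArgumentException("composite key has no table prefix");
    }
}
```

For example, a mapper could emit `TablePrefixedKey.compose("table2", rowKey)`, and a writer could call `TablePrefixedKey.relativeHFileDir(key, "family2")` to decide where the HFile for that cell belongs.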
> This aims to be a comprehensive solution to the original tickets - HBASE-3727
> and HBASE-16261. Thanks to [~clayb] for his support. This is a contribution
> from Bloomberg developers.
> The patch will be attached shortly.
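As a rough illustration of the proposal's point that split points are calculated collectively and that the number of Reducers equals the total number of partitions across all tables, the sketch below merges per-table region start keys into one sorted list. It is an assumption-laden simplification, not the patch's getRegionStartKeys: the class `CombinedSplitPoints`, the `;` separator, and the use of `String` keys instead of byte arrays are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch for illustration only; keys are Strings here for brevity,
// whereas real HBase rowkeys are byte arrays compared as unsigned bytes.
final class CombinedSplitPoints {
    // Assumed separator between table name and region start key.
    static final String SEPARATOR = ";";

    // Prefix every region start key with its table name and sort the result,
    // giving one partition boundary per region across all tables.
    static List<String> combine(Map<String, List<String>> startKeysByTable) {
        TreeSet<String> sorted = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : startKeysByTable.entrySet()) {
            for (String startKey : entry.getValue()) {
                sorted.add(entry.getKey() + SEPARATOR + startKey);
            }
        }
        return new ArrayList<>(sorted);
    }

    // One reducer per partition, i.e. per region summed over all tables.
    static int reducerCount(Map<String, List<String>> startKeysByTable) {
        return combine(startKeysByTable).size();
    }
}
```

Under these assumptions, a table with two regions and a table with three regions would yield five partitions and therefore five reducers, with all of one table's partitions sorted ahead of the next table's.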
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)