[
https://issues.apache.org/jira/browse/HBASE-27904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Himanshu Gwalani updated HBASE-27904:
-------------------------------------
Description:
As of now, HBase has no data generator tool that leverages bulk load. Since
bulk load skips the client write path, it is a much faster way to generate data
for load/performance tests where client writes are not required.
{*}Example{*}: Any tooling over HBase that needs x TB of HBase table data for
load testing.
{*}Requirements{*}:
1. Tooling should support pre-split tables (number of splits to be taken as
input).
2. Data should be UNIFORMLY distributed across all regions of the table.
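The two requirements above can be illustrated with a minimal, HBase-independent sketch. It assumes fixed-width 8-byte random row keys (an assumption, the issue does not specify a key format) and computes evenly spaced split keys for a requested number of regions, then checks how random keys fall across those regions. The class and method names are hypothetical and not part of HBase; in an actual tool the split keys would be supplied when creating the table and the row keys would be emitted by the custom Mapper.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: not part of HBase and not the tool proposed in this issue.
class UniformSplitSketch {
    // Assumed fixed row-key width in bytes.
    static final int KEY_LEN = 8;

    // Returns numRegions - 1 split keys that cut the key space into equal ranges.
    public static byte[][] splitKeys(int numRegions) {
        BigInteger keySpace = BigInteger.ONE.shiftLeft(8 * KEY_LEN);
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            BigInteger boundary = keySpace
                .multiply(BigInteger.valueOf(i))
                .divide(BigInteger.valueOf(numRegions));
            splits[i - 1] = toFixedWidth(boundary);
        }
        return splits;
    }

    // Converts a non-negative value below 2^(8 * KEY_LEN) to a big-endian key.
    static byte[] toFixedWidth(BigInteger value) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter
        byte[] out = new byte[KEY_LEN];
        int copy = Math.min(raw.length, KEY_LEN);
        System.arraycopy(raw, raw.length - copy, out, KEY_LEN - copy, copy);
        return out;
    }

    // Unsigned lexicographic comparison, the ordering HBase applies to row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Index of the region whose range [previous split, next split) holds rowKey.
    public static int regionIndex(byte[] rowKey, byte[][] splits) {
        int idx = 0;
        while (idx < splits.length && compareUnsigned(rowKey, splits[idx]) >= 0) {
            idx++;
        }
        return idx;
    }

    // Generates numRows random keys and counts how many land in each region.
    public static long[] countPerRegion(int numRegions, int numRows, long seed) {
        byte[][] splits = splitKeys(numRegions);
        long[] counts = new long[numRegions];
        Random rng = new Random(seed);
        byte[] key = new byte[KEY_LEN];
        for (int i = 0; i < numRows; i++) {
            rng.nextBytes(key);
            counts[regionIndex(key, splits)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Ten regions and 100,000 rows: the counts should be roughly equal.
        System.out.println(Arrays.toString(countPerRegion(10, 100_000, 42L)));
    }
}
```

Because the keys are drawn uniformly from the whole key space and the split keys are evenly spaced, each region is expected to receive about the same number of rows. A different key format would need a matching split strategy.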
*High-level Steps*
1. Generate HFiles with random data (using custom Mapper and
[HFileOutputFormat2|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java])
2. Bulk load those HFiles into the respective regions of the table using
[LoadIncrementalHFiles|https://hbase.apache.org/2.2/devapidocs/org/apache/hadoop/hbase/tool/LoadIncrementalHFiles.html]
was:
As of now, HBase has no data generator tool that leverages bulk load. Since
bulk load skips the client write path, it is a much faster way to generate data
for load/performance tests where client writes are not required.
Example: Any tooling over HBase that needs x TB of HBase table data for load testing.
The tool will generate data as a two-step process:
1. Generate HFiles with random data (using custom Mapper and
[HFileOutputFormat2|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java])
2. Bulk load those HFiles into the respective regions of the table using
[LoadIncrementalHFiles|https://hbase.apache.org/2.2/devapidocs/org/apache/hadoop/hbase/tool/LoadIncrementalHFiles.html]
> A random data generator tool leveraging bulk load.
> --------------------------------------------------
>
> Key: HBASE-27904
> URL: https://issues.apache.org/jira/browse/HBASE-27904
> Project: HBase
> Issue Type: New Feature
> Components: util
> Reporter: Himanshu Gwalani
> Priority: Minor