[ 
https://issues.apache.org/jira/browse/HBASE-27904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Gwalani updated HBASE-27904:
-------------------------------------
    Description: 
As of now, there is no data generator tool in HBase leveraging bulk load. Since 
bulk load skips client writes path, it's much faster to generate data and use 
of for load/performance tests where client writes are not a mandate.
Example: Any tooling over HBase that need x TBs of HBase Table for load testing.

The tool will generate data as a two-step process:
1. Generate HFiles with random data (using custom Mapper and 
[HFileOutputFormat2|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java])
2. Bulk load those HFiles to the respective regions of the table using 
[LoadIncrementalFiles|https://hbase.apache.org/2.2/devapidocs/org/apache/hadoop/hbase/tool/LoadIncrementalHFiles.html]

  was:
As of now, there is no data generator tool in HBase leveraging bulk load. Since 
bulk load skips client writes path and if an tooling over HBase need huge amout 
of data for load/performance testing, bulk load can be leveraged.

The tool will generate data as a two-step process:
1. Generate HFiles with random data (using custom Mapper and 
[HFileOutputFormat2|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java])
2. Bulk load those HFiles to the respective regions of the table using 
[LoadIncrementalFiles|https://hbase.apache.org/2.2/devapidocs/org/apache/hadoop/hbase/tool/LoadIncrementalHFiles.html]


> A random data generator tool leveraging bulk load.
> --------------------------------------------------
>
>                 Key: HBASE-27904
>                 URL: https://issues.apache.org/jira/browse/HBASE-27904
>             Project: HBase
>          Issue Type: New Feature
>          Components: util
>            Reporter: Himanshu Gwalani
>            Priority: Minor
>
> As of now, there is no data generator tool in HBase leveraging bulk load. 
> Since bulk load skips client writes path, it's much faster to generate data 
> and use of for load/performance tests where client writes are not a mandate.
> Example: Any tooling over HBase that need x TBs of HBase Table for load 
> testing.
> The tool will generate data as a two-step process:
> 1. Generate HFiles with random data (using custom Mapper and 
> [HFileOutputFormat2|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java])
> 2. Bulk load those HFiles to the respective regions of the table using 
> [LoadIncrementalFiles|https://hbase.apache.org/2.2/devapidocs/org/apache/hadoop/hbase/tool/LoadIncrementalHFiles.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to