apurtell opened a new pull request #3802:
URL: https://github.com/apache/hbase/pull/3802
This integration test emulates a use case that stores a large number of small values
into a table that would likely be heavily indexed (ROW_INDEX_V1 data block encoding,
small block size, etc.): an application that crowdsources weather (temperature)
observation data. This IT can be used to test and optimize compression settings for
such cases.
It comes with a companion utility, HFileBlockExtracter, which extracts block
data from HFiles into a set of local files for use in training external
compression dictionaries, for example with ZStandard's `zstd` utility.
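As an illustration of the kind of schema this test targets, a heavily indexed,
ZSTD-compressed column family might be declared from the HBase shell as below.
The table name, family name, and block size are placeholders, not something the
test itself requires:
hbase> create 'smallvalues', {NAME => 'v', DATA_BLOCK_ENCODING => 'ROW_INDEX_V1', BLOCKSIZE => '8192', COMPRESSION => 'ZSTD'}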
Run like:
./bin/hbase org.apache.hadoop.hbase.test.IntegrationTestLoadSmallValues
numRows numMappers outputDir
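For example, to run with numRows=1000000, numMappers=10, and outputDir=/tmp/itlsv
(all of these values are arbitrary placeholders):
./bin/hbase org.apache.hadoop.hbase.test.IntegrationTestLoadSmallValues 1000000 10 /tmp/itlsv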
You can also split the Loader and Verify stages:
Load with:
./bin/hbase
'org.apache.hadoop.hbase.test.IntegrationTestLoadSmallValues$Loader'
numRows numMappers outputDir
Verify with:
./bin/hbase
'org.apache.hadoop.hbase.test.IntegrationTestLoadSmallValues$Verify'
outputDir
Use HFileExtractor like so:
./bin/hbase org.apache.hadoop.hbase.test.util.HFileExtractor
options outputDir hfile_1 ... hfile_n
Where options are:
-d : Width of generated file names (zero padded), default: 5
-n : Total number of blocks to extract, default: unlimited
-r | --random: Shuffle blocks and write them in randomized order
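For example, to extract up to 1000 blocks in randomized order into a local
directory named 't' (the count, directory, and HFile paths here are placeholders):
./bin/hbase org.apache.hadoop.hbase.test.util.HFileExtractor -n 1000 -r t /path/to/hfile_1 /path/to/hfile_2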
You might train ZStandard dictionaries on the extracted block files like so
(assuming the outputDir given to HFileExtractor was 't'):
$ zstd --train -o dict t/*
Or:
$ zstd --train-fastcover=k=32,d=6 -o dict t/*
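To get a rough sense of how much the trained dictionary helps, one option is
zstd's built-in benchmark mode over the same extracted blocks, with and without
the dictionary (the compression level here is just an example):
$ zstd -b3 t/*
$ zstd -b3 -D dict t/*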