apurtell opened a new pull request #3802:
URL: https://github.com/apache/hbase/pull/3802
This integration test emulates a use case that stores a large number of small values
into a table that would likely be heavily indexed (ROW_INDEX_V1 data block encoding,
small block size, etc.): an application that crowdsources weather (temperature)
observation data. This IT can be used to test and optimize compression settings for
such cases.
It comes with a companion utility, HFileBlockExtracter, which extracts block
data from HFiles into a set of local files for use in training external
compression dictionaries, for example with ZStandard's `zstd` utility.
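As an illustration of the kind of schema this test targets, a heavily indexed,
ZSTD-compressed column family might be declared from the HBase shell as below.
The table name, family name, and block size are placeholders, not something the
test itself requires:
hbase> create 'smallvalues', {NAME => 'v', DATA_BLOCK_ENCODING => 'ROW_INDEX_V1', BLOCKSIZE => '8192', COMPRESSION => 'ZSTD'}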
Run like:
./bin/hbase org.apache.hadoop.hbase.test.IntegrationTestLoadSmallValues
numRows numMappers outputDir
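For example, to run with numRows=1000000, numMappers=10, and outputDir=/tmp/itlsv
(all of these values are arbitrary placeholders):
./bin/hbase org.apache.hadoop.hbase.test.IntegrationTestLoadSmallValues 1000000 10 /tmp/itlsv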
You can also split the Loader and Verify stages:
Load with:
./bin/hbase
'org.apache.hadoop.hbase.test.IntegrationTestLoadSmallValues$Loader'
numRows numMappers outputDir
Verify with:
./bin/hbase
'org.apache.hadoop.hbase.test.IntegrationTestLoadSmallValues$Verify'
outputDir
Use HFileExtractor like so:
./bin/hbase org.apache.hadoop.hbase.test.util.HFileExtractor
options outputDir hfile_1 ... hfile_n
Where options are:
-d : Width of generated file names (zero padded), default: 5
-n : Total number of blocks to extract, default: unlimited
-r | --random: Shuffle blocks and write them in randomized order
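For example, to extract up to 1000 blocks in randomized order into a local
directory named 't' (the count, directory, and HFile paths here are placeholders):
./bin/hbase org.apache.hadoop.hbase.test.util.HFileExtractor -n 1000 -r t /path/to/hfile_1 /path/to/hfile_2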
You might train ZStandard dictionaries on the extracted block files like so
(assuming the outputDir given to HFileExtractor was 't'):
$ zstd --train -o dict t/*
Or:
$ zstd --train-fastcover=k=32,d=6 -o dict t/*
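To get a rough sense of how much the trained dictionary helps, one option is
zstd's built-in benchmark mode over the same extracted blocks, with and without
the dictionary (the compression level here is just an example):
$ zstd -b3 t/*
$ zstd -b3 -D dict t/*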