[ 
https://issues.apache.org/jira/browse/HBASE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-27733.
-------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

No activity for a long time; resolving.

Please open new issue for backporting if you are still around [~alanlemma].

Thanks.

> hfile split occurs during bulkload, the new HFile file does not specify 
> favored nodes
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-27733
>                 URL: https://issues.apache.org/jira/browse/HBASE-27733
>             Project: HBase
>          Issue Type: Improvement
>          Components: tooling
>            Reporter: alan.zhao
>            Assignee: alan.zhao
>            Priority: Major
>             Fix For: 3.0.0-alpha-4
>
>
> ## BulkLoadHFilesTool.class
> /**
>  * Copy half of an HFile into a new HFile.
>  */
> private static void copyHFileHalf(Configuration conf, Path inFile, Path outFile,
>     Reference reference, ColumnFamilyDescriptor familyDescriptor) throws IOException {
>   FileSystem fs = inFile.getFileSystem(conf);
>   CacheConfig cacheConf = CacheConfig.DISABLED;
>   HalfStoreFileReader halfReader = null;
>   StoreFileWriter halfWriter = null;
>   try {
>     ReaderContext context = new ReaderContextBuilder().withFileSystemAndPath(fs, inFile).build();
>     StoreFileInfo storeFileInfo =
>       new StoreFileInfo(conf, fs, fs.getFileStatus(inFile), reference);
>     storeFileInfo.initHFileInfo(context);
>     halfReader = (HalfStoreFileReader) storeFileInfo.createReader(context, cacheConf);
>     storeFileInfo.getHFileInfo().initMetaAndIndex(halfReader.getHFileReader());
>     Map<byte[], byte[]> fileInfo = halfReader.loadFileInfo();
>     int blocksize = familyDescriptor.getBlocksize();
>     Algorithm compression = familyDescriptor.getCompressionType();
>     BloomType bloomFilterType = familyDescriptor.getBloomFilterType();
>     HFileContext hFileContext = new HFileContextBuilder().withCompression(compression)
>       .withChecksumType(StoreUtils.getChecksumType(conf))
>       .withBytesPerCheckSum(StoreUtils.getBytesPerChecksum(conf)).withBlockSize(blocksize)
>       .withDataBlockEncoding(familyDescriptor.getDataBlockEncoding()).withIncludesTags(true)
>       .withCreateTime(EnvironmentEdgeManager.currentTime()).build();
>     // The writer is built without favored nodes (emphasized in the original report):
>     halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs).withFilePath(outFile)
>       .withBloomType(bloomFilterType).withFileContext(hFileContext).build();
>     HFileScanner scanner = halfReader.getScanner(false, false, false);
>     scanner.seekTo();
>     ...
>  
> When an HFile split occurs during bulkload, the new HFile is written without 
> specifying favored nodes, which hurts data locality. Internally, we 
> implemented a version of the code that specifies the favored nodes for the 
> split HFile in copyHFileHalf(), so that locality is not compromised.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
