[ https://issues.apache.org/jira/browse/HBASE-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duo Zhang resolved HBASE-27733.
-------------------------------
     Hadoop Flags: Reviewed
       Resolution: Fixed

No activity for a long time; resolving. Please open a new issue for backporting if you are still around [~alanlemma]. Thanks.

> When an HFile split occurs during bulkload, the new HFile does not specify
> favored nodes
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-27733
>                 URL: https://issues.apache.org/jira/browse/HBASE-27733
>             Project: HBase
>          Issue Type: Improvement
>          Components: tooling
>            Reporter: alan.zhao
>            Assignee: alan.zhao
>            Priority: Major
>             Fix For: 3.0.0-alpha-4
>
>
> In BulkLoadHFilesTool:
>
> /**
>  * Copy half of an HFile into a new HFile.
>  */
> private static void copyHFileHalf(Configuration conf, Path inFile, Path outFile,
>   Reference reference, ColumnFamilyDescriptor familyDescriptor) throws IOException {
>   FileSystem fs = inFile.getFileSystem(conf);
>   CacheConfig cacheConf = CacheConfig.DISABLED;
>   HalfStoreFileReader halfReader = null;
>   StoreFileWriter halfWriter = null;
>   try {
>     ReaderContext context =
>       new ReaderContextBuilder().withFileSystemAndPath(fs, inFile).build();
>     StoreFileInfo storeFileInfo =
>       new StoreFileInfo(conf, fs, fs.getFileStatus(inFile), reference);
>     storeFileInfo.initHFileInfo(context);
>     halfReader = (HalfStoreFileReader) storeFileInfo.createReader(context, cacheConf);
>     storeFileInfo.getHFileInfo().initMetaAndIndex(halfReader.getHFileReader());
>     Map<byte[], byte[]> fileInfo = halfReader.loadFileInfo();
>     int blocksize = familyDescriptor.getBlocksize();
>     Algorithm compression = familyDescriptor.getCompressionType();
>     BloomType bloomFilterType = familyDescriptor.getBloomFilterType();
>     HFileContext hFileContext = new HFileContextBuilder().withCompression(compression)
>       .withChecksumType(StoreUtils.getChecksumType(conf))
>       .withBytesPerCheckSum(StoreUtils.getBytesPerChecksum(conf)).withBlockSize(blocksize)
>       .withDataBlockEncoding(familyDescriptor.getDataBlockEncoding()).withIncludesTags(true)
>       .withCreateTime(EnvironmentEdgeManager.currentTime()).build();
>     *halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs).withFilePath(outFile)*
>       *.withBloomType(bloomFilterType).withFileContext(hFileContext).build();*
>     HFileScanner scanner = halfReader.getScanner(false, false, false);
>     scanner.seekTo();
>     ...
>
> When an HFile split occurs during bulkload, the new HFile does not specify
> favored nodes, which degrades data locality. Internally, we implemented a
> version of the code that specifies the favored nodes for the split HFile in
> copyHFileHalf() to avoid compromising locality.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
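[Editor's note] A sketch of the kind of change the reporter describes, not the committed patch: StoreFileWriter.Builder exposes withFavoredNodes(InetSocketAddress[]), so copyHFileHalf() could accept the target region's favored nodes and forward them to the builder. The favoredNodes parameter and how the caller obtains it are assumptions for illustration.

```java
// Sketch only (assumed signature change): the bulk-load split path would need
// to supply the target region's favored nodes, e.g. looked up by the caller.
private static void copyHFileHalf(Configuration conf, Path inFile, Path outFile,
  Reference reference, ColumnFamilyDescriptor familyDescriptor,
  InetSocketAddress[] favoredNodes) throws IOException {
  // ... same reader/context setup as the existing method ...
  // StoreFileWriter.Builder already supports favored nodes; passing them lets
  // HDFS place the new half-file's replicas on the region's preferred hosts.
  halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs)
    .withFilePath(outFile)
    .withBloomType(bloomFilterType)
    .withFileContext(hFileContext)
    .withFavoredNodes(favoredNodes) // hypothetical parameter threaded from the caller
    .build();
  // ... copy cells from halfReader to halfWriter as before ...
}
```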