alan.zhao created HBASE-27733:
---------------------------------
Summary: hfile split occurs during bulkload, the new HFile file
does not specify favored nodes
Key: HBASE-27733
URL: https://issues.apache.org/jira/browse/HBASE-27733
Project: HBase
Issue Type: Improvement
Reporter: alan.zhao
Assignee: alan.zhao
## BulkLoadHFilesTool.class
/**
 * Copy half of an HFile into a new HFile.
 */
private static void copyHFileHalf(Configuration conf, Path inFile, Path outFile,
  Reference reference, ColumnFamilyDescriptor familyDescriptor) throws IOException {
  FileSystem fs = inFile.getFileSystem(conf);
  CacheConfig cacheConf = CacheConfig.DISABLED;
  HalfStoreFileReader halfReader = null;
  StoreFileWriter halfWriter = null;
  try {
    ReaderContext context = new ReaderContextBuilder().withFileSystemAndPath(fs, inFile).build();
    StoreFileInfo storeFileInfo =
      new StoreFileInfo(conf, fs, fs.getFileStatus(inFile), reference);
    storeFileInfo.initHFileInfo(context);
    halfReader = (HalfStoreFileReader) storeFileInfo.createReader(context, cacheConf);
    storeFileInfo.getHFileInfo().initMetaAndIndex(halfReader.getHFileReader());
    Map<byte[], byte[]> fileInfo = halfReader.loadFileInfo();

    int blocksize = familyDescriptor.getBlocksize();
    Algorithm compression = familyDescriptor.getCompressionType();
    BloomType bloomFilterType = familyDescriptor.getBloomFilterType();
    HFileContext hFileContext = new HFileContextBuilder().withCompression(compression)
      .withChecksumType(StoreUtils.getChecksumType(conf))
      .withBytesPerCheckSum(StoreUtils.getBytesPerChecksum(conf)).withBlockSize(blocksize)
      .withDataBlockEncoding(familyDescriptor.getDataBlockEncoding()).withIncludesTags(true)
      .withCreateTime(EnvironmentEdgeManager.currentTime()).build();
    // The writer for the split HFile is built here without favored nodes.
    halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs).withFilePath(outFile)
      .withBloomType(bloomFilterType).withFileContext(hFileContext).build();
    HFileScanner scanner = halfReader.getScanner(false, false, false);
    scanner.seekTo();
    ...
When an HFile has to be split during bulk load, the new HFiles are written
without favored nodes, which hurts data locality. Internally, we implemented a
version of the code that lets copyHFileHalf() specify the favored nodes of the
split HFiles, so that locality is not compromised.
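
As a rough illustration of the idea, the sketch below builds the InetSocketAddress[] that a writer could be given as favored nodes. It is not the internal implementation mentioned above. The class FavoredNodesSketch, the method toFavoredNodes, and the "host:port" input format are hypothetical, and how the host of the target region is looked up is left out because the issue does not describe it. The closing comment assumes that StoreFileWriter.Builder accepts such an array through withFavoredNodes(InetSocketAddress[]), which should be checked against the HBase version in use.

```java
import java.net.InetSocketAddress;
import java.util.List;

/** Hypothetical helper for illustration only; not part of HBase. */
final class FavoredNodesSketch {
  private FavoredNodesSketch() {
  }

  /**
   * Converts "host:port" strings (assumed to name the region servers hosting
   * the target region) into unresolved socket addresses.
   */
  static InetSocketAddress[] toFavoredNodes(List<String> hostPorts) {
    InetSocketAddress[] nodes = new InetSocketAddress[hostPorts.size()];
    for (int i = 0; i < hostPorts.size(); i++) {
      String hostPort = hostPorts.get(i);
      int sep = hostPort.lastIndexOf(':');
      if (sep <= 0 || sep == hostPort.length() - 1) {
        throw new IllegalArgumentException("Expected host:port, got: " + hostPort);
      }
      String host = hostPort.substring(0, sep);
      int port = Integer.parseInt(hostPort.substring(sep + 1));
      // createUnresolved avoids a DNS lookup while building the array.
      nodes[i] = InetSocketAddress.createUnresolved(host, port);
    }
    return nodes;
  }

  // Assumed use inside copyHFileHalf(), shown as a comment because it needs HBase:
  //   halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs).withFilePath(outFile)
  //     .withBloomType(bloomFilterType).withFileContext(hFileContext)
  //     .withFavoredNodes(favoredNodes).build();
}
```

Favored nodes are only a placement hint to the underlying file system, so a change along these lines would probably need to fall back to the current behavior when no location is available for the target region.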