apurtell commented on code in PR #5939:
URL: https://github.com/apache/hbase/pull/5939#discussion_r1759292740
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java:
##########
@@ -740,8 +742,9 @@ private Pair<List<Path>, List<Path>> splitStoreFiles(final MasterProcedureEnv en
       // is running in a regionserver's Store context, or we might not be able
       // to read the hfiles.
       storeFileInfo.setConf(storeConfiguration);
-      StoreFileSplitter sfs = new StoreFileSplitter(regionFs, familyName,
-        new HStoreFile(storeFileInfo, hcd.getBloomFilterType(), CacheConfig.DISABLED));
+      StoreFileSplitter sfs =
Review Comment:
Nit: Assign the value of .getSecond() to a local variable of type
`StoreFileTracker` and use that variable instead so it is easier to follow what
is going on here.
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java:
##########
@@ -726,10 +727,11 @@ private Pair<List<Path>, List<Path>> splitStoreFiles(final MasterProcedureEnv en
     final List<Future<Pair<Path, Path>>> futures = new ArrayList<Future<Pair<Path, Path>>>(nbFiles);
     // Split each store file.
-    for (Map.Entry<String, Collection<StoreFileInfo>> e : files.entrySet()) {
+    for (Map.Entry<String, Pair<Collection<StoreFileInfo>, StoreFileTracker>> e : files
+      .entrySet()) {
       byte[] familyName = Bytes.toBytes(e.getKey());
       final ColumnFamilyDescriptor hcd = htd.getColumnFamily(familyName);
-      final Collection<StoreFileInfo> storeFiles = e.getValue();
+      final Collection<StoreFileInfo> storeFiles = e.getValue().getFirst();
Review Comment:
Nit: Assign the value of .getFirst() to a local variable of type
`Collection<StoreFileInfo>` and use that variable instead so it is easier to
follow what is going on here.
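The pattern both of these nits ask for (naming the pair members instead of chaining `.getFirst()`/`.getSecond()` inline) can be sketched like this. This is a self-contained illustration only: the `Pair` class here is a minimal stand-in, not `org.apache.hadoop.hbase.util.Pair`, and `describe` is a hypothetical method, not code from the PR.

```java
public class PairNitSketch {
  // Minimal stand-in for a two-member pair type; not the actual HBase Pair.
  static final class Pair<A, B> {
    private final A first;
    private final B second;
    Pair(A first, B second) { this.first = first; this.second = second; }
    A getFirst() { return first; }
    B getSecond() { return second; }
  }

  static String describe(Pair<java.util.List<String>, String> entry) {
    // The nit: pull the pair members into descriptively named, typed locals
    // up front, rather than chaining entry.getFirst()/entry.getSecond() inline.
    final java.util.List<String> storeFiles = entry.getFirst();
    final String tracker = entry.getSecond();
    return storeFiles.size() + " store files tracked by " + tracker;
  }

  public static void main(String[] args) {
    System.out.println(describe(new Pair<>(java.util.List.of("a", "b"), "FILE")));
  }
}
```

The extra locals cost nothing after compilation but make the loop body read in terms of domain names instead of positional accessors.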
##########
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionFileSystem.java:
##########
@@ -352,33 +351,36 @@ public void setWorkingDirectory(Path arg0) {
}
@Test
+ // TODO: move this to test layer of SFT interface
Review Comment:
What is the plan for this? To do for this issue, or a follow up JIRA? Is it
filed yet or will it be? etc.
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionFileSystem.java:
##########
@@ -351,18 +306,19 @@ Path getStoreFilePath(final String familyName, final String fileName) {
    * @param fileName File Name
    * @return The {@link StoreFileInfo} for the specified family/file
    */
-  StoreFileInfo getStoreFileInfo(final String familyName, final String fileName)
-    throws IOException {
+  StoreFileInfo getStoreFileInfo(final String familyName, final String fileName,
+    final StoreFileTracker tracker) throws IOException {
     Path familyDir = getStoreDir(familyName);
     return ServerRegionReplicaUtil.getStoreFileInfo(conf, fs, regionInfo, regionInfoForFs,
-      familyName, new Path(familyDir, fileName));
+      familyName, new Path(familyDir, fileName), tracker);
   }
/**
* Returns true if the specified family has reference files
* @param familyName Column Family Name
* @return true if family contains reference files
*/
+ // TODO: move this to SFT
Review Comment:
What is the plan here? What are you thinking?
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/storefiletracker/FileBasedStoreFileTracker.java:
##########
@@ -77,7 +77,7 @@ protected List<StoreFileInfo> doLoadStoreFiles(boolean readOnly) throws IOExcept
     for (StoreFileEntry entry : list.getStoreFileList()) {
       infos.add(ServerRegionReplicaUtil.getStoreFileInfo(conf, fs, ctx.getRegionInfo(),
         ctx.getRegionFileSystem().getRegionInfoForFS(), ctx.getFamily().getNameAsString(),
-        new Path(ctx.getFamilyStoreDirectoryPath(), entry.getName())));
+        new Path(ctx.getFamilyStoreDirectoryPath(), entry.getName()), this));
Review Comment:
A question about `FileBasedStoreFileTracker` in general.
So the plan is to refactor the reference file handling into
`StoreFileTrackerBase` and simply inherit it here for now, otherwise leaving it
unchanged; then, on some later JIRA, perhaps evolve the representation and
handling of references with overrides in `FileBasedStoreFileTracker`.
Correct? That's why we are moving more into SFT? :-)
Consider writing up a couple of sentences about the motivation of this work
and the long term directions and putting that somewhere near the top of the PR
description?
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/mob/MobFile.java:
##########
@@ -134,11 +135,13 @@ public void close() throws IOException {
    * @param cacheConf The CacheConfig.
    * @return An instance of the MobFile.
    */
-  public static MobFile create(FileSystem fs, Path path, Configuration conf, CacheConfig cacheConf)
-    throws IOException {
-    // XXX: primaryReplica is only used for constructing the key of block cache so it is not a
-    // critical problem if we pass the wrong value, so here we always pass true. Need to fix later.
-    HStoreFile sf = new HStoreFile(fs, path, conf, cacheConf, BloomType.NONE, true);
+  public static MobFile create(FileSystem fs, Path path, Configuration conf, CacheConfig cacheConf,
+    StoreFileTracker sft) throws IOException {
+    // XXX: primaryReplica is only used for constructing the key of block cache so
Review Comment:
Fix formatting
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/storefiletracker/DefaultStoreFileTracker.java:
##########
@@ -37,6 +45,8 @@ public DefaultStoreFileTracker(Configuration conf, boolean isPrimaryReplica, Sto
     super(conf, isPrimaryReplica, ctx);
   }
+  private static final Logger LOG = LoggerFactory.getLogger(DefaultStoreFileTracker.class);
Review Comment:
Now that we have a Logger, and already there is at least one trace level
log, what do you think about adding trace level logging for all the SFT
activities? That might help with debugging issues in the future?
This comment applies for all SFT impls.
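The guarded trace-logging pattern being suggested might look like the sketch below. HBase itself uses slf4j (`LOG.trace` with `{}` placeholders); `java.util.logging` is used here only so the example is self-contained, and `logAdd`/`traceMessage` are hypothetical names, not methods from the PR.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SftTraceSketch {
  private static final Logger LOG = Logger.getLogger("SftTraceSketch");

  // Message construction is factored out so it is cheap to test; in slf4j the
  // {} placeholder form gives the same deferral without an explicit guard.
  static String traceMessage(String op, List<String> files) {
    return op + ": " + files.size() + " file(s) " + files;
  }

  // Hypothetical hook point: called wherever the tracker mutates its file set.
  static void logAdd(List<String> newFiles) {
    // Guard so the message is only built when trace-level output is enabled.
    if (LOG.isLoggable(Level.FINEST)) {
      LOG.finest(traceMessage("add", newFiles));
    }
  }

  public static void main(String[] args) {
    logAdd(List.of("hfile-a", "hfile-b"));
    System.out.println(traceMessage("add", List.of("hfile-a", "hfile-b")));
  }
}
```

Instrumenting each mutation point (`add`, `replace`, `set`) this way would give a low-cost audit trail of SFT activity when trace logging is switched on.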
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java:
##########
@@ -1347,7 +1349,9 @@ public static HDFSBlocksDistribution computeHDFSBlocksDistribution(Configuration
       if (StoreFileInfo.isReference(p) || HFileLink.isHFileLink(p)) {
         // Only construct StoreFileInfo object if its not a hfile, save obj
         // creation
-        StoreFileInfo storeFileInfo = new StoreFileInfo(conf, fs, status);
+        StoreFileTracker sft =
+          StoreFileTrackerFactory.create(conf, tableDescriptor, family, regionFs);
+        StoreFileInfo storeFileInfo = sft.getStoreFileInfo(status, status.getPath(), false);
Review Comment:
Every time we create a StoreFileTracker object it will have no state, and so
it will either need to go to the filesystem and list the directory or read the
tracker file depending on which type it is in order to initialize as soon as we
try to use it.
It's fine, because the original code causes IO to happen as well. However,
what do you think about the possibility of reuse? This is a more general
question than a comment about this particular call site. Should
StoreFileTrackerFactory cache instances and return the cached instance that
matches the arguments to StoreFileTrackerFactory.create(), rather than make a
new instance? Can StoreFileTracker instances be made thread safe so they can
be cached and shared?
If we have reuse, and all the relevant filesystem ops go through the
StoreFileTracker, then we could potentially save a lot of filesystem or object
store IO, because a reused StoreFileTracker would have the ground truth already
and would not need to go to the filesystem or object store and do IO in order
to e.g. return the StoreFileInfo of a given path.
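The factory-level reuse being floated here could take a shape like the sketch below. This is purely illustrative: the `Tracker` class and string cache key are stand-ins (real code would key on table, family, and region, make trackers thread safe, and invalidate entries when the store's file set changes out from under the cache).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TrackerCacheSketch {
  // Hypothetical stand-in for a StoreFileTracker implementation.
  static final class Tracker {
    final String key;
    Tracker(String key) { this.key = key; }
  }

  private static final Map<String, Tracker> CACHE = new ConcurrentHashMap<>();

  // Possible factory shape: same arguments return the same cached instance,
  // so an already-initialized tracker's view of the store files is reused
  // instead of re-listing the directory or re-reading the tracker file.
  static Tracker create(String table, String family, String region) {
    String key = table + "/" + region + "/" + family;
    // computeIfAbsent constructs the tracker at most once per key and is
    // atomic, so concurrent callers cannot race to create duplicates.
    return CACHE.computeIfAbsent(key, Tracker::new);
  }

  public static void main(String[] args) {
    Tracker a = create("t1", "cf", "r1");
    Tracker b = create("t1", "cf", "r1");
    System.out.println(a == b); // prints "true": the instance is reused
  }
}
```

The hard part is not the cache itself but the lifecycle: entries must be evicted on region close, split, or merge, or the cached "ground truth" would go stale.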
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/mob/CachedMobFile.java:
##########
@@ -41,10 +42,12 @@ public CachedMobFile(HStoreFile sf) {
   }
   public static CachedMobFile create(FileSystem fs, Path path, Configuration conf,
-    CacheConfig cacheConf) throws IOException {
-    // XXX: primaryReplica is only used for constructing the key of block cache so it is not a
-    // critical problem if we pass the wrong value, so here we always pass true. Need to fix later.
-    HStoreFile sf = new HStoreFile(fs, path, conf, cacheConf, BloomType.NONE, true);
+    CacheConfig cacheConf, StoreFileTracker sft) throws IOException {
+    // XXX: primaryReplica is only used for constructing the key of block cache so
Review Comment:
Fix formatting
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotManifest.java:
##########
@@ -341,13 +342,18 @@ protected void addRegion(Path tableDir, RegionInfo regionInfo, RegionVisitor vis
     }
   }
-  private List<StoreFileInfo> getStoreFiles(Path storeDir) throws IOException {
-    FileStatus[] stats = CommonFSUtils.listStatus(rootFs, storeDir);
+  private List<StoreFileInfo> getStoreFiles(Path storePath, TableDescriptor htd,
Review Comment:
Yeah I remember from my earlier work in this area that we need to plumb
`TableDescriptor` through all of these places now. :-/
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]