[ https://issues.apache.org/jira/browse/HBASE-28836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani resolved HBASE-28836.
----------------------------------
    Fix Version/s: 2.7.0
                   3.0.0-beta-2
                   2.5.11
                   2.6.2
     Hadoop Flags: Reviewed
       Resolution: Fixed

> Parallelize the archival of compacted files
> --------------------------------------------
>
>                 Key: HBASE-28836
>                 URL: https://issues.apache.org/jira/browse/HBASE-28836
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 2.5.10
>            Reporter: Aman Poonia
>            Assignee: Aman Poonia
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.5.11, 2.6.2
>
> While splitting a region, HBase has to clean up the compacted files for bookkeeping.
>
> Currently we do this sequentially, which is good enough on HDFS because archiving a file there is a fast operation. When the same code runs against S3, it becomes an issue, so we need to parallelize the loop to make it faster.
> {code:java}
> for (File file : toArchive) {
>   // if it's a file, archive it
>   try {
>     LOG.trace("Archiving {}", file);
>     if (file.isFile()) {
>       // attempt to archive the file
>       if (!resolveAndArchiveFile(baseArchiveDir, file, startTime)) {
>         LOG.warn("Couldn't archive " + file + " into backup directory: " + baseArchiveDir);
>         failures.add(file);
>       }
>     } else {
>       // otherwise it's a directory, and we need to archive all of its files
>       LOG.trace("{} is a directory, archiving children files", file);
>       // so we add the directory name to the base archive path
>       Path parentArchiveDir = new Path(baseArchiveDir, file.getName());
>       // and then get all the files from that directory and attempt to
>       // archive those too
>       Collection<File> children = file.getChildren();
>       failures.addAll(resolveAndArchive(fs, parentArchiveDir, children, start));
>     }
>   } catch (IOException e) {
>     LOG.warn("Failed to archive {}", file, e);
>     failures.add(file);
>   }
> } {code}
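For reference, a minimal sketch of what parallelizing this loop could look like. This is illustrative only, not the committed patch: the class name, the pool sizing, and the archiveOne stand-in for resolveAndArchiveFile are all assumptions.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelArchiveSketch {

  // Hypothetical stand-in for the per-file move done by resolveAndArchiveFile;
  // returns true when the file was archived successfully.
  static boolean archiveOne(String file) {
    return !file.isEmpty(); // pretend empty names fail, to exercise the failure path
  }

  // Fan the sequential archive loop out over a fixed-size pool and collect
  // the files that could not be archived, like the original "failures" list.
  static List<String> archiveInParallel(List<String> toArchive, int poolSize)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    List<String> failures = Collections.synchronizedList(new ArrayList<>());
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (String file : toArchive) {
        pending.add(pool.submit(() -> {
          // each task archives one file; any error marks the file as failed,
          // mirroring the IOException branch of the sequential loop
          try {
            if (!archiveOne(file)) {
              failures.add(file);
            }
          } catch (Exception e) {
            failures.add(file);
          }
        }));
      }
      // wait for every task to finish before reporting failures
      for (Future<?> f : pending) {
        try {
          f.get();
        } catch (Exception ignored) {
          // the task already recorded its own failure
        }
      }
    } finally {
      pool.shutdown();
    }
    return failures;
  }

  public static void main(String[] args) throws Exception {
    List<String> files = List.of("hfile-1", "hfile-2", "");
    System.out.println("failed to archive: " + archiveInParallel(files, 4));
  }
}
{code}

On S3 a "rename" is a copy plus delete rather than a cheap metadata operation, so issuing the per-file moves concurrently hides most of that latency; the pool size would need to be bounded so that a region with many compacted files does not flood the filesystem with requests.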