bharathv commented on a change in pull request #1079:
URL: https://github.com/apache/phoenix/pull/1079#discussion_r560549256
##########
File path:
phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java
##########
@@ -88,19 +90,34 @@ public TableSnapshotResultIterator(Configuration
configuration, Scan scan, ScanM
}
private void init() throws IOException {
- RestoreSnapshotHelper.RestoreMetaChanges meta =
- RestoreSnapshotHelper.copySnapshotForScanner(this.configuration,
this.fs,
- this.rootDir, this.restoreDir, this.snapshotName);
- List<RegionInfo> restoredRegions = meta.getRegionsToAdd();
- this.htd = meta.getTableDescriptor();
- this.regions = new ArrayList<RegionInfo>(restoredRegions.size());
-
- for (RegionInfo restoredRegion : restoredRegions) {
- if (isValidRegion(restoredRegion)) {
- this.regions.add(restoredRegion);
+ if
(PhoenixConfigurationUtil.getMRSnapshotManagedInternally(configuration)) {
+ RestoreSnapshotHelper.RestoreMetaChanges meta =
+ RestoreSnapshotHelper.copySnapshotForScanner(this.configuration,
this.fs, this.rootDir,
+ this.restoreDir, this.snapshotName);
+ List<RegionInfo> restoredRegions = meta.getRegionsToAdd();
+ this.htd = meta.getTableDescriptor();
+ this.regions = new ArrayList<>(restoredRegions.size());
+ for (RegionInfo restoredRegion : restoredRegions) {
+ if (isValidRegion(restoredRegion)) {
+ this.regions.add(restoredRegion);
+ }
+ }
+ } else {
+ Path snapshotDir =
SnapshotDescriptionUtils.getCompletedSnapshotDir(snapshotName, rootDir);
Review comment:
Quickly skimmed through the MR javadocs, it looks like the cleanup can
be done at a Job level. If the client/Job conf passes a restoreDir (like you
are doing in this patch), the job I believe can cleanup the contents of this
restoreDir at the end of the run (success/failure)?
As long as we have these semantics established, we can extract the snapshot
in the driver (once at setup)..and make it default?
That is to some extent not backward compatible as the clients need to clean
up the directory manually but I'd argue that the current approach is a serious
bug and its worth breaking that compatibility IMO.
I don't have a binding +1 here anyway but just suggesting an approach.
@gjacoby126 thoughts?
##########
File path:
phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java
##########
@@ -88,19 +90,34 @@ public TableSnapshotResultIterator(Configuration
configuration, Scan scan, ScanM
}
private void init() throws IOException {
- RestoreSnapshotHelper.RestoreMetaChanges meta =
- RestoreSnapshotHelper.copySnapshotForScanner(this.configuration,
this.fs,
- this.rootDir, this.restoreDir, this.snapshotName);
- List<RegionInfo> restoredRegions = meta.getRegionsToAdd();
- this.htd = meta.getTableDescriptor();
- this.regions = new ArrayList<RegionInfo>(restoredRegions.size());
-
- for (RegionInfo restoredRegion : restoredRegions) {
- if (isValidRegion(restoredRegion)) {
- this.regions.add(restoredRegion);
+ if
(PhoenixConfigurationUtil.getMRSnapshotManagedInternally(configuration)) {
Review comment:
IMO the original flow is wrong and can lead to pretty bad outcomes (like
a file blow up), so we should get rid of restoring per task. If you still want
to use this SnapshotManagedInternally() configuration, I'd suggest move it to
once at setup and not once per scan.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]