bharathv commented on a change in pull request #1079:
URL: https://github.com/apache/phoenix/pull/1079#discussion_r560549256



##########
File path: 
phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java
##########
@@ -88,19 +90,34 @@ public TableSnapshotResultIterator(Configuration 
configuration, Scan scan, ScanM
   }
 
   private void init() throws IOException {
-    RestoreSnapshotHelper.RestoreMetaChanges meta =
-        RestoreSnapshotHelper.copySnapshotForScanner(this.configuration, 
this.fs,
-            this.rootDir, this.restoreDir, this.snapshotName);
-    List<RegionInfo> restoredRegions = meta.getRegionsToAdd();
-    this.htd = meta.getTableDescriptor();
-    this.regions = new ArrayList<RegionInfo>(restoredRegions.size());
-
-    for (RegionInfo restoredRegion : restoredRegions) {
-      if (isValidRegion(restoredRegion)) {
-        this.regions.add(restoredRegion);
+    if 
(PhoenixConfigurationUtil.getMRSnapshotManagedInternally(configuration)) {
+      RestoreSnapshotHelper.RestoreMetaChanges meta =
+          RestoreSnapshotHelper.copySnapshotForScanner(this.configuration, 
this.fs, this.rootDir,
+                      this.restoreDir, this.snapshotName);
+      List<RegionInfo> restoredRegions = meta.getRegionsToAdd();
+      this.htd = meta.getTableDescriptor();
+      this.regions = new ArrayList<>(restoredRegions.size());
+      for (RegionInfo restoredRegion : restoredRegions) {
+        if (isValidRegion(restoredRegion)) {
+          this.regions.add(restoredRegion);
+        }
+      }
+    } else {
+      Path snapshotDir = 
SnapshotDescriptionUtils.getCompletedSnapshotDir(snapshotName, rootDir);

Review comment:
       Quickly skimmed through the MR javadocs, it looks like the cleanup can 
be done at a Job level. If the client/Job conf passes a restoreDir (like you 
are doing in this patch), the job I believe can cleanup the contents of this 
restoreDir at the end of the run (success/failure)? 
   
   As long as we have these semantics established, we can extract the snapshot 
in the driver (once at setup)..and make it default? 
   
   That is to some extent not backward compatible as the clients need to clean 
up the directory manually but I'd argue that the current approach is a serious 
bug and its worth breaking that compatibility IMO. 
   
   I don't have a binding +1 here anyway but just suggesting an approach. 
@gjacoby126 thoughts?

##########
File path: 
phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java
##########
@@ -88,19 +90,34 @@ public TableSnapshotResultIterator(Configuration 
configuration, Scan scan, ScanM
   }
 
   private void init() throws IOException {
-    RestoreSnapshotHelper.RestoreMetaChanges meta =
-        RestoreSnapshotHelper.copySnapshotForScanner(this.configuration, 
this.fs,
-            this.rootDir, this.restoreDir, this.snapshotName);
-    List<RegionInfo> restoredRegions = meta.getRegionsToAdd();
-    this.htd = meta.getTableDescriptor();
-    this.regions = new ArrayList<RegionInfo>(restoredRegions.size());
-
-    for (RegionInfo restoredRegion : restoredRegions) {
-      if (isValidRegion(restoredRegion)) {
-        this.regions.add(restoredRegion);
+    if 
(PhoenixConfigurationUtil.getMRSnapshotManagedInternally(configuration)) {

Review comment:
       IMO the original flow is wrong and can lead to pretty bad outcomes (like 
a file blow up), so we should get rid of restoring per task. If you still want 
to use this SnapshotManagedInternally() configuration, I'd suggest move it to 
once at setup and not once per scan.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to