gjacoby126 commented on a change in pull request #1079:
URL: https://github.com/apache/phoenix/pull/1079#discussion_r561272750



##########
File path: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixConfigurationUtil.java
##########
@@ -866,4 +872,18 @@ public static void setTenantId(Configuration 
configuration, String tenantId){
         configuration.set(MAPREDUCE_TENANT_ID, tenantId);
     }
 
+    public static void setMRSnapshotManagedInternally(Configuration 
configuration, Boolean isSnapshotRestoreManagedInternally) {
+        Preconditions.checkNotNull(configuration);
+        Preconditions.checkNotNull(isSnapshotRestoreManagedInternally);
+        configuration.set(MANAGE_MR_SNAPSHOT_RESTORE_INTERNALLY,
+                String.valueOf(isSnapshotRestoreManagedInternally));
+    }
+
+    public static boolean getMRSnapshotManagedInternally(final Configuration 
configuration) {

Review comment:
       In addition to switching the setting to Externally, this should probably 
have an "is" prefix rather than "get", because it returns a boolean (as you 
already do for the local variable).
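
   A minimal sketch of the naming I have in mind. To keep it dependency-free, a 
plain `Map` stands in for Hadoop's `Configuration`; the key string and the 
default of `false` when the key is unset are assumptions for illustration, not 
what the patch defines.

```java
import java.util.Map;
import java.util.Objects;

final class SnapshotFlagSketch {
    // Hypothetical key name, used only for this illustration.
    static final String MANAGE_MR_SNAPSHOT_EXTERNALLY =
            "phoenix.mapreduce.snapshot.managed.externally";

    private SnapshotFlagSketch() {
    }

    // A Map<String, String> stands in for org.apache.hadoop.conf.Configuration.
    static void setMRSnapshotManagedExternally(Map<String, String> configuration,
            Boolean isSnapshotRestoreManagedExternally) {
        Objects.requireNonNull(configuration);
        Objects.requireNonNull(isSnapshotRestoreManagedExternally);
        configuration.put(MANAGE_MR_SNAPSHOT_EXTERNALLY,
                String.valueOf(isSnapshotRestoreManagedExternally));
    }

    // "is" prefix because the accessor returns a boolean.
    // Assumption: an unset key means the snapshot is NOT managed externally.
    static boolean isMRSnapshotManagedExternally(Map<String, String> configuration) {
        Objects.requireNonNull(configuration);
        // Boolean.parseBoolean(null) is false, which gives the assumed default.
        return Boolean.parseBoolean(configuration.get(MANAGE_MR_SNAPSHOT_EXTERNALLY));
    }
}
```

   With this shape, a caller that never sets the flag keeps today's behavior, 
and call sites read as `if (isMRSnapshotManagedExternally(conf))`.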

##########
File path: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixConfigurationUtil.java
##########
@@ -866,4 +872,18 @@ public static void setTenantId(Configuration 
configuration, String tenantId){
         configuration.set(MAPREDUCE_TENANT_ID, tenantId);
     }
 
+    public static void setMRSnapshotManagedInternally(Configuration 
configuration, Boolean isSnapshotRestoreManagedInternally) {

Review comment:
       I think this should be setMRSnapshotManagedExternally. That keeps it 
closer to the HBase solution this is modeled after (HBASE-18806), where true 
means that the external job has already restored the snapshot and will clean it 
up.

##########
File path: 
phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java
##########
@@ -88,19 +90,34 @@ public TableSnapshotResultIterator(Configuration 
configuration, Scan scan, ScanM
   }
 
   private void init() throws IOException {
-    RestoreSnapshotHelper.RestoreMetaChanges meta =
-        RestoreSnapshotHelper.copySnapshotForScanner(this.configuration, 
this.fs,
-            this.rootDir, this.restoreDir, this.snapshotName);
-    List<RegionInfo> restoredRegions = meta.getRegionsToAdd();
-    this.htd = meta.getTableDescriptor();
-    this.regions = new ArrayList<RegionInfo>(restoredRegions.size());
-
-    for (RegionInfo restoredRegion : restoredRegions) {
-      if (isValidRegion(restoredRegion)) {
-        this.regions.add(restoredRegion);
+    if 
(PhoenixConfigurationUtil.getMRSnapshotManagedInternally(configuration)) {
+      RestoreSnapshotHelper.RestoreMetaChanges meta =
+          RestoreSnapshotHelper.copySnapshotForScanner(this.configuration, 
this.fs, this.rootDir,
+                      this.restoreDir, this.snapshotName);
+      List<RegionInfo> restoredRegions = meta.getRegionsToAdd();
+      this.htd = meta.getTableDescriptor();
+      this.regions = new ArrayList<>(restoredRegions.size());
+      for (RegionInfo restoredRegion : restoredRegions) {
+        if (isValidRegion(restoredRegion)) {
+          this.regions.add(restoredRegion);
+        }
+      }
+    } else {
+      Path snapshotDir = 
SnapshotDescriptionUtils.getCompletedSnapshotDir(snapshotName, rootDir);

Review comment:
       @bharathv - I agree with your approach as the correct way forward, but 
since we're very close to a release, I think that work can be deferred to a 
future JIRA. The ability to turn off any automated snapshot handling is an easy 
fix to the immediate problem, and it is still useful even after we fix the 
automation problems later (for example, if two jobs are part of a larger process 
and want to share a restored snapshot). What do you think?
   
   @sakshamgangwar - if we do go with the current approach, we should probably 
change the name of the JIRA and commit to something more descriptive of what it 
actually does. 

##########
File path: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixConfigurationUtil.java
##########
@@ -866,4 +872,18 @@ public static void setTenantId(Configuration 
configuration, String tenantId){
         configuration.set(MAPREDUCE_TENANT_ID, tenantId);
     }
 
+    public static void setMRSnapshotManagedInternally(Configuration 
configuration, Boolean isSnapshotRestoreManagedInternally) {

Review comment:
       I think this should be setMRSnapshotManagedExternally. That keeps it 
closer to the HBase solution this is modeled after (HBASE-18806), where true 
means that the external process requesting the MR job has already restored the 
snapshot and will clean it up.
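
   To make the intended semantics concrete, here is a rough sketch, not the 
actual iterator code. The class, its methods, and the recorded action names are 
all hypothetical; the only point is that when the flag is true, the iterator 
neither restores nor cleans up the snapshot, because the external process owns 
both steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the iterator's lifecycle; it only records which
// steps would run instead of touching HDFS.
final class SnapshotLifecycleSketch {
    private final boolean managedExternally;
    private final List<String> actions = new ArrayList<>();

    SnapshotLifecycleSketch(boolean managedExternally) {
        this.managedExternally = managedExternally;
    }

    void init() {
        if (!managedExternally) {
            // Internal management: this task restores the snapshot itself.
            actions.add("restore");
        }
        // Either way, the scan runs against an already restored snapshot.
        actions.add("scan");
    }

    void close() {
        if (!managedExternally) {
            // Internal management: this task also deletes what it restored.
            actions.add("cleanup");
        }
    }

    List<String> actions() {
        return actions;
    }
}
```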

##########
File path: 
phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java
##########
@@ -78,8 +80,7 @@ public TableSnapshotResultIterator(Configuration 
configuration, Scan scan, ScanM
     this.scan = scan;
     this.scanMetricsHolder = scanMetricsHolder;
     this.scanIterator = UNINITIALIZED_SCANNER;
-    this.restoreDir = new 
Path(configuration.get(PhoenixConfigurationUtil.RESTORE_DIR_KEY),
-        UUID.randomUUID().toString());
+    this.restoreDir = new 
Path(configuration.get(PhoenixConfigurationUtil.RESTORE_DIR_KEY));

Review comment:
       @sakshamgangwar - Remember that scans take place in parallel, since 
multiple mappers may be running at the same time. Until a future JIRA fixes the 
automation so that the snapshot is restored only once, don't we need to keep the 
restored files in separate directories so tasks won't step on each other?
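
   Roughly what I mean, as a sketch under assumptions: `java.nio.file.Path` 
stands in for Hadoop's `Path`, and `resolveRestoreDir` is a hypothetical helper, 
not something in the patch. Each internally managed task gets its own UUID 
subdirectory, while an externally managed job uses the configured directory 
as-is.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

final class RestoreDirSketch {
    private RestoreDirSketch() {
    }

    // Hypothetical helper: picks the directory a single task restores into.
    static Path resolveRestoreDir(String baseRestoreDir, boolean managedExternally) {
        Path base = Paths.get(baseRestoreDir);
        if (managedExternally) {
            // The external process already restored the snapshot here.
            return base;
        }
        // One random subdirectory per task, so parallel mappers cannot
        // overwrite each other's restored files.
        return base.resolve(UUID.randomUUID().toString());
    }
}
```

   Two mappers calling this with the same base directory and the flag off end up 
in sibling directories under that base, which is what the original 
`UUID.randomUUID()` suffix provided.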

##########
File path: 
phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java
##########
@@ -80,7 +81,12 @@ public TableSnapshotResultIterator(Configuration 
configuration, Scan scan, ScanM
     this.scan = scan;
     this.scanMetricsHolder = scanMetricsHolder;
     this.scanIterator = UNINITIALIZED_SCANNER;
-    this.restoreDir = new 
Path(configuration.get(PhoenixConfigurationUtil.RESTORE_DIR_KEY));
+    if (PhoenixConfigurationUtil.isMRSnapshotManagedExternally(configuration)) 
{
+      this.restoreDir = new 
Path(configuration.get(PhoenixConfigurationUtil.RESTORE_DIR_KEY));
+    } else {
+      this.restoreDir = new 
Path(configuration.get(PhoenixConfigurationUtil.RESTORE_DIR_KEY),

Review comment:
       @sakshamgangwar - why do we add the UUID in both PhoenixMapReduceUtil 
and TableSnapshotResultIterator? Shouldn't it be one or the other? (If so, 
putting it here in TableSnapshotResultIterator seems clearer to me.) 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

