gjacoby126 commented on a change in pull request #1079:
URL: https://github.com/apache/phoenix/pull/1079#discussion_r561272750
##########
File path:
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixConfigurationUtil.java
##########
@@ -866,4 +872,18 @@ public static void setTenantId(Configuration
configuration, String tenantId){
configuration.set(MAPREDUCE_TENANT_ID, tenantId);
}
+ public static void setMRSnapshotManagedInternally(Configuration
configuration, Boolean isSnapshotRestoreManagedInternally) {
+ Preconditions.checkNotNull(configuration);
+ Preconditions.checkNotNull(isSnapshotRestoreManagedInternally);
+ configuration.set(MANAGE_MR_SNAPSHOT_RESTORE_INTERNALLY,
+ String.valueOf(isSnapshotRestoreManagedInternally));
+ }
+
+ public static boolean getMRSnapshotManagedInternally(final Configuration
configuration) {
Review comment:
In addition to switching the setting to Externally, this should probably
have an "is" prefix rather than "get", because it returns a boolean (as you
already do for the local variable).
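
A minimal sketch of the naming suggested above, for illustration only. It
uses a plain Map as a stand-in for Hadoop's Configuration so that it runs
without dependencies. The key string and the default of false are
assumptions, not values taken from the PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class SnapshotConfigSketch {
    // Hypothetical key name; the real constant in the PR may differ.
    static final String MANAGE_MR_SNAPSHOT_EXTERNALLY =
            "phoenix.mapreduce.snapshot.managed.externally";

    // The Map stands in for org.apache.hadoop.conf.Configuration.
    static void setMRSnapshotManagedExternally(Map<String, String> configuration,
                                               Boolean isSnapshotManagedExternally) {
        Objects.requireNonNull(configuration);
        Objects.requireNonNull(isSnapshotManagedExternally);
        configuration.put(MANAGE_MR_SNAPSHOT_EXTERNALLY,
                String.valueOf(isSnapshotManagedExternally));
    }

    // Boolean accessor named with an "is" prefix instead of "get".
    // Assumption: an unset key means false, because
    // Boolean.parseBoolean(null) returns false.
    static boolean isMRSnapshotManagedExternally(Map<String, String> configuration) {
        Objects.requireNonNull(configuration);
        return Boolean.parseBoolean(configuration.get(MANAGE_MR_SNAPSHOT_EXTERNALLY));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(isMRSnapshotManagedExternally(conf));
        setMRSnapshotManagedExternally(conf, true);
        System.out.println(isMRSnapshotManagedExternally(conf));
    }
}
```

With Hadoop's Configuration the same accessor could be written with
configuration.getBoolean(key, false).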
##########
File path:
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixConfigurationUtil.java
##########
@@ -866,4 +872,18 @@ public static void setTenantId(Configuration
configuration, String tenantId){
configuration.set(MAPREDUCE_TENANT_ID, tenantId);
}
+ public static void setMRSnapshotManagedInternally(Configuration
configuration, Boolean isSnapshotRestoreManagedInternally) {
Review comment:
I think this should be setMRSnapshotManagedExternally. That keeps it closer
to the HBase solution this is modeled after (HBASE-18806), where true means that
the external job has already restored the snapshot and will clean it up.
##########
File path:
phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java
##########
@@ -88,19 +90,34 @@ public TableSnapshotResultIterator(Configuration
configuration, Scan scan, ScanM
}
private void init() throws IOException {
- RestoreSnapshotHelper.RestoreMetaChanges meta =
- RestoreSnapshotHelper.copySnapshotForScanner(this.configuration,
this.fs,
- this.rootDir, this.restoreDir, this.snapshotName);
- List<RegionInfo> restoredRegions = meta.getRegionsToAdd();
- this.htd = meta.getTableDescriptor();
- this.regions = new ArrayList<RegionInfo>(restoredRegions.size());
-
- for (RegionInfo restoredRegion : restoredRegions) {
- if (isValidRegion(restoredRegion)) {
- this.regions.add(restoredRegion);
+ if
(PhoenixConfigurationUtil.getMRSnapshotManagedInternally(configuration)) {
+ RestoreSnapshotHelper.RestoreMetaChanges meta =
+ RestoreSnapshotHelper.copySnapshotForScanner(this.configuration,
this.fs, this.rootDir,
+ this.restoreDir, this.snapshotName);
+ List<RegionInfo> restoredRegions = meta.getRegionsToAdd();
+ this.htd = meta.getTableDescriptor();
+ this.regions = new ArrayList<>(restoredRegions.size());
+ for (RegionInfo restoredRegion : restoredRegions) {
+ if (isValidRegion(restoredRegion)) {
+ this.regions.add(restoredRegion);
+ }
+ }
+ } else {
+ Path snapshotDir =
SnapshotDescriptionUtils.getCompletedSnapshotDir(snapshotName, rootDir);
Review comment:
@bharathv - I agree with your approach as the correct way forward, but
since we're very close to a release, I think that work can be deferred to a
future JIRA. The ability to turn off any automated snapshot handling is an easy
fix to the immediate problem, and it stays useful even after we fix the
automation problems later (for example, if two jobs are part of a larger process
and want to share a restored snapshot). What do you think?
@sakshamgangwar - if we do go with the current approach, we should probably
rename the JIRA and the commit to something more descriptive of what the change
actually does.
##########
File path:
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixConfigurationUtil.java
##########
@@ -866,4 +872,18 @@ public static void setTenantId(Configuration
configuration, String tenantId){
configuration.set(MAPREDUCE_TENANT_ID, tenantId);
}
+ public static void setMRSnapshotManagedInternally(Configuration
configuration, Boolean isSnapshotRestoreManagedInternally) {
Review comment:
I think this should be setMRSnapshotManagedExternally. That keeps it closer
to the HBase solution this is modeled after (HBASE-18806), where true means that
the external process requesting the MR job has already restored the snapshot
and will clean it up.
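
To make the meaning of true concrete, here is a hedged sketch of the
lifecycle the flag implies. The class name, method name and step labels are
hypothetical. The sketch only models the order of operations and calls no
HBase or Phoenix API.

```java
import java.util.ArrayList;
import java.util.List;

class ExternalSnapshotFlagSketch {
    // Returns the steps the job itself would perform.
    // Assumption: when the snapshot is managed externally, the caller has
    // already restored it and will also clean it up, so the job does neither.
    static List<String> planSteps(boolean managedExternally) {
        List<String> steps = new ArrayList<>();
        if (!managedExternally) {
            steps.add("restore snapshot");
        }
        steps.add("scan restored regions");
        if (!managedExternally) {
            steps.add("delete restore directory");
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println("internal: " + planSteps(false));
        System.out.println("external: " + planSteps(true));
    }
}
```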
##########
File path:
phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java
##########
@@ -78,8 +80,7 @@ public TableSnapshotResultIterator(Configuration
configuration, Scan scan, ScanM
this.scan = scan;
this.scanMetricsHolder = scanMetricsHolder;
this.scanIterator = UNINITIALIZED_SCANNER;
- this.restoreDir = new
Path(configuration.get(PhoenixConfigurationUtil.RESTORE_DIR_KEY),
- UUID.randomUUID().toString());
+ this.restoreDir = new
Path(configuration.get(PhoenixConfigurationUtil.RESTORE_DIR_KEY));
Review comment:
@sakshamgangwar - Remember that scans take place in parallel, since
multiple mappers may be running at the same time. Until a future JIRA fixes the
automation so the restore is created only once, don't we need to keep the
restored files in separate directories so tasks won't step on each other?
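
A rough sketch of the concern above, assuming the restore directory is
chosen per task. It uses java.nio.file.Path as a stand-in for Hadoop's Path,
and the helper name resolveRestoreDir is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

class RestoreDirSketch {
    // When the job manages the restore itself, each task gets its own
    // UUID-named subdirectory so parallel mappers do not overwrite each
    // other's restored files. When the snapshot is managed externally, the
    // configured directory is used as-is, since the caller restored it there.
    static Path resolveRestoreDir(String configuredRestoreDir, boolean managedExternally) {
        Path base = Paths.get(configuredRestoreDir);
        if (managedExternally) {
            return base;
        }
        return base.resolve(UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        // Two tasks of the same job resolve to different subdirectories.
        System.out.println(resolveRestoreDir("/tmp/restore", false));
        System.out.println(resolveRestoreDir("/tmp/restore", false));
        // A job with an externally managed snapshot uses the directory directly.
        System.out.println(resolveRestoreDir("/tmp/restore", true));
    }
}
```

Appending the UUID in exactly one place keeps the path predictable; doing it
in two layers would nest one random directory inside another.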
##########
File path:
phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java
##########
@@ -80,7 +81,12 @@ public TableSnapshotResultIterator(Configuration
configuration, Scan scan, ScanM
this.scan = scan;
this.scanMetricsHolder = scanMetricsHolder;
this.scanIterator = UNINITIALIZED_SCANNER;
- this.restoreDir = new
Path(configuration.get(PhoenixConfigurationUtil.RESTORE_DIR_KEY));
+ if (PhoenixConfigurationUtil.isMRSnapshotManagedExternally(configuration))
{
+ this.restoreDir = new
Path(configuration.get(PhoenixConfigurationUtil.RESTORE_DIR_KEY));
+ } else {
+ this.restoreDir = new
Path(configuration.get(PhoenixConfigurationUtil.RESTORE_DIR_KEY),
Review comment:
@sakshamgangwar why do we append the UUID in both PhoenixMapReduceUtil
and TableSnapshotResultIterator? Shouldn't it be one or the other? (If so,
putting it here in TableSnapshotResultIterator seems clearer to me.)