[jira] [Commented] (PHOENIX-6273) Add support to handle MR Snapshot restore externally

ASF GitHub Bot (Jira) Thu, 21 Jan 2021 11:06:06 -0800


    [ https://issues.apache.org/jira/browse/PHOENIX-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269557#comment-17269557 ]


ASF GitHub Bot commented on PHOENIX-6273:
-----------------------------------------

sakshamgangwar commented on a change in pull request #1079:
URL: https://github.com/apache/phoenix/pull/1079#discussion_r562127876



##########
File path: phoenix-core/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java
##########
@@ -78,8 +80,7 @@ public TableSnapshotResultIterator(Configuration configuration, Scan scan, ScanM
     this.scan = scan;
     this.scanMetricsHolder = scanMetricsHolder;
     this.scanIterator = UNINITIALIZED_SCANNER;
-    this.restoreDir = new Path(configuration.get(PhoenixConfigurationUtil.RESTORE_DIR_KEY),
-        UUID.randomUUID().toString());
+    this.restoreDir = new Path(configuration.get(PhoenixConfigurationUtil.RESTORE_DIR_KEY));

Review comment:
       @shahrs87 I don't think so. I believe the original flow was itself
faulty: for every scan, we were creating a new subdirectory. Now, instead of
that, every scan restores into the same restore directory, and cleanup also
happens per scan, so there should not be any issue. We want to get rid of the
old structure because when we do an external restore or external cleanup, we
need to provide the exact restore directory to read from.
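
To make the difference concrete, here is a minimal sketch of the two path
layouts in the diff above. It is illustrative only: it uses `java.nio.file.Path`
as a stand-in for Hadoop's `Path`, and the class name, method names, and the
`/tmp/phoenix-restore` root are hypothetical, not Phoenix code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Hypothetical stand-in; java.nio.file.Path is used here instead of
// org.apache.hadoop.fs.Path so the sketch runs without Hadoop.
final class RestoreDirSketch {

    // Old layout (removed lines in the diff): each iterator gets a fresh
    // random child under the configured root, so an external caller cannot
    // know the final directory in advance.
    static Path perScanRestoreDir(String configuredRoot) {
        return Paths.get(configuredRoot, UUID.randomUUID().toString());
    }

    // New layout (added line in the diff): the configured value is used
    // as-is, so whoever sets the configuration knows the exact directory.
    static Path sharedRestoreDir(String configuredRoot) {
        return Paths.get(configuredRoot);
    }

    public static void main(String[] args) {
        String root = "/tmp/phoenix-restore"; // hypothetical configured value
        System.out.println("per-scan: " + perScanRestoreDir(root));
        System.out.println("shared:   " + sharedRestoreDir(root));
    }
}
```

With the shared layout, an external restore or cleanup step can target the
configured path directly, which is the point made in the comment above.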




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Add support to handle MR Snapshot restore externally
> ----------------------------------------------------
>
>                 Key: PHOENIX-6273
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6273
>             Project: Phoenix
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Saksham Gangwar
>            Assignee: Saksham Gangwar
>            Priority: Major
>             Fix For: 5.1.0, 4.16.0
>
>
> Recently we switched an MR application from scanning live tables to scanning 
> snapshots (PHOENIX-3744). We ran into a severe performance issue, which 
> turned out to be a correctness issue due to overlapping scan split generation. 
> After some debugging we figured out that it had been fixed via PHOENIX-4997. 
> We also *need not restore the snapshot per map task*. Currently, we restore 
> the snapshot once per map task into a temp directory. For large tables on big 
> clusters, this creates a storm of NN RPCs. We can do this once per job and 
> let all the map tasks operate on the same restored snapshot. HBase already 
> did this via HBASE-18806; we can do something similar. Jira to correct this 
> behavior: https://issues.apache.org/jira/browse/PHOENIX-6334
> *The purpose of this Jira* is to resolve this issue immediately by letting 
> the caller decide whether snapshot restore is handled externally or 
> internally on the Phoenix side (the buggy approach). 
> All other performance suggestions here: 
> https://issues.apache.org/jira/browse/PHOENIX-6081



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
