[jira] [Updated] (PHOENIX-6273) All the map tasks should operate on the same restored snapshot

Saksham Gangwar (Jira) Fri, 18 Dec 2020 11:31:35 -0800


     [ https://issues.apache.org/jira/browse/PHOENIX-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Saksham Gangwar updated PHOENIX-6273:
-------------------------------------
    Description: 
Recently we switched an MR application from scanning live tables to scanning 
snapshots (PHOENIX-3744). We ran into a severe performance issue, which turned 
out to be a correctness issue caused by overlapping scan split generation. 
After some debugging we found that it had already been fixed by PHOENIX-4997. 

We also *need not restore the snapshot per map task*. Currently, the snapshot 
is restored into a temp directory once per map task. For large tables on big 
clusters, this creates a storm of NameNode (NN) RPCs. We can instead restore it 
once per job and let all the map tasks operate on the same restored snapshot. 
HBase already made this change in HBASE-18806; we can do something similar.
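A minimal Java sketch of the two strategies, under stated assumptions: 
`SnapshotRestorer`, `CountingRestorer`, `SnapshotJobSketch` and the 
`sketch.snapshot.restore.dir` key are hypothetical names made up for 
illustration, not Phoenix or HBase APIs, and a plain `Map` stands in for the 
job configuration. It only counts how many times the expensive restore step 
runs in each case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the step that restores a snapshot into a temp
// directory; on a real cluster this step is what issues the NameNode RPCs.
interface SnapshotRestorer {
    String restore(String snapshotName);
}

// Toy restorer that just counts calls and returns a distinct path per call.
class CountingRestorer implements SnapshotRestorer {
    int restores = 0;

    @Override
    public String restore(String snapshotName) {
        restores++;
        return "/tmp/restore-" + snapshotName + "-" + restores;
    }
}

class SnapshotJobSketch {
    // Hypothetical configuration key, not a real Phoenix property.
    static final String RESTORE_DIR_KEY = "sketch.snapshot.restore.dir";

    // Current behaviour: every map task restores the snapshot on its own.
    static List<String> perTaskRestore(SnapshotRestorer r, String snapshot, int mapTasks) {
        List<String> dirs = new ArrayList<>();
        for (int i = 0; i < mapTasks; i++) {
            dirs.add(r.restore(snapshot)); // one restore per task
        }
        return dirs;
    }

    // Proposed behaviour: restore once at job setup, publish the directory
    // through the job configuration, and let every map task reuse it.
    static List<String> perJobRestore(SnapshotRestorer r, String snapshot, int mapTasks) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(RESTORE_DIR_KEY, r.restore(snapshot)); // single restore
        List<String> dirs = new ArrayList<>();
        for (int i = 0; i < mapTasks; i++) {
            dirs.add(jobConf.get(RESTORE_DIR_KEY)); // shared restored snapshot
        }
        return dirs;
    }
}
```

With 100 map tasks, the per-task strategy performs 100 restores into 100 
distinct directories, while the per-job strategy performs one and every task 
sees the same path. A real implementation would presumably also need the 
restore directory on a filesystem visible to all tasks and a cleanup step once 
the job finishes; neither is modeled here.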

 

All other performance suggestions are tracked here: 
https://issues.apache.org/jira/browse/PHOENIX-6081


> All the map tasks should operate on the same restored snapshot
> --------------------------------------------------------------
>
>                 Key: PHOENIX-6273
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6273
>             Project: Phoenix
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Saksham Gangwar
>            Assignee: Saksham Gangwar
>            Priority: Major
>             Fix For: 4.x, 4.16.1
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
