Saksham Gangwar created PHOENIX-6334:
----------------------------------------
Summary: All map tasks should operate on the same restored snapshot
Key: PHOENIX-6334
URL: https://issues.apache.org/jira/browse/PHOENIX-6334
Project: Phoenix
Issue Type: Bug
Components: core
Affects Versions: 4.14.3, 5.0.0
Reporter: Saksham Gangwar
Fix For: 5.1.0, 4.16.0, 4.x
Recently we switched an MR application from scanning live tables to scanning
snapshots (PHOENIX-3744). We ran into a severe performance issue, which turned
out to be a correctness issue caused by the generation of overlapping scan
splits. After some debugging, we found that this had already been fixed by
PHOENIX-4997.
We also *need not restore the snapshot per map task*. The purpose of this Jira
is to correct that behavior. Currently, we restore the snapshot once per map
task into a temp directory. For large tables on big clusters, this creates a
storm of NameNode (NN) RPCs. Instead, we can restore the snapshot once per job
and let all the map tasks operate on the same restored snapshot. HBase already
did this in HBASE-18806; we can do something similar.
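
The difference between the two behaviors can be illustrated with a small,
self-contained Java sketch. It deliberately does not call any Phoenix or HBase
API: restoreSnapshot, RESTORE_DIR_KEY, and the directory paths are hypothetical
stand-ins, and a counter stands in for the NN RPC load of a real restore. It is
only meant to show the shape of the change, not the actual fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedSnapshotRestoreSketch {
    // Hypothetical configuration key; not an actual Phoenix property name.
    static final String RESTORE_DIR_KEY = "sketch.snapshot.restore.dir";

    // Counts simulated restores; a stand-in for the NN RPCs a real restore issues.
    static final AtomicInteger restoreCount = new AtomicInteger();

    // Stand-in for the real restore: records the call and returns the restore path.
    static String restoreSnapshot(String snapshotName, String baseDir) {
        restoreCount.incrementAndGet();
        return baseDir + "/" + snapshotName;
    }

    // Behavior described in this Jira as current: each map task restores on its own.
    static int runPerTaskRestore(String snapshotName, int numMapTasks) {
        restoreCount.set(0);
        for (int task = 0; task < numMapTasks; task++) {
            restoreSnapshot(snapshotName, "/tmp/restore-task-" + task);
        }
        return restoreCount.get();
    }

    // Proposed behavior: restore once at job setup, share the path via the job config.
    static int runPerJobRestore(String snapshotName, int numMapTasks) {
        restoreCount.set(0);
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(RESTORE_DIR_KEY, restoreSnapshot(snapshotName, "/tmp/restore-job"));
        for (int task = 0; task < numMapTasks; task++) {
            // Each task only reads the shared location; it never restores.
            String sharedDir = jobConf.get(RESTORE_DIR_KEY);
            if (sharedDir == null) {
                throw new IllegalStateException("restore dir missing from job config");
            }
        }
        return restoreCount.get();
    }

    public static void main(String[] args) {
        System.out.println("per-task restores: " + runPerTaskRestore("snap1", 1000));
        System.out.println("per-job restores: " + runPerJobRestore("snap1", 1000));
    }
}
```

In this sketch the number of restores grows with the number of map tasks in
the first case and stays at one in the second, which is the effect this Jira
is after.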
All other performance suggestions are listed here:
https://issues.apache.org/jira/browse/PHOENIX-6081