Chinmay Kulkarni created PHOENIX-4984:
-----------------------------------------
Summary: PhoenixMRJobSubmitter does not get the correct list of
jobs to submit for IndexTool jobs
Key: PHOENIX-4984
URL: https://issues.apache.org/jira/browse/PHOENIX-4984
Project: Phoenix
Issue Type: Bug
Reporter: Chinmay Kulkarni
Assignee: Geoffrey Jacoby
When trying to get the list of jobs to submit, we get the list of already
scheduled jobs from the YARN Resource Manager and exclude those jobs inside
{{PhoenixMRJobSubmitter#getJobsToSubmit}}. However, a naming format difference
prevents already running/submitted jobs from being removed correctly.
In {{IndexTool.java}}, we use the following convention for naming the M/R job:
INDEX_JOB_NAME_TEMPLATE = "PHOENIX_<schema name>.<data table name>_INDX_<index name>";
However, I see the following log lines for candidate jobs:
{code:java}
Candidate Indexes to be built as seen from SYSTEM.CATALOG -
{PHOENIX_<data table name>_INDX_<index name> ... }
{code}
{code:java}
Already Submitted/Running MR index build jobs -
[PHOENIX_<schema name>.<data table name>_INDX_<index name>]
{code}
Due to this naming mismatch, an index build M/R job that is already running
for a given index is not detected, and another one can be started for the same
index. This can lead to unnecessary load on the region servers hosting regions
for the table.
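
The mismatch can be illustrated with a minimal, self-contained sketch. This is not the actual Phoenix code: the class {{IndexJobNameSketch}} and its methods are hypothetical names introduced only for illustration, and the set difference stands in for whatever exclusion {{PhoenixMRJobSubmitter#getJobsToSubmit}} performs. It only assumes the two name formats shown in the log lines above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not the real Phoenix classes or methods.
final class IndexJobNameSketch {

    // Name in the format reported for submitted/running jobs:
    // PHOENIX_<schema name>.<data table name>_INDX_<index name>
    static String nameWithSchema(String schemaName, String dataTableName, String indexName) {
        return "PHOENIX_" + schemaName + "." + dataTableName + "_INDX_" + indexName;
    }

    // Name in the format reported for candidate jobs (schema name missing):
    // PHOENIX_<data table name>_INDX_<index name>
    static String nameWithoutSchema(String dataTableName, String indexName) {
        return "PHOENIX_" + dataTableName + "_INDX_" + indexName;
    }

    // Assumed stand-in for the exclusion step: candidates minus submitted jobs,
    // compared by exact job name.
    static Set<String> jobsToSubmit(Set<String> candidates, Set<String> submitted) {
        Set<String> result = new HashSet<>(candidates);
        result.removeAll(submitted);
        return result;
    }
}
```

With a submitted job named via {{nameWithSchema}} and a candidate named via {{nameWithoutSchema}}, the exact-match exclusion removes nothing, so the same index would be submitted again. If both sides use the same format, the candidate is excluded as expected.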