[ 
https://issues.apache.org/jira/browse/PHOENIX-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinmay Kulkarni updated PHOENIX-4984:
--------------------------------------
    Description: 
When trying to get the list of jobs to submit, we get the list of already 
scheduled jobs from the YARN Resource Manager and exclude those jobs inside 
{{PhoenixMRJobSubmitter#getJobsToSubmit}}. However, a difference in naming 
format prevents already running/submitted jobs from being removed correctly.

In {{IndexTool.java}}, we use the following convention for naming the M/R job:
 INDEX_JOB_NAME_TEMPLATE = "PHOENIX_<schema name>*.*<data table 
name>_INDX_<index name>";

However, I see the following log lines for candidate jobs:

_Candidate Indexes to be built as seen from SYSTEM.CATALOG - PHOENIX_<data 
table name>_INDX_<index name> ... _

And the following for already submitted jobs as obtained from YARN:

_Already Submitted/Running MR index build jobs - [PHOENIX_<schema name>.<data 
table name>_INDX_<index name>]_

Due to this naming mismatch (the candidate names lack the '.'), an index build 
M/R job that is already running for a given index is not detected, and another 
one can be started for the same index. This can lead to unnecessary load on 
the region servers hosting regions for the index.
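
A minimal sketch of why the exclusion step misses running jobs, assuming the 
exclusion is an exact string comparison between candidate names and the names 
reported by YARN. The class, the helper methods, and the "%s" format strings 
are hypothetical stand-ins based on the log lines above, not the actual 
Phoenix source:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class JobNameMismatchSketch {
    // Assumed shape of the IndexTool convention quoted above (schema, table, index).
    static final String INDEX_JOB_NAME_TEMPLATE = "PHOENIX_%s.%s_INDX_%s";

    // Name of a job that was already submitted, as YARN would report it.
    public static String submittedName(String schema, String table, String index) {
        return String.format(INDEX_JOB_NAME_TEMPLATE, schema, table, index);
    }

    // Name in the shape seen in the candidate log line: no schema and no '.'.
    public static String candidateName(String table, String index) {
        return String.format("PHOENIX_%s_INDX_%s", table, index);
    }

    // Drop every candidate whose name exactly equals a submitted job name.
    public static Set<String> jobsToSubmit(Set<String> candidates, Set<String> submitted) {
        Set<String> result = new LinkedHashSet<>(candidates);
        result.removeAll(submitted);
        return result;
    }

    public static void main(String[] args) {
        Set<String> submitted = new LinkedHashSet<>(Arrays.asList(submittedName("S", "T", "IDX")));
        Set<String> mismatched = new LinkedHashSet<>(Arrays.asList(candidateName("T", "IDX")));
        Set<String> consistent = new LinkedHashSet<>(Arrays.asList(submittedName("S", "T", "IDX")));

        // The mismatched candidate survives the exclusion, so a duplicate job would be started.
        System.out.println(jobsToSubmit(mismatched, submitted)); // [PHOENIX_T_INDX_IDX]
        // When both sides use the same template, the running job is excluded.
        System.out.println(jobsToSubmit(consistent, submitted)); // []
    }
}
```

Building both names from the same template is one possible way to make the 
comparison line up; the sketch does not claim this is how the issue was fixed.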

  was:
When trying to get the list of jobs to submit, we get the already scheduled 
jobs list from the Yarn Resource Manager and exclude those jobs inside 
{{PhoenixMRJobSubmitter#getJobsToSubmit}}, however a naming format difference 
prevents correctly removing already running/submitted jobs. 

In {{IndexTool.java}}, we use the following convention for naming the M/R job:
INDEX_JOB_NAME_TEMPLATE = "PHOENIX_<schema name>*.*<data table 
name>_INDX_<index name>";

However, I see the following log lines for candidate jobs:

{code:java}
Candidate Indexes to be built as seen from SYSTEM.CATALOG - 
{PHOENIX_<data table name>_INDX_<index name> ... }
{code:java}

{code:java}
Already Submitted/Running MR index build jobs - [PHOENIX_<schema name>.<data 
table name>_INDX_<index name>]
{code}

Due to this naming conflict, even though an index build M/R job is running for 
a given index, this is not detected correctly and another one can be started 
for the same index. This can lead to unnecessary load on the region servers 
hosting regions for the table


> PhoenixMRJobSubmitter does not get the correct list of jobs to submit for 
> IndexTool jobs
> ----------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4984
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4984
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Chinmay Kulkarni
>            Assignee: Geoffrey Jacoby
>            Priority: Major
>
> When trying to get the list of jobs to submit, we get the list of already 
> scheduled jobs from the YARN Resource Manager and exclude those jobs inside 
> {{PhoenixMRJobSubmitter#getJobsToSubmit}}. However, a difference in naming 
> format prevents already running/submitted jobs from being removed correctly.
> In {{IndexTool.java}}, we use the following convention for naming the M/R job:
>  INDEX_JOB_NAME_TEMPLATE = "PHOENIX_<schema name>*.*<data table 
> name>_INDX_<index name>";
> However, I see the following log lines for candidate jobs:
> _Candidate Indexes to be built as seen from SYSTEM.CATALOG - PHOENIX_<data 
> table name>_INDX_<index name> ... _
> And the following for already submitted jobs as obtained from YARN:
> _Already Submitted/Running MR index build jobs - [PHOENIX_<schema name>.<data 
> table name>_INDX_<index name>]_
> Due to this naming mismatch (the candidate names lack the '.'), an index 
> build M/R job that is already running for a given index is not detected, and 
> another one can be started for the same index. This can lead to unnecessary 
> load on the region servers hosting regions for the index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
