[ 
https://issues.apache.org/jira/browse/KYLIN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023085#comment-17023085
 ] 

Andras Istvan Nagy commented on KYLIN-4348:
-------------------------------------------

I was about to file a separate ticket, but this one may well cover the same 
issue that we are seeing. 

It seems that the patch for 
https://issues.apache.org/jira/browse/KYLIN-4165 introduced an issue with 
distributed locking in our environment. After some time, a lot of 
"STREAM CUBE" jobs get stuck at 0% and make no progress, while there are no 
jobs in YARN at all. New jobs then start piling up, as there are already 10 
in the running state.

At the same time, I see this in the log, which hints at an issue with the 
locking implementation:
{code:java}
2020-01-20 10:45:46 INFO  MapReduceExecutable:409 - 488ee680-2d37-9c8d-f5bd-82d07df51869-00, parent lock path(/cube_job_lock/cube_jw_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_jw_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:46 INFO  MapReduceExecutable:409 - 00358b93-5368-f746-17d9-6a95a8144f73-00, parent lock path(/cube_job_lock/cube_tm_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_tm_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:46 INFO  MapReduceExecutable:409 - 452d4c95-c65e-9707-dee7-94be920ba319-00, parent lock path(/cube_job_lock/cube_tm_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_tm_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:46 INFO  MapReduceExecutable:409 - 4e73dab2-efe3-5257-5990-46295a4e564d-00, parent lock path(/cube_job_lock/cube_jw_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_jw_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:46 INFO  MapReduceExecutable:409 - 232765e7-f171-0e6c-d722-a0d5933e7400-00, parent lock path(/cube_job_lock/cube_tm_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_tm_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:46 INFO  MapReduceExecutable:409 - 4c734f3d-bc40-0ce8-3a8e-943a3524a57a-00, parent lock path(/cube_job_lock/cube_jw_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_jw_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:55 INFO  MapReduceExecutable:409 - 90d6eda0-a55f-374a-3419-178ef328416a-00, parent lock path(/cube_job_lock/cube_tm_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_tm_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:45:58 INFO  MapReduceExecutable:409 - 582553e4-36b4-54af-23ed-15b36b4154bf-00, parent lock path(/cube_job_lock/cube_jw_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_jw_v2 is locked by other job result is true,will try after one minute
2020-01-20 10:46:02 INFO  MapReduceExecutable:409 - 6d13c89f-f69f-4345-ab28-03afb7a7ac88-00, parent lock path(/cube_job_lock/cube_jw_v2) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_jw_v2 is locked by other job result is true,will try after one minute
{code}
After reverting the patch for KYLIN-4165, the issue disappeared.

cc [~wangxiaojing] - have you seen the same issue?
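
In case it helps anyone compare against their own logs, here is a rough 
Python sketch that tallies, per parent lock path, how many distinct job steps 
report that they are waiting. It only assumes the message format from the log 
excerpt above; the function name and the regex are my own and are not part of 
Kylin.

```python
import re
from collections import defaultdict

# Matches the "locked by other job" message from the log excerpt above.
# Group 1: job step id (UUID plus step suffix), group 2: parent lock path.
# The pattern is an assumption derived from that excerpt only.
WAITING = re.compile(
    r"([0-9a-f-]{36}-\d+), parent lock path\((\S+?)\) "
    r"is locked by other job result is true"
)


def waiting_jobs_by_lock(log_text):
    """Return {parent lock path: set of job step ids reported as waiting}."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    waiting = defaultdict(set)
    for step_id, lock_path in WAITING.findall(flat):
        waiting[lock_path].add(step_id)
    return dict(waiting)
```

Feeding it the contents of kylin.log and printing the size of each set shows 
how many job steps are queued behind each cube's lock.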

> Fix distributed concurrency lock bug
> ------------------------------------
>
>                 Key: KYLIN-4348
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4348
>             Project: Kylin
>          Issue Type: Sub-task
>            Reporter: wangxiaojing
>            Assignee: wangxiaojing
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
