[ 
https://issues.apache.org/jira/browse/KYLIN-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Geng updated KYLIN-4291:
------------------------------
    External issue URL: 
https://lists.apache.org/thread.html/c75e96857da1eecb74b8e9ef18e6ed242b2933cf87e43c023d8b54da%40%3Cuser.kylin.apache.org%3E
           Description: 
We noticed a segment refresh job at ERROR status at "*2019-12-10 02:27:15*". 
After a deep dive into kylin.log, we found that the WriteConflictException was 
raised when trying to update the last modified time for the dict.

 

(Sensitive content are consered)
{code:java}
2019-12-10 02:27:15,576 ERROR [Scheduler 1186524190 Job 
f76fb12e-477f-b404-e021-adaf520d04bb-321] common.HadoopShellExecutable:65 : 
error execute HadoopShellExecutable{id=f76fb12e-477f-b404-e021-adaf520d04bb-03, 
name=Build Dimension Dictionary, state=RUNNING}
org.apache.kylin.common.persistence.WriteConflictException: Overwriting 
conflict 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict,
 expect old TS 1575916011421, but it is 1575916035573
at 
org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
at 
org.apache.kylin.common.persistence.ResourceStore.lambda$updateTimestampWithRetry$4(ResourceStore.java:443)
at 
org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
at 
org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:442)
at 
org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:437)
at 
org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:432)
at 
org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
at 
org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
at 
org.apache.kylin.dict.DictionaryManager.saveDictionary(DictionaryManager.java:339)
at 
org.apache.kylin.cube.CubeManager$DictionaryAssist.saveDictionary(CubeManager.java:1145)
at org.apache.kylin.cube.CubeManager.saveDictionary(CubeManager.java:1107)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:100)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:69)
at 
org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
at 
org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at 
org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at 
org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
We have a glance at the codebase (Kylin-2.6.4), we found the last modified time 
of the dict (which has been built) will be updated at the step "Build Dimension 
Dictionary". The log blow shows the same dict is processed many times.
{code:java}
Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep "has already been built, 
save it"|grep "THIS_IS_COLUMN_NAME"
2019-12-10 02:07:10,103 DEBUG [Scheduler 1186524190 Job 
5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:07:11,036 DEBUG [Scheduler 1186524190 Job 
593979b8-e80b-590f-335b-526c2fc49080-295] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:07:20,455 DEBUG [Scheduler 1186524190 Job 
160952f3-fef2-4fdf-c08a-77d49f830805-246] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:07:20,888 DEBUG [Scheduler 1186524190 Job 
69304c1c-cc02-3bbe-67d4-efdd3858e88a-234] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:07:21,634 DEBUG [Scheduler 1186524190 Job 
345b688c-7992-f2cf-ae06-835675f545ea-117] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:07:23,040 DEBUG [Scheduler 1186524190 Job 
0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:07:36,358 DEBUG [Scheduler 1186524190 Job 
6daf351f-5073-128f-8754-f499e345db54-308] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:26:43,716 DEBUG [Scheduler 1186524190 Job 
c90a765c-976f-49c2-38d8-e6812d091cfb-295] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:26:44,944 DEBUG [Scheduler 1186524190 Job 
cd108004-12c6-078f-8646-f29841de9e40-205] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:26:51,306 DEBUG [Scheduler 1186524190 Job 
d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:27:15,523 DEBUG [Scheduler 1186524190 Job 
ea07ef8f-b2b7-b3f8-c481-df615025217d-117] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:27:15,535 DEBUG [Scheduler 1186524190 Job 
f76fb12e-477f-b404-e021-adaf520d04bb-321] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:29:42,504 DEBUG [Scheduler 1186524190 Job 
f894f427-b170-19b6-76fb-e1b671659771-321] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it

{code}
 

Since we issue the refresh segments request concurrently (with REST APIs), the 
dict has a chance to be modified concurrently. Kylin uses checkAndPut to update 
HBase to prevent race-condition, so the WriteConflictException may be raised 
when updating the same resource. 
  
{code:java}
Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict|grep
 "Update dictionary "
2019-12-10 02:07:10,223 INFO  [Scheduler 1186524190 Job 
5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575914830203
2019-12-10 02:07:11,109 INFO  [Scheduler 1186524190 Job 
593979b8-e80b-590f-335b-526c2fc49080-295] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575914831090
2019-12-10 02:07:20,554 INFO  [Scheduler 1186524190 Job 
160952f3-fef2-4fdf-c08a-77d49f830805-246] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575914840528
2019-12-10 02:07:21,762 INFO  [Scheduler 1186524190 Job 
345b688c-7992-f2cf-ae06-835675f545ea-117] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575914841727
2019-12-10 02:07:23,253 INFO  [Scheduler 1186524190 Job 
0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575914843095
2019-12-10 02:26:43,843 INFO  [Scheduler 1186524190 Job 
c90a765c-976f-49c2-38d8-e6812d091cfb-295] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575916003771
2019-12-10 02:26:51,436 INFO  [Scheduler 1186524190 Job 
d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575916011421
2019-12-10 02:27:15,593 INFO  [Scheduler 1186524190 Job 
ea07ef8f-b2b7-b3f8-c481-df615025217d-117] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575916035573
2019-12-10 02:29:42,564 INFO  [Scheduler 1186524190 Job 
f894f427-b170-19b6-76fb-e1b671659771-321] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575916182549
{code}
{color:#000000}The log above shown that the job 
"f76fb12e-477f-b404-e021-adaf520d04bb" has not been print out the log - "Update 
dictionary blablabla...." because of the WriteConflictException.{color}
  
After contacted with community, [~shaofengshi] has been confirmed that seems to 
be a bug introduced in KYLIN-3977 in Kylin 2.6.3.
  

 

 

  was:
We noticed a segment refresh job at ERROR status at "*2019-12-10 02:27:15*". 
After a deep dive into kylin.log, we found that the WriteConflictException was 
raised when trying to update the last modified time for the dict.

 

(Sensitive content are consered)
{code:java}
2019-12-10 02:27:15,576 ERROR [Scheduler 1186524190 Job 
f76fb12e-477f-b404-e021-adaf520d04bb-321] common.HadoopShellExecutable:65 : 
error execute HadoopShellExecutable{id=f76fb12e-477f-b404-e021-adaf520d04bb-03, 
name=Build Dimension Dictionary, state=RUNNING}
org.apache.kylin.common.persistence.WriteConflictException: Overwriting 
conflict 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict,
 expect old TS 1575916011421, but it is 1575916035573
at 
org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
at 
org.apache.kylin.common.persistence.ResourceStore.lambda$updateTimestampWithRetry$4(ResourceStore.java:443)
at 
org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
at 
org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:442)
at 
org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:437)
at 
org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:432)
at 
org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
at 
org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
at 
org.apache.kylin.dict.DictionaryManager.saveDictionary(DictionaryManager.java:339)
at 
org.apache.kylin.cube.CubeManager$DictionaryAssist.saveDictionary(CubeManager.java:1145)
at org.apache.kylin.cube.CubeManager.saveDictionary(CubeManager.java:1107)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:100)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:69)
at 
org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
at 
org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at 
org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
at 
org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
We have a glance at the codebase (Kylin-2.6.4), we found the last modified time 
of the dict (which has been built) will be updated at the step "Build Dimension 
Dictionary". The log blow shows the same dict is processed many times.
{code:java}
Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep "has already been built, 
save it"|grep "THIS_IS_COLUMN_NAME"
2019-12-10 02:07:10,103 DEBUG [Scheduler 1186524190 Job 
5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:07:11,036 DEBUG [Scheduler 1186524190 Job 
593979b8-e80b-590f-335b-526c2fc49080-295] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:07:20,455 DEBUG [Scheduler 1186524190 Job 
160952f3-fef2-4fdf-c08a-77d49f830805-246] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:07:20,888 DEBUG [Scheduler 1186524190 Job 
69304c1c-cc02-3bbe-67d4-efdd3858e88a-234] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:07:21,634 DEBUG [Scheduler 1186524190 Job 
345b688c-7992-f2cf-ae06-835675f545ea-117] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:07:23,040 DEBUG [Scheduler 1186524190 Job 
0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:07:36,358 DEBUG [Scheduler 1186524190 Job 
6daf351f-5073-128f-8754-f499e345db54-308] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:26:43,716 DEBUG [Scheduler 1186524190 Job 
c90a765c-976f-49c2-38d8-e6812d091cfb-295] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:26:44,944 DEBUG [Scheduler 1186524190 Job 
cd108004-12c6-078f-8646-f29841de9e40-205] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:26:51,306 DEBUG [Scheduler 1186524190 Job 
d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:27:15,523 DEBUG [Scheduler 1186524190 Job 
ea07ef8f-b2b7-b3f8-c481-df615025217d-117] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:27:15,535 DEBUG [Scheduler 1186524190 Job 
f76fb12e-477f-b404-e021-adaf520d04bb-321] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it
2019-12-10 02:29:42,504 DEBUG [Scheduler 1186524190 Job 
f894f427-b170-19b6-76fb-e1b671659771-321] cli.DictionaryGeneratorCLI:99 : Dict 
for 'THIS_IS_COLUMN_NAME' has already been built, save it

{code}
 

Since we issue the refresh segments request concurrently (with REST APIs), the 
dict has a chance to be modified concurrently. Kylin uses checkAndPut to update 
HBase to prevent race-condition, so the WriteConflictException may be raised 
when updating the same resource. 
 
{code:java}
Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict|grep
 "Update dictionary "
2019-12-10 02:07:10,223 INFO  [Scheduler 1186524190 Job 
5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575914830203
2019-12-10 02:07:11,109 INFO  [Scheduler 1186524190 Job 
593979b8-e80b-590f-335b-526c2fc49080-295] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575914831090
2019-12-10 02:07:20,554 INFO  [Scheduler 1186524190 Job 
160952f3-fef2-4fdf-c08a-77d49f830805-246] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575914840528
2019-12-10 02:07:21,762 INFO  [Scheduler 1186524190 Job 
345b688c-7992-f2cf-ae06-835675f545ea-117] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575914841727
2019-12-10 02:07:23,253 INFO  [Scheduler 1186524190 Job 
0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575914843095
2019-12-10 02:26:43,843 INFO  [Scheduler 1186524190 Job 
c90a765c-976f-49c2-38d8-e6812d091cfb-295] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575916003771
2019-12-10 02:26:51,436 INFO  [Scheduler 1186524190 Job 
d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575916011421
2019-12-10 02:27:15,593 INFO  [Scheduler 1186524190 Job 
ea07ef8f-b2b7-b3f8-c481-df615025217d-117] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575916035573
2019-12-10 02:29:42,564 INFO  [Scheduler 1186524190 Job 
f894f427-b170-19b6-76fb-e1b671659771-321] dict.DictionaryManager:198 : Update 
dictionary 
/dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
 lastModifiedTime to 1575916182549
{code}
{color:#000000}The log above shown that the job 
"f76fb12e-477f-b404-e021-adaf520d04bb" has not been print out the log - "Update 
dictionary blablabla...." because of the WriteConflictException.{color}
 
[After contacted with 
community|[https://lists.apache.org/thread.html/c75e96857da1eecb74b8e9ef18e6ed242b2933cf87e43c023d8b54da%40%3Cuser.kylin.apache.org%3E]],
 [~shaofengshi] has been confirmed that seems to be a bug introduced in 
[KYLIN-3977|https://issues.apache.org/jira/browse/KYLIN-3977] in Kylin 2.6.3.
 

 

 


> Parallel segment building may causes WriteConflictException
> -----------------------------------------------------------
>
>                 Key: KYLIN-4291
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4291
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.6.4
>            Reporter: Liang Geng
>            Priority: Major
>
> We noticed a segment refresh job at ERROR status at "*2019-12-10 02:27:15*". 
> After a deep dive into kylin.log, we found that the WriteConflictException 
> was raised when trying to update the last modified time for the dict.
>  
> (Sensitive content are consered)
> {code:java}
> 2019-12-10 02:27:15,576 ERROR [Scheduler 1186524190 Job 
> f76fb12e-477f-b404-e021-adaf520d04bb-321] common.HadoopShellExecutable:65 : 
> error execute 
> HadoopShellExecutable{id=f76fb12e-477f-b404-e021-adaf520d04bb-03, name=Build 
> Dimension Dictionary, state=RUNNING}
> org.apache.kylin.common.persistence.WriteConflictException: Overwriting 
> conflict 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict,
>  expect old TS 1575916011421, but it is 1575916035573
> at 
> org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
> at 
> org.apache.kylin.common.persistence.ResourceStore.lambda$updateTimestampWithRetry$4(ResourceStore.java:443)
> at 
> org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
> at 
> org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:442)
> at 
> org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:437)
> at 
> org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:432)
> at 
> org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
> at 
> org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
> at 
> org.apache.kylin.dict.DictionaryManager.saveDictionary(DictionaryManager.java:339)
> at 
> org.apache.kylin.cube.CubeManager$DictionaryAssist.saveDictionary(CubeManager.java:1145)
> at org.apache.kylin.cube.CubeManager.saveDictionary(CubeManager.java:1107)
> at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:100)
> at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:69)
> at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
> at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
> at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
> at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
> at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
> at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> We have a glance at the codebase (Kylin-2.6.4), we found the last modified 
> time of the dict (which has been built) will be updated at the step "Build 
> Dimension Dictionary". The log blow shows the same dict is processed many 
> times.
> {code:java}
> Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep "has already been built, 
> save it"|grep "THIS_IS_COLUMN_NAME"
> 2019-12-10 02:07:10,103 DEBUG [Scheduler 1186524190 Job 
> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:11,036 DEBUG [Scheduler 1186524190 Job 
> 593979b8-e80b-590f-335b-526c2fc49080-295] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:20,455 DEBUG [Scheduler 1186524190 Job 
> 160952f3-fef2-4fdf-c08a-77d49f830805-246] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:20,888 DEBUG [Scheduler 1186524190 Job 
> 69304c1c-cc02-3bbe-67d4-efdd3858e88a-234] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:21,634 DEBUG [Scheduler 1186524190 Job 
> 345b688c-7992-f2cf-ae06-835675f545ea-117] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:23,040 DEBUG [Scheduler 1186524190 Job 
> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:36,358 DEBUG [Scheduler 1186524190 Job 
> 6daf351f-5073-128f-8754-f499e345db54-308] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:26:43,716 DEBUG [Scheduler 1186524190 Job 
> c90a765c-976f-49c2-38d8-e6812d091cfb-295] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:26:44,944 DEBUG [Scheduler 1186524190 Job 
> cd108004-12c6-078f-8646-f29841de9e40-205] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:26:51,306 DEBUG [Scheduler 1186524190 Job 
> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:27:15,523 DEBUG [Scheduler 1186524190 Job 
> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:27:15,535 DEBUG [Scheduler 1186524190 Job 
> f76fb12e-477f-b404-e021-adaf520d04bb-321] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:29:42,504 DEBUG [Scheduler 1186524190 Job 
> f894f427-b170-19b6-76fb-e1b671659771-321] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> {code}
>  
> Since we issue the refresh segments request concurrently (with REST APIs), 
> the dict has a chance to be modified concurrently. Kylin uses checkAndPut to 
> update HBase to prevent race-condition, so the WriteConflictException may be 
> raised when updating the same resource. 
>   
> {code:java}
> Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict|grep
>  "Update dictionary "
> 2019-12-10 02:07:10,223 INFO  [Scheduler 1186524190 Job 
> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575914830203
> 2019-12-10 02:07:11,109 INFO  [Scheduler 1186524190 Job 
> 593979b8-e80b-590f-335b-526c2fc49080-295] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575914831090
> 2019-12-10 02:07:20,554 INFO  [Scheduler 1186524190 Job 
> 160952f3-fef2-4fdf-c08a-77d49f830805-246] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575914840528
> 2019-12-10 02:07:21,762 INFO  [Scheduler 1186524190 Job 
> 345b688c-7992-f2cf-ae06-835675f545ea-117] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575914841727
> 2019-12-10 02:07:23,253 INFO  [Scheduler 1186524190 Job 
> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575914843095
> 2019-12-10 02:26:43,843 INFO  [Scheduler 1186524190 Job 
> c90a765c-976f-49c2-38d8-e6812d091cfb-295] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575916003771
> 2019-12-10 02:26:51,436 INFO  [Scheduler 1186524190 Job 
> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575916011421
> 2019-12-10 02:27:15,593 INFO  [Scheduler 1186524190 Job 
> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575916035573
> 2019-12-10 02:29:42,564 INFO  [Scheduler 1186524190 Job 
> f894f427-b170-19b6-76fb-e1b671659771-321] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575916182549
> {code}
> {color:#000000}The log above shown that the job 
> "f76fb12e-477f-b404-e021-adaf520d04bb" has not been print out the log - 
> "Update dictionary blablabla...." because of the 
> WriteConflictException.{color}
>   
> After contacted with community, [~shaofengshi] has been confirmed that seems 
> to be a bug introduced in KYLIN-3977 in Kylin 2.6.3.
>   
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to