[ 
https://issues.apache.org/jira/browse/KYLIN-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4291.
-------------------------------

Resolved in release 3.1.0 (2020-07-03)

> Parallel segment building may causes WriteConflictException
> -----------------------------------------------------------
>
>                 Key: KYLIN-4291
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4291
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.6.4
>            Reporter: Liang Geng
>            Assignee: Yaqian Zhang
>            Priority: Major
>             Fix For: v2.6.5, v3.1.0
>
>
> We noticed a segment refresh job at ERROR status at "*2019-12-10 02:27:15*". 
> After a deep dive into kylin.log, we found that the WriteConflictException 
> was raised when trying to update the last modified time for the dict.
>  
> (Sensitive content are consered)
> {code:java}
> 2019-12-10 02:27:15,576 ERROR [Scheduler 1186524190 Job 
> f76fb12e-477f-b404-e021-adaf520d04bb-321] common.HadoopShellExecutable:65 : 
> error execute 
> HadoopShellExecutable{id=f76fb12e-477f-b404-e021-adaf520d04bb-03, name=Build 
> Dimension Dictionary, state=RUNNING}
> org.apache.kylin.common.persistence.WriteConflictException: Overwriting 
> conflict 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict,
>  expect old TS 1575916011421, but it is 1575916035573
> at 
> org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
> at 
> org.apache.kylin.common.persistence.ResourceStore.lambda$updateTimestampWithRetry$4(ResourceStore.java:443)
> at 
> org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
> at 
> org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:442)
> at 
> org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:437)
> at 
> org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:432)
> at 
> org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
> at 
> org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
> at 
> org.apache.kylin.dict.DictionaryManager.saveDictionary(DictionaryManager.java:339)
> at 
> org.apache.kylin.cube.CubeManager$DictionaryAssist.saveDictionary(CubeManager.java:1145)
> at org.apache.kylin.cube.CubeManager.saveDictionary(CubeManager.java:1107)
> at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:100)
> at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:69)
> at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
> at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:93)
> at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
> at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
> at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
> at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
> at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> We have a glance at the codebase (Kylin-2.6.4), we found the last modified 
> time of the dict (which has been built) will be updated at the step "Build 
> Dimension Dictionary". The log blow shows the same dict is processed many 
> times.
> {code:java}
> Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep "has already been built, 
> save it"|grep "THIS_IS_COLUMN_NAME"
> 2019-12-10 02:07:10,103 DEBUG [Scheduler 1186524190 Job 
> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:11,036 DEBUG [Scheduler 1186524190 Job 
> 593979b8-e80b-590f-335b-526c2fc49080-295] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:20,455 DEBUG [Scheduler 1186524190 Job 
> 160952f3-fef2-4fdf-c08a-77d49f830805-246] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:20,888 DEBUG [Scheduler 1186524190 Job 
> 69304c1c-cc02-3bbe-67d4-efdd3858e88a-234] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:21,634 DEBUG [Scheduler 1186524190 Job 
> 345b688c-7992-f2cf-ae06-835675f545ea-117] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:23,040 DEBUG [Scheduler 1186524190 Job 
> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:07:36,358 DEBUG [Scheduler 1186524190 Job 
> 6daf351f-5073-128f-8754-f499e345db54-308] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:26:43,716 DEBUG [Scheduler 1186524190 Job 
> c90a765c-976f-49c2-38d8-e6812d091cfb-295] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:26:44,944 DEBUG [Scheduler 1186524190 Job 
> cd108004-12c6-078f-8646-f29841de9e40-205] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:26:51,306 DEBUG [Scheduler 1186524190 Job 
> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:27:15,523 DEBUG [Scheduler 1186524190 Job 
> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:27:15,535 DEBUG [Scheduler 1186524190 Job 
> f76fb12e-477f-b404-e021-adaf520d04bb-321] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> 2019-12-10 02:29:42,504 DEBUG [Scheduler 1186524190 Job 
> f894f427-b170-19b6-76fb-e1b671659771-321] cli.DictionaryGeneratorCLI:99 : 
> Dict for 'THIS_IS_COLUMN_NAME' has already been built, save it
> {code}
>  
> Since we issue the refresh segments request concurrently (with REST APIs), 
> the dict has a chance to be modified concurrently. Kylin uses checkAndPut to 
> update HBase to prevent race-condition, so the WriteConflictException may be 
> raised when updating the same resource. 
>   
> {code:java}
> Liangs-MacBook-Pro:~ pwrliang$ cat 12-10.log |grep 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict|grep
>  "Update dictionary "
> 2019-12-10 02:07:10,223 INFO  [Scheduler 1186524190 Job 
> 5c4b2ee9-c8aa-9a69-ffb0-3a6811ad22a9-372] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575914830203
> 2019-12-10 02:07:11,109 INFO  [Scheduler 1186524190 Job 
> 593979b8-e80b-590f-335b-526c2fc49080-295] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575914831090
> 2019-12-10 02:07:20,554 INFO  [Scheduler 1186524190 Job 
> 160952f3-fef2-4fdf-c08a-77d49f830805-246] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575914840528
> 2019-12-10 02:07:21,762 INFO  [Scheduler 1186524190 Job 
> 345b688c-7992-f2cf-ae06-835675f545ea-117] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575914841727
> 2019-12-10 02:07:23,253 INFO  [Scheduler 1186524190 Job 
> 0ea711ed-71b5-ae47-a34a-f0a27fd9d7cc-303] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575914843095
> 2019-12-10 02:26:43,843 INFO  [Scheduler 1186524190 Job 
> c90a765c-976f-49c2-38d8-e6812d091cfb-295] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575916003771
> 2019-12-10 02:26:51,436 INFO  [Scheduler 1186524190 Job 
> d808dadf-8eea-34db-1b1f-d50e923b4e3a-372] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575916011421
> 2019-12-10 02:27:15,593 INFO  [Scheduler 1186524190 Job 
> ea07ef8f-b2b7-b3f8-c481-df615025217d-117] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575916035573
> 2019-12-10 02:29:42,564 INFO  [Scheduler 1186524190 Job 
> f894f427-b170-19b6-76fb-e1b671659771-321] dict.DictionaryManager:198 : Update 
> dictionary 
> /dict/THIS_IS_CUBE_NAME/THIS_IS_COLUMN_NAME/010dbf72-52a9-e759-08a7-ed7cde0c6e0d.dict
>  lastModifiedTime to 1575916182549
> {code}
> {color:#000000}The log above shown that the job 
> "f76fb12e-477f-b404-e021-adaf520d04bb" has not been print out the log - 
> "Update dictionary blablabla...." because of the 
> WriteConflictException.{color}
>   
> After contacted with community, [~shaofengshi] has been confirmed that seems 
> to be a bug introduced in KYLIN-3977 in Kylin 2.6.3.
>   
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to