[ 
https://issues.apache.org/jira/browse/KYLIN-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368010#comment-17368010
 ] 

Zhong Yanghong commented on KYLIN-4165:
---------------------------------------

Why do we need a distributed lock for two stages, which may introduce other issues?

How about fixing it just in *SaveDictStep*?
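
One possible reading of a fix local to the step is to handle the write conflict where it occurs: re-read the current timestamp and retry the update, rather than serializing whole jobs behind a lock. The sketch below is illustrative only. It does not use Kylin's API; `TimestampStore`, `ConflictException`, and `touchWithRetry` are hypothetical names, and the in-memory map merely stands in for a resource store that checks the expected old timestamp on update. It also assumes that, when another job has already refreshed the timestamp to an equal or newer value, the touch can be treated as done.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these types are Kylin classes.
class OptimisticTouchSketch {

    /** Stand-in for a write conflict: the stored timestamp differs from the expected one. */
    static class ConflictException extends RuntimeException {
        ConflictException(String msg) {
            super(msg);
        }
    }

    /** Hypothetical in-memory store mapping a resource path to its last-modified timestamp. */
    static class TimestampStore {
        private final Map<String, Long> timestamps = new ConcurrentHashMap<>();

        void put(String path, long ts) {
            timestamps.put(path, ts);
        }

        /** Returns the stored timestamp; the path is assumed to exist in this sketch. */
        long get(String path) {
            return timestamps.get(path);
        }

        /** Compare-and-set: succeeds only if the stored timestamp still equals expectedOldTs. */
        void update(String path, long expectedOldTs, long newTs) {
            if (!timestamps.replace(path, expectedOldTs, newTs)) {
                throw new ConflictException("Overwriting conflict " + path
                        + ", expect old TS " + expectedOldTs
                        + ", but it is " + timestamps.get(path));
            }
        }
    }

    /**
     * Refreshes the timestamp of an existing resource. On a conflict, the current
     * timestamp is re-read and the update is attempted again, up to maxAttempts times.
     */
    static long touchWithRetry(TimestampStore store, String path, long newTs, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConflictException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long current = store.get(path);
            if (current >= newTs) {
                // Assumption: a concurrent writer already refreshed the entry, so we are done.
                return current;
            }
            try {
                store.update(path, current, newTs);
                return newTs;
            } catch (ConflictException e) {
                last = e; // another writer won the race; re-read and try again
            }
        }
        throw last;
    }
}
```

Whether this is sufficient depends on what else the step writes after the dictionary; the sketch covers only the timestamp update that appears in the stack trace.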

> RT OLAP building job on "Save Cube Dictionaries" step concurrency error
> -----------------------------------------------------------------------
>
>                 Key: KYLIN-4165
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4165
>             Project: Kylin
>          Issue Type: Bug
>          Components: Real-time Streaming
>    Affects Versions: v3.0.0-alpha
>            Reporter: wangxiaojing
>            Priority: Major
>             Fix For: v3.0.0
>
>
> There is a dictionary version conflict in the "Save Cube Dictionaries" step when 
> building a real-time segment from remote persisted to ready. This is very 
> serious: it causes dictionary updates to fail when multiple jobs run 
> concurrently. It may occur when a cube has many concurrent building jobs 
> on the same step, "Save Cube Dictionaries". 
> Perhaps a global distributed lock is needed to prevent this step from 
> running concurrently for one cube.
> Log messages from the "Save Cube Dictionaries" step:
> {code:java}
> org.apache.kylin.common.persistence.WriteConflictException: Overwriting conflict /dict/DEFAULT.TASK_SNAPSHOT/GROUPVALUE/5387e747-9649-0b17-5a72-ee17f5baea0a.dict, expect old TS 1568012509090, but it is 1568012509245
>     at org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
>     at org.apache.kylin.common.persistence.ResourceStore$7.call(ResourceStore.java:465)
>     at org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
>     at org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:462)
>     at org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:457)
>     at org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:452)
>     at org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
>     at org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
>     at org.apache.kylin.engine.mr.streaming.SaveDictStep.doWork(SaveDictStep.java:122)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:110)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
