[ https://issues.apache.org/jira/browse/KYLIN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967441#comment-15967441 ]
kangkaisen edited comment on KYLIN-2506 at 4/14/17 6:16 AM:
------------------------------------------------------------
KYLIN-2506 Refactor Global Dictionary:
This commit has been running stably in our production environment for a long
time. It covers the first nine points in the issue description.
KYLIN-2506 Refactor ZookeeperDistributedJobLock:
Refactor ZookeeperDistributedJobLock to make it more general. The main points of
this refactor:
1 Move the JobLock interface to the core-common module.
2 Add a watch interface so that the client receives a notification when a
ZooKeeper node changes.
3 Do not keep the lock path inside the lock itself.
4 Make the zkClient a singleton.
5 Update the function signatures and comments.
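A minimal sketch of what points 3 and 4 could look like, assuming an in-memory map in place of ZooKeeper so it runs without Curator. The names DistributedLock, InMemoryDistributedLock and FakeZkClient are hypothetical and are not Kylin's actual API: the lock path is passed on every call instead of being stored in the lock, and one shared client instance plays the role of the singleton zkClient.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the singleton zkClient (point 4): one shared store per JVM.
final class FakeZkClient {
    private static final FakeZkClient INSTANCE = new FakeZkClient();

    // Lock path -> id of the owning client, like the data of an ephemeral lock node.
    private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

    private FakeZkClient() {
    }

    static FakeZkClient getInstance() {
        return INSTANCE;
    }

    // Atomically creates the node; returns true only if this call created it.
    boolean createIfAbsent(String path, String owner) {
        return nodes.putIfAbsent(path, owner) == null;
    }

    String getOwner(String path) {
        return nodes.get(path);
    }

    // Deletes the node only if it is still owned by the given client.
    boolean deleteIfOwner(String path, String owner) {
        return nodes.remove(path, owner);
    }
}

// Hypothetical generalized lock interface: the lock path is an argument of every call,
// so the lock object does not keep a path of its own (point 3).
interface DistributedLock {
    boolean lock(String lockPath);

    void unlock(String lockPath);

    boolean isLocked(String lockPath);

    boolean isLockedByMe(String lockPath);
}

final class InMemoryDistributedLock implements DistributedLock {
    private final String clientId;
    private final FakeZkClient zk = FakeZkClient.getInstance();

    InMemoryDistributedLock(String clientId) {
        this.clientId = clientId;
    }

    @Override
    public boolean lock(String lockPath) {
        // Succeeds if the node was created now or is already held by this client (no hold count).
        return zk.createIfAbsent(lockPath, clientId) || isLockedByMe(lockPath);
    }

    @Override
    public void unlock(String lockPath) {
        zk.deleteIfOwner(lockPath, clientId); // no-op if another client owns the lock
    }

    @Override
    public boolean isLocked(String lockPath) {
        return zk.getOwner(lockPath) != null;
    }

    @Override
    public boolean isLockedByMe(String lockPath) {
        return clientId.equals(zk.getOwner(lockPath));
    }
}
```

Keeping the path out of the lock object means one lock instance can guard any number of paths (for example one per dictionary column), which is what lets the same interface serve both job locks and dictionary locks.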
One concern with this commit is that it introduces a curator-recipes dependency
into the core-common module. This is needed because DistributedJobLock.watch
must return the PathChildrenCache so that the client can close it in time.
Any advice on this commit is very much appreciated.
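To illustrate why watch needs to hand something closeable back to the caller, here is a hedged sketch with a hypothetical in-memory WatchableStore standing in for ZooKeeper plus PathChildrenCache. watch() returns a WatchHandle whose close() unregisters the listener, so the caller decides exactly when the watch stops. None of these names come from Kylin or Curator.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Handle returned by watch(); close() throws no checked exception,
// so it works cleanly in try-with-resources.
interface WatchHandle extends AutoCloseable {
    @Override
    void close();
}

// Hypothetical in-memory stand-in for a ZooKeeper-backed store whose watch()
// returns a closeable handle (analogous to returning a PathChildrenCache).
final class WatchableStore {
    private static final class Registration {
        final String prefix;
        final Consumer<String> listener;

        Registration(String prefix, Consumer<String> listener) {
            this.prefix = prefix;
            this.listener = listener;
        }
    }

    private final List<Registration> watchers = new CopyOnWriteArrayList<>();

    // Registers onChange for every change under pathPrefix; closing the handle unregisters it.
    WatchHandle watch(String pathPrefix, Consumer<String> onChange) {
        Registration r = new Registration(pathPrefix, onChange);
        watchers.add(r);
        return () -> watchers.remove(r);
    }

    // Simulates a node being created or deleted and notifies the matching listeners.
    void fireChange(String path) {
        for (Registration r : watchers) {
            if (path.startsWith(r.prefix)) {
                r.listener.accept(path);
            }
        }
    }

    int watcherCount() {
        return watchers.size();
    }
}
```

Because WatchHandle is a plain interface, a signature of this shape would not expose a Curator type to callers; whether that would remove the need for curator-recipes in core-common depends on the rest of the module and is left open here.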
KYLIN-2506 Add distributed lock for GlobalDictionaryBuilder:
The key points of the distributed lock for GlobalDictionaryBuilder:
1 Use ZooKeeper to implement the distributed lock. The lock path is
TableName+ColumnName. The lock is acquired in GlobalDictionaryBuilder.init and
released either in GlobalDictionaryBuilder.build or when
GlobalDictionaryBuilder.addValue throws an exception.
2 When the Kylin thread building the dict fails to acquire the lock, it watches
the lock path and blocks on a BlockingQueue. When it receives the watch event
and successfully acquires the lock, the thread is woken up.
3 Based on
https://cwiki.apache.org/confluence/display/CURATOR/TN10
https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
http://antirez.com/news/101
I use four measures to make the ZooKeeper lock as reliable as possible:
a Increase the session timeout to 120s
-b Add a listener for ConnectionState.SUSPENDED and
ConnectionState.LOST- (it's unnecessary)
c Check whether the client still holds the lock on commit and after every
1_000_000 processed values
d Add a sanityCheck when committing the dict to HDFS
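The lock lifecycle in points 1 and 2 and measure c can be sketched as follows. This is an in-memory illustration under stated assumptions, not the GlobalDictionaryBuilder code: LockService and SketchDictBuilder are hypothetical, a ConcurrentHashMap stands in for ZooKeeper lock nodes, and offering to a per-path BlockingQueue stands in for the watch event fired when the lock node is deleted. The lock path is built from the table and column name, acquired in init, checked every 1_000_000 values and before commit, and released in build or when addValue fails.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory lock service: owners stands in for ZooKeeper lock nodes and
// each path's BlockingQueue receives an event when that lock is released.
final class LockService {
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, BlockingQueue<String>> events = new ConcurrentHashMap<>();

    boolean tryLock(String path, String owner) {
        return owners.putIfAbsent(path, owner) == null;
    }

    boolean isLockedBy(String path, String owner) {
        return owner.equals(owners.get(path));
    }

    void unlock(String path, String owner) {
        if (owners.remove(path, owner)) {
            BlockingQueue<String> queue = events.get(path);
            if (queue != null) {
                queue.offer(path); // stand-in for the "lock node deleted" watch event
            }
        }
    }

    // Blocks the calling thread on the path's queue until the lock is acquired or timeoutMs passes.
    boolean lockBlocking(String path, String owner, long timeoutMs) {
        BlockingQueue<String> queue = events.computeIfAbsent(path, p -> new LinkedBlockingQueue<>());
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            while (!tryLock(path, owner)) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                queue.poll(remaining, TimeUnit.NANOSECONDS); // wake on a release event or time out, then retry
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

// Hypothetical builder showing where the lock is taken, checked and released.
final class SketchDictBuilder {
    static final long CHECK_INTERVAL = 1_000_000L;

    private final LockService locks;
    private final String lockPath;
    private final String owner;
    private long count;

    SketchDictBuilder(LockService locks, String table, String column, String owner) {
        this.locks = locks;
        this.lockPath = "/dict/lock/" + table + "_" + column; // one lock per TableName+ColumnName
        this.owner = owner;
    }

    void init(long timeoutMs) {
        if (!locks.lockBlocking(lockPath, owner, timeoutMs)) {
            throw new IllegalStateException("Timed out waiting for " + lockPath);
        }
    }

    void addValue(String value) {
        try {
            if (++count % CHECK_INTERVAL == 0) {
                checkLock(); // periodic ownership check
            }
            // ... append value to the dictionary under construction ...
        } catch (RuntimeException e) {
            locks.unlock(lockPath, owner); // release the lock when addValue fails
            throw e;
        }
    }

    void build() {
        try {
            checkLock(); // make sure we still own the lock before committing
            // ... commit the dictionary to HDFS ...
        } finally {
            locks.unlock(lockPath, owner);
        }
    }

    private void checkLock() {
        if (!locks.isLockedBy(lockPath, owner)) {
            throw new IllegalStateException("Lost lock " + lockPath);
        }
    }
}
```

Polling the queue with a deadline and calling tryLock again after every wake-up keeps the waiter correct even when it wakes for a release that another thread then wins, and the timeout bounds how long a build can block.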
> Refactor Global Dictionary
> --------------------------
>
> Key: KYLIN-2506
> URL: https://issues.apache.org/jira/browse/KYLIN-2506
> Project: Kylin
> Issue Type: Improvement
> Components: General
> Affects Versions: v2.0.0
> Reporter: kangkaisen
> Assignee: kangkaisen
> Fix For: v2.0.0
>
>
> The main points of this refactor:
> 1 Fix the bug where the RemoveListener of LoadingCache swallowed all
> exceptions when building the GlobalDict.
> 2 Fix the bug where the HDFS filename of DictSliceKey contained illegal
> characters.
> 3 Fix the bug where the HDFS filename of DictSliceKey could be longer than 255
> characters.
> 4 Fix the bug where DictNode split failed if the value length was greater than
> 255 bytes.
> 5 Decouple building and querying of the GlobalDict: extract the builder of
> AppendTrieDictionary into AppendTrieDictionaryBuilder; add a LoadingCache to
> AppendTrieDictionary and make AppendTrieDictionary read-only.
> 6 Remove the dependence on LoadingCache when building the GlobalDict.
> 7 Abstract the HDFS operations to GlobalDictStore.
> 8 Abstract the metadata of GlobalDict to GlobalDictMetadata.
> 9 Delete CachedTreeMap.
> 10 Add distributed lock for GlobalDict.
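For points 2 and 3, one common way to turn an arbitrary DictSliceKey into a safe HDFS filename is sketched below. It is an assumption, not necessarily what the commit does: the key bytes are hex-encoded so the name contains only [0-9a-f], and if that would exceed 255 characters (the HDFS default for dfs.namenode.fs-limits.max-component-length) a SHA-256 digest is used instead. SliceFileNames and MAX_NAME_LENGTH are hypothetical names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: maps raw slice-key bytes to a filename that is always legal and short enough.
final class SliceFileNames {
    static final int MAX_NAME_LENGTH = 255; // assumed per-path-component limit

    private SliceFileNames() {
    }

    // Hex-encodes the key; names longer than MAX_NAME_LENGTH fall back to "sha256_" + 64 hex chars.
    static String toFileName(byte[] sliceKey) {
        String hex = toHex(sliceKey);
        if (hex.length() <= MAX_NAME_LENGTH) {
            return hex;
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(sliceKey);
            return "sha256_" + toHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

The digest is one-way, so the key could no longer be recovered from such a filename and would have to be stored elsewhere (for example in the dictionary metadata). The sha256_ prefix cannot collide with a plain hex name, because '_' is not a hex digit.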
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)