[
https://issues.apache.org/jira/browse/TAJO-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909379#comment-13909379
]
Mafish commented on TAJO-611:
-----------------------------
Hi there,
I've done some basic investigation into the discovery service. Min gave a very
detailed discussion of resource management in the comment section of TAJO-540,
which was very useful to me. Thanks, Min. Now I have some questions to discuss.
What is the current resource management mechanism in Tajo, and which classes
are related to it? Based on your previous discussion, it seems Tajo uses YARN,
but not at the CPU/memory level. Do we need resource management with finer
granularity? It seems this question is more related to TAJO-540.
> (Umbrella) Service Discovery
> -----------------------------
>
> Key: TAJO-611
> URL: https://issues.apache.org/jira/browse/TAJO-611
> Project: Tajo
> Issue Type: New Feature
> Affects Versions: 1.0-incubating
> Reporter: Min Zhou
> Fix For: 1.0-incubating
>
>
> As we discussed offline, high availability is one of our next goals. Service
> discovery can help us maintain health statuses for all daemons (master and
> workers). Meanwhile, those daemons can find each other easily. Furthermore,
> it's very useful for my current work on TAJO-540 because it can randomly
> select nodes for the Tajo scheduler.
> One of the best candidates is Apache Curator (originally from Netflix):
> http://curator.apache.org/curator-x-discovery/index.html
> I'd like to introduce Xuhui to help us with this issue. Xuhui was my
> colleague at Alibaba Group. He was active in the Hive community, where one of
> his contributions was adding the multi-distinct aggregation feature to Hive.
> Currently, he is a researcher at Microsoft.
> [~mafish]
> Below is a comment on this issue from Xuhui:
> To my understanding, this feature is for high availability as well as high
> scalability. We don't need to provide all machine info for every service when
> Tajo starts. Instead, we can dynamically register services with service
> discovery as needed. Also, if a machine fails, the failure can be easily
> detected and the machine replaced.
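To make the registration and failure-detection idea above concrete, here is a
minimal in-memory sketch. It is not Curator's API and not Tajo code: the
ServiceRegistry class, its method names, the heartbeat/TTL expiry scheme, and
the example addresses are all assumptions chosen for illustration. A real
deployment would more likely delegate liveness tracking to ZooKeeper through
Curator rather than track heartbeats by hand.

```python
import random
import time
from typing import Callable, Dict, List, Optional


class ServiceRegistry:
    """Toy registry (hypothetical): instances register, heartbeat, and expire."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        # service name -> {instance address: time of last heartbeat}
        self._last_seen: Dict[str, Dict[str, float]] = {}

    def register(self, service: str, address: str) -> None:
        """Add an instance at runtime; no static machine list is needed."""
        self._last_seen.setdefault(service, {})[address] = self._clock()

    def heartbeat(self, service: str, address: str) -> None:
        """Refresh an instance's liveness timestamp."""
        self.register(service, address)

    def live_instances(self, service: str) -> List[str]:
        """Return instances whose last heartbeat is within the TTL."""
        now = self._clock()
        seen = self._last_seen.get(service, {})
        return sorted(addr for addr, t in seen.items() if now - t <= self._ttl)

    def pick_random(self, service: str) -> Optional[str]:
        """Pick one live instance at random, e.g. for a scheduler."""
        live = self.live_instances(service)
        return random.choice(live) if live else None
```

A worker that stops sending heartbeats simply drops out of live_instances()
once its TTL elapses, so callers such as a scheduler only ever see nodes that
are believed to be healthy.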