Cool. It extends Kylin's scenarios into real-time query.

With Warm regards

Billy Liu

ShaoFeng Shi <shaofeng...@apache.org> wrote on Fri, Nov 2, 2018 at 7:17 PM:
>
> Hi Gang, I appreciate your hard work!
>
> Ma Gang <mg4w...@163.com> wrote on Thu, Nov 1, 2018 at 3:29 PM:
>
> > Hi ShaoFeng,
> > For streaming ingest/query performance, there is a doc:
> > https://drive.google.com/file/d/1GSBMpRuVQRmr8Ev2BWvssfMd-Rck9vsH/view?ths=true
> > It is also in the 'performance' section of the design doc attached to the
> > JIRA: https://issues.apache.org/jira/browse/KYLIN-3654
> > As for stability, it is very stable in our environment, but it is not yet
> > widely used in eBay, so it is hard to say.
> > I will start merging the code into the master branch. It may take some time
> > because our current version is Kylin 2.1.0. I hope it can be done before
> > Nov. 30, but I cannot guarantee it; there is a lot of other work to do.
> >
> > At 2018-11-01 15:08:12, "ShaoFeng Shi" <shaofeng...@apache.org> wrote:
> > >Hi Gang,
> > >
> > >Thank you for the information, that is helpful for understanding the
> > >overall design and implementation.
> > >
> > >Do you have some statistical information, like performance, throughput,
> > >stability, etc.? Besides, what's the plan of contributing it to the
> > >community? Thanks!
> > >
> > >
> > >Ma Gang <mg4w...@163.com> wrote on Thu, Nov 1, 2018 at 2:45 PM:
> > >
> > >> Thanks Xiaoxiang,
> > >> Very good questions! Please see my comments starting with [Gang]:
> > >>
> > >>
> > >> 1.      Is it possible to use Yarn as the cluster manager for index tasks?
> > >> The coordinator process would set them up at a specified period.
> > >> [Gang] I think it is possible, but in the current design the indexing task
> > >> is a long-running task that also provides query service. This makes the
> > >> whole system very simple and efficient, and I don't think we need to
> > >> stop and start the indexing task from time to time. Using Yarn to manage
> > >> resources is possible, but we would need to redesign the existing
> > >> coordinator to make it easy to deploy to Yarn, Kubernetes, etc. I hope this
> > >> can be done after the contribution to the community.
> > >>
> > >> 2.      As I know, eBay’s New Kylin Streaming Solution uses a replica set
> > >> to ensure that incoming messages won’t be lost if some processes are lost.
> > >> I think a replica set is a set of Kafka consumer processes that are
> > >> responsible for ingesting messages and building the base cuboid in memory.
> > >> Could you please show me some details about how the replica set provides
> > >> the HA guarantee? How is it configured? A link or paper is OK. I found one,
> > >> but I don’t know if it has the same meaning as your replica set.
> > >>
> > >>
> > >> [Gang] Yes, it is similar to MongoDB replication, but currently we don't
> > >> replicate data from a primary node. We just assign the same Kafka
> > >> topic/partitions to all receivers in a ReplicaSet, and every receiver in
> > >> the ReplicaSet consumes data from Kafka. If one receiver is down, the other
> > >> receivers in the ReplicaSet are still consuming the same Kafka data, so
> > >> consumption and queries are not impacted. We don't guarantee that the
> > >> receivers in a ReplicaSet have the same consuming rate, but we can
> > >> guarantee that the user sees data consistently by sticking the queries for
> > >> one cube to one receiver.
> > >> The HA implementation is a little bit naive, but it is simple and it works.
> > >> Maybe in the future we can do HA by replication, to support other streaming
> > >> sources that don't support multiple consumers and don't have a persistent
> > >> store.
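
A minimal sketch of the ReplicaSet behaviour described in the answer above, assuming a much simplified model: every receiver tracks its own offsets for the same partitions, and queries for a cube stay pinned to one receiver until that receiver goes down. The names `Receiver`, `ReplicaSet`, `consume` and `route` are hypothetical and are not taken from the Kylin code base; no real Kafka client is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical receiver that consumes its assigned partitions on its own."""
    name: str
    alive: bool = True
    offsets: dict = field(default_factory=dict)  # partition -> next offset

    def consume(self, partition: int, count: int) -> None:
        # Each receiver advances independently, so consuming rates may differ.
        self.offsets[partition] = self.offsets.get(partition, 0) + count


class ReplicaSet:
    """All receivers share the same topic/partition assignment (assumption)."""

    def __init__(self, receivers, partitions):
        self.receivers = list(receivers)
        self.partitions = list(partitions)
        self.sticky = {}  # cube name -> name of the pinned receiver

    def route(self, cube: str) -> Receiver:
        """Return the receiver pinned to this cube, re-pinning if it is down."""
        by_name = {r.name: r for r in self.receivers}
        pinned = by_name.get(self.sticky.get(cube))
        if pinned is None or not pinned.alive:
            live = [r for r in self.receivers if r.alive]
            if not live:
                raise RuntimeError("no live receiver in the ReplicaSet")
            pinned = live[0]
            self.sticky[cube] = pinned.name
        return pinned
```

Pinning a cube to one receiver is what keeps successive queries from jumping between receivers that are at different offsets. After a failover the new receiver may be slightly ahead of or behind the old one, which matches the "naive but simple" trade-off described above.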
> > >>
> > >> 3.      How to add or remove a node of a replica set in a production env?
> > >> How to monitor the health/pressure of the replica set cluster?
> > >> [Gang] Currently we have a UI and a RESTful API that let an admin add a
> > >> node to or remove a node from a ReplicaSet, and a simple UI that lets an
> > >> admin monitor the health and consuming rate of each receiver/cube. Also,
> > >> all metrics are collected using the Yammer metrics framework, so they are
> > >> easy to expose to other monitoring systems.
> > >>
> > >> 4.      Are all measures supported in eBay’s New Kylin Streaming
> > >> Solution? What about count distinct (bitmap)?
> > >> [Gang] Most measures are supported, but precise count distinct (bitmap) is
> > >> not supported when the distinct dimension is not of int type. As you know,
> > >> supporting precise count distinct for a non-int dimension requires building
> > >> a global dictionary, which is not possible in the streaming env.
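
To illustrate why a non-int dimension needs a global dictionary, here is a small sketch under simplifying assumptions: a plain Python integer stands in for the bitmap, and `encode_with_local_dict` is a hypothetical per-segment dictionary that hands out ids in first-seen order. Int values can be used directly as bit positions, so bitmaps from different segments merge correctly. String values encoded with independent per-segment dictionaries collide, and the merged count comes out wrong.

```python
def bitmap_of_ints(values):
    """Build a bitmap (a Python int used as a bit set) from non-negative ints."""
    bits = 0
    for v in values:
        bits |= 1 << v
    return bits


def cardinality(bits):
    """Count the set bits, i.e. the number of distinct values."""
    return bin(bits).count("1")


def encode_with_local_dict(values):
    """Hypothetical per-segment dictionary: ids are assigned in first-seen order."""
    ids = {}
    for v in values:
        ids.setdefault(v, len(ids))
    return bitmap_of_ints(ids.values())


# Int dimension: the value itself is the bit position, so a merge is exact.
seg1 = bitmap_of_ints([3, 7])
seg2 = bitmap_of_ints([7, 9])
merged_int = cardinality(seg1 | seg2)  # distinct values are 3, 7, 9

# String dimension: each segment restarts its ids at 0, so ids collide.
s1 = encode_with_local_dict(["alice", "bob"])
s2 = encode_with_local_dict(["bob", "carol"])
merged_str = cardinality(s1 | s2)  # undercounts: "alice" and "bob" share id 0
```

Here `merged_int` is 3, which is correct, while `merged_str` is 2 although three distinct strings were seen. A dictionary shared by all segments would avoid the collision, and that is the part that is hard to maintain in a streaming setting.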
> > >>
> > >>
> > >> 5.      It seems eBay’s New Kylin Streaming Solution uses a custom
> > >> columnar storage. Why not use a mature open source columnar storage
> > >> solution? Have you ever compared the performance of your custom columnar
> > >> storage with an open source columnar storage solution?
> > >>
> > >> [Gang] Most open source columnar formats, like Parquet and ORC, are
> > >> designed for use in a Hadoop env, while the streaming data is on local
> > >> disk, so I didn't consider them at the beginning. It is not very hard to
> > >> define a columnar format to store Kylin-specific data. With a customized
> > >> columnar storage you can use mmap files to scan data and add a row-level
> > >> inverted index for all dimensions, so I think the performance will be
> > >> better than with a common columnar format. I didn't compare the
> > >> performance, but the storage engine is pluggable, so you may contribute a
> > >> Parquet storage if you are interested.
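
As a rough illustration of the row-level inverted index idea mentioned above, the sketch below keeps each column as its own list and maps every dimension value to the row ids that contain it. It is an in-memory toy, not the actual storage format: `ColumnarFragment`, `matching_rows` and `sum_measure` are hypothetical names, and the mmap-backed file layout is left out.

```python
from collections import defaultdict


class ColumnarFragment:
    """Hypothetical fragment: one list per column plus inverted indexes."""

    def __init__(self, dimensions, measures, rows):
        self.row_count = len(rows)
        # Columnar layout: each column is stored contiguously.
        self.columns = {name: [row[name] for row in rows]
                        for name in list(dimensions) + list(measures)}
        # Row-level inverted index: dimension -> value -> list of row ids.
        self.inverted = {d: defaultdict(list) for d in dimensions}
        for d in dimensions:
            for row_id, value in enumerate(self.columns[d]):
                self.inverted[d][value].append(row_id)

    def matching_rows(self, **equals):
        """Row ids where every given dimension equals the given value."""
        result = set(range(self.row_count))
        for dim, value in equals.items():
            result &= set(self.inverted[dim].get(value, ()))
        return sorted(result)

    def sum_measure(self, measure, **equals):
        """Sum a measure column over the matching rows only."""
        column = self.columns[measure]
        return sum(column[i] for i in self.matching_rows(**equals))


rows = [
    {"site": "US", "category": "toys", "price": 10},
    {"site": "US", "category": "books", "price": 5},
    {"site": "DE", "category": "toys", "price": 7},
    {"site": "US", "category": "toys", "price": 3},
]
fragment = ColumnarFragment(["site", "category"], ["price"], rows)
total = fragment.sum_measure("price", site="US", category="toys")
```

A filter touches only the index entries for the requested values and then reads just the needed measure column, instead of scanning every row.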
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> At 2018-11-01 12:42:25, "Xiaoxiang Yu" <xiaoxiang...@kyligence.io> wrote:
> > >> >Hi Gang, I am so glad to know that eBay has a solution for real-time OLAP
> > >> >on Kylin. I have some small questions:
> > >> >
> > >> >
> > >> >1.      Is it possible to use Yarn as the cluster manager for index tasks?
> > >> >The coordinator process would set them up at a specified period. Yarn
> > >> >would manage:
> > >> >
> > >> >a)       retrying these tasks if some fail
> > >> >
> > >> >b)       resource allocation
> > >> >
> > >> >c)       log collection
> > >> >
> > >> >2.      As I know, eBay’s New Kylin Streaming Solution uses a replica set
> > >> >to ensure that incoming messages won’t be lost if some processes are lost.
> > >> >I think a replica set is a set of Kafka consumer processes that are
> > >> >responsible for ingesting messages and building the base cuboid in memory.
> > >> >Could you please show me some details about how the replica set provides
> > >> >the HA guarantee? How is it configured? A link or paper is OK. I found one,
> > >> >but I don’t know if it has the same meaning as your replica set.
> > >> >
> > >> >a)       [MongoDB replication](https://docs.mongodb.com/manual/replication/).
> > >> >
> > >> >3.      How to add or remove a node of a replica set in a production env?
> > >> >How to monitor the health/pressure of the replica set cluster?
> > >> >
> > >> >4.      Are all measures supported in eBay’s New Kylin Streaming
> > >> >Solution? What about count distinct (bitmap)?
> > >> >
> > >> >5.      It seems eBay’s New Kylin Streaming Solution uses a custom
> > >> >columnar storage. Why not use a mature open source columnar storage
> > >> >solution? Have you ever compared the performance of your custom columnar
> > >> >storage with an open source columnar storage solution?
> > >> >
> > >> >
> > >> >
> > >> >----------------
> > >> >Best wishes,
> > >> >Xiaoxiang Yu
> > >> >
> > >> >
> > >> >From: Ma Gang <mg4w...@163.com>
> > >> >Reply-To: "dev@kylin.apache.org" <dev@kylin.apache.org>
> > >> >Date: Tuesday, October 30, 2018, 15:24
> > >> >To: "dev@kylin.apache.org" <dev@kylin.apache.org>
> > >> >Subject: [DISCUSS] New Kylin Streaming Solution From eBay
> > >> >
> > >> >Hi all,
> > >> >
> > >> >The eBay Kylin team has developed a new Kylin streaming solution. The
> > >> >basic idea is to build a streaming cluster that ingests data from a
> > >> >streaming source (Kafka) and provides queries over real-time data. The
> > >> >data preparation latency is milliseconds, which means the data is
> > >> >queryable almost as soon as it is ingested. Attached is the architecture
> > >> >design doc.
> > >> >We would like to contribute the feature to the community. Please let us
> > >> >know if you have any concerns.
> > >> >
> > >> >Thanks,
> > >> >Gang(Allen) Ma
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> > >
> > >
> > >--
> > >Best regards,
> > >
> > >Shaofeng Shi 史少锋
> >
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
