Re: Pinot/Kylin/Druid quick comparison

Nam Đỗ Duy Fri, 08 Dec 2023 16:02:00 -0800

Thank you, Xiaoxiang, for your reply.

------------------------------------
Do you have any suggestions/wishes for Kylin 5 (except the real-time feature)?
------------------------------------
Yes. Please help me clear up these headaches:

1. Can Kylin access the existing star schema in our Oracle data warehouse? If
not, is there any workaround?

2. My team is using Kerberos for authentication. Do you have any
documentation or case studies about integrating Kerberos with Kylin 4.x and
Kylin 5.x?

3. Should we use Apache Ranger instead of Kerberos for authentication and
security purposes?

Thank you again

On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <[email protected]> wrote:

> I guess the release date should be 2024/01.
> Do you have any suggestions/wishes for Kylin 5 (except the real-time feature)?
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <[email protected]> wrote:
>
>> Thank you very much, Xiaoxiang. I already did the presentation this
>> morning, so there was no time for you to comment; next time I will send it
>> to you in advance. The outcome of the meeting was that we will implement
>> both Druid and Kylin in the next couple of projects, because of Druid's
>> real-time feature. I hope that Kylin will have the same feature soon.
>>
>> May I ask when you will release Kylin 5.0?
>>
>> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <[email protected]> wrote:
>>
>> > Since 2018 there have been a lot of new features and code refactoring.
>> > If you like, you can share your PPT with me privately; maybe I can
>> > give some comments.
>> >
>> > Here are some references on the advantages Kylin has gained since 2018:
>> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
>> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
>> > - https://kylin.apache.org/5.0/docs/development/roadmap
>> >
>> > ------------------------
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> >
>> >
>> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <[email protected]> wrote:
>> >
>> >> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and Druid
>> >> in my team.
>> >>
>> >> I found this article and would like you to update me on the advantages
>> >> Kylin has gained from 2018 until now (especially with version 5 about to
>> >> be released):
>> >>
>> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
>> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
>> >>
>> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <[email protected]> wrote:
>> >>
>> >> > Thank you very much for your prompt response. I still have several
>> >> > questions that I will ask for your help with later.
>> >> >
>> >> > Best regards and have a good day
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <[email protected]> wrote:
>> >> >
>> >> >> Done. The GitHub default branch has been changed to kylin5.
>> >> >>
>> >> >> ------------------------
>> >> >> With warm regard
>> >> >> Xiaoxiang Yu
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <[email protected]> wrote:
>> >> >>
>> >> >> > A JIRA ticket has been opened; waiting for INFRA:
>> >> >> > https://issues.apache.org/jira/browse/INFRA-25238
>> >> >> > ------------------------
>> >> >> > With warm regard
>> >> >> > Xiaoxiang Yu
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <[email protected]> wrote:
>> >> >> >
>> >> >> >> Thank you, Xiaoxiang. Please let me know when you have changed your
>> >> >> >> default branch. If people are impressed by the numbers, I hope to
>> >> >> >> turn this situation around.
>> >> >> >>
>> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <[email protected]> wrote:
>> >> >> >>
>> >> >> >>> The default branch is for 4.X, which is a maintenance branch; the
>> >> >> >>> active branch is kylin5.
>> >> >> >>> I will change the default branch to kylin5 later.
>> >> >> >>>
>> >> >> >>> ------------------------
>> >> >> >>> With warm regard
>> >> >> >>> Xiaoxiang Yu
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <[email protected]> wrote:
>> >> >> >>>
>> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
>> >> >> >>>>
>> >> >> >>>> Can you see the attached photo?
>> >> >> >>>>
>> >> >> >>>> My boss asked why Druid commits code regularly, but Kylin has not
>> >> >> >>>> had any commits since July.
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <[email protected]> wrote:
>> >> >> >>>>
>> >> >> >>>>> I think so.
>> >> >> >>>>>
>> >> >> >>>>> Response time is not the only factor in making a decision. Kylin
>> >> >> >>>>> could be cheaper when the query pattern suits the Kylin model, and
>> >> >> >>>>> Kylin can guarantee reasonable query latency. ClickHouse will be
>> >> >> >>>>> quicker in ad hoc query scenarios.
>> >> >> >>>>>
>> >> >> >>>>> By the way, Youzan and Kyligence combine the two to provide
>> >> >> >>>>> unified data analytics services for their customers.
>> >> >> >>>>>
>> >> >> >>>>> ------------------------
>> >> >> >>>>> With warm regard
>> >> >> >>>>> Xiaoxiang Yu
>> >> >> >>>>>
>> >> >> >>>>>
>> >> >> >>>>>
>> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <[email protected]> wrote:
>> >> >> >>>>>
>> >> >> >>>>>> Hi Xiaoxiang, thank you.
>> >> >> >>>>>>
>> >> >> >>>>>> If my client uses a cloud computing service like GCP or AWS,
>> >> >> >>>>>> which will cost more: Kylin's pre-calculation feature or
>> >> >> >>>>>> ClickHouse? (In Kylin's case, my thinking is that the query is
>> >> >> >>>>>> executed once and the result is stored in the cube to be used many
>> >> >> >>>>>> times, so Kylin uses less cloud computation. Is that true?)
>> >> >> >>>>>>
>> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <[email protected]> wrote:
>> >> >> >>>>>>
>> >> >> >>>>>> > The following text is part of an article
>> >> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>> >> >> >>>>>> >
>> >> >> >>>>>> >
>> >> >> >>>>>> >
>> >> >> >>>>>> > ===============================================================================
>> >> >> >>>>>> >
>> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns
>> >> >> >>>>>> > because of its pre-calculation technology, for example when the
>> >> >> >>>>>> > join, group by, and where-condition patterns in the SQL are
>> >> >> >>>>>> > relatively fixed. The larger the data volume, the more obvious the
>> >> >> >>>>>> > advantages of using Kylin. Kylin's advantages are especially large
>> >> >> >>>>>> > in deduplication (count distinct), Top N, and percentile
>> >> >> >>>>>> > scenarios, and it is used in a large number of scenarios such as
>> >> >> >>>>>> > dashboards, all kinds of reports, large-screen displays, traffic
>> >> >> >>>>>> > statistics, and user behavior analysis. Meituan, Aurora, Beike
>> >> >> >>>>>> > (Shell Housing), and others use Kylin to build their data service
>> >> >> >>>>>> > platforms, serving millions to tens of millions of queries per
>> >> >> >>>>>> > day, and most of the queries complete within 2 to 3 seconds.
>> >> >> >>>>>> > There is no better alternative for such high-concurrency
>> >> >> >>>>>> > scenarios.
>> >> >> >>>>>> >
>> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing
>> >> >> >>>>>> > power and is more suitable when query requests are more flexible,
>> >> >> >>>>>> > or when there is a need for detail queries with low concurrency.
>> >> >> >>>>>> > Scenarios include very many columns with where-conditions
>> >> >> >>>>>> > arbitrarily combined for user-tag filtering, complex ad hoc
>> >> >> >>>>>> > queries with low concurrency, and so on. If the data volume and
>> >> >> >>>>>> > access volume are large, you need to deploy a distributed
>> >> >> >>>>>> > ClickHouse cluster, which is a bigger challenge for operation and
>> >> >> >>>>>> > maintenance.
>> >> >> >>>>>> >
>> >> >> >>>>>> > If some queries are very flexible but infrequent, it is more
>> >> >> >>>>>> > resource-efficient to compute on the fly. Since the number of
>> >> >> >>>>>> > queries is small, even if each query consumes a lot of
>> >> >> >>>>>> > computational resources, it is still cost-effective overall. If
>> >> >> >>>>>> > some queries have a fixed pattern and the query volume is large,
>> >> >> >>>>>> > Kylin is more suitable: because the query volume is large, using
>> >> >> >>>>>> > substantial computational resources up front to save the results
>> >> >> >>>>>> > lets that upfront cost be amortized over each query, so it is the
>> >> >> >>>>>> > most economical option.
>> >> >> >>>>>> >
>> >> >> >>>>>> > --- Translated with DeepL.com (free version)
>> >> >> >>>>>> >
>> >> >> >>>>>> >
>> >> >> >>>>>> > ------------------------
>> >> >> >>>>>> > With warm regard
>> >> >> >>>>>> > Xiaoxiang Yu
>> >> >> >>>>>> >
>> >> >> >>>>>> >
>> >> >> >>>>>> >
>> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <[email protected]> wrote:
>> >> >> >>>>>> >
>> >> >> >>>>>> >> Thank you, Xiaoxiang, for the near-real-time streaming feature.
>> >> >> >>>>>> >> That's great.
>> >> >> >>>>>> >>
>> >> >> >>>>>> >> This morning there was a new challenge for my team: ClickHouse
>> >> >> >>>>>> >> offered us a speed of calculating 8 billion rows in
>> >> >> >>>>>> >> milliseconds, which is faster than my demonstration (I used Kylin
>> >> >> >>>>>> >> to calculate 1 billion rows in 2.9 seconds).
>> >> >> >>>>>> >>
>> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
>> >> >> >>>>>> >> that I can defend my demonstration?
>> >> >> >>>>>> >>
>> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <[email protected]> wrote:
>> >> >> >>>>>> >>
>> >> >> >>>>>> >> > 1. "In this important scenario of real-time analytics, the reason
>> >> >> >>>>>> >> > here is that Kylin has lag time due to the model update of new
>> >> >> >>>>>> >> > segment builds, is that correct?"
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > You are correct.
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a workaround combining ..."
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed
>> >> >> >>>>>> >> > but not released), which can reduce the time lag to about 3
>> >> >> >>>>>> >> > minutes (that is my estimation, but I am quite certain about it).
>> >> >> >>>>>> >> > NRT stands for 'near real-time': it runs a job that does
>> >> >> >>>>>> >> > micro-batch aggregation and persistence periodically. The price is
>> >> >> >>>>>> >> > that you need to run and monitor a long-running job. This feature
>> >> >> >>>>>> >> > is based on Spark Streaming, so you need knowledge of it.
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > I am curious: what is the maximum time lag your customers can
>> >> >> >>>>>> >> > tolerate? Personally, I guess a minute-level time lag is OK for
>> >> >> >>>>>> >> > most cases.
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > ------------------------
>> >> >> >>>>>> >> > With warm regard
>> >> >> >>>>>> >> > Xiaoxiang Yu
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <[email protected]> wrote:
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > > Druid is better if you:
>> >> >> >>>>>> >> > > - Have a real-time data source like Kafka, etc.
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > ==========================
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > In this important scenario of real-time analytics, the reason here
>> >> >> >>>>>> >> > > is that Kylin has lag time due to the model update of new segment
>> >> >> >>>>>> >> > > builds, is that correct?
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > If that is true, then can you suggest a workaround combining:
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > (time-lagged Kylin cube) + (real-time DB updates) to provide
>> >> >> >>>>>> >> > > real-time capability?
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > IMO, the point here is to find that (real-time DB update) and
>> >> >> >>>>>> >> > > integrate it with the (time-lagged Kylin cube).
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <[email protected]> wrote:
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
>> >> >> >>>>>> >> > > > about the changes in Druid over these two years; new features
>> >> >> >>>>>> >> > > > that I know of are a new UI, running fully on K8s, etc.).
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > Here are some cases where you should consider using Druid rather
>> >> >> >>>>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
>> >> >> >>>>>> >> > > > I used two years ago):
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > - Have a real-time data source like Kafka, etc.
>> >> >> >>>>>> >> > > > - Most queries are small (based on my test results, I think Druid
>> >> >> >>>>>> >> > > >   had better response times for small queries two years ago).
>> >> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to use K8s or
>> >> >> >>>>>> >> > > >   a public cloud platform as your deployment platform.
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
>> >> >> >>>>>> >> > > > better, like:
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
>> >> >> >>>>>> >> > > >   more exact-match/fine-grained index for queries containing
>> >> >> >>>>>> >> > > >   different `Group By dimensions`.
>> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
>> >> >> >>>>>> >> > > > - Supports 'Join' better? (Not sure at the moment.)
>> >> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not
>> >> >> >>>>>> >> > > >   show that it supports ODBC well).
>> >> >> >>>>>> >> > > > - Kylin looks to support ANSI SQL better than Druid.
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>> >> >> >>>>>> >> > > > I hope this helps; feel free to share your opinion.
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > ------------------------
>> >> >> >>>>>> >> > > > With warm regard
>> >> >> >>>>>> >> > > > Xiaoxiang Yu
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <[email protected]> wrote:
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
>> >> >> >>>>>> >> > > >> Sirs/Madams,
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >> May I post my boss's question:
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
>> >> >> >>>>>> >> > > >> Pinot and Druid?
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >> Please kindly let me know.
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >> Thank you very much and best regards
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >>
>> >> >> >>>>>> >
>> >> >> >>>>>>
>> >> >> >>>>>
>> >> >>
>> >> >
>> >>
>> >
>>
>
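The translated article above argues that a one-off build amortized over many queries is cheaper than paying full compute on every query. The same reasoning also answers the GCP/AWS cost question earlier in the thread, and a back-of-the-envelope model makes it concrete. This is a minimal sketch under assumed, hypothetical cost units (the build cost and the per-query costs). It is not a pricing formula from Kylin, ClickHouse, or any cloud provider, and it ignores segment refreshes and storage.

```python
import math

def precompute_total_cost(build_cost: float, cost_per_query: float, queries: int) -> float:
    """Pre-computation: pay the build once, then each query reads stored results."""
    return build_cost + cost_per_query * queries

def on_the_fly_total_cost(cost_per_query: float, queries: int) -> float:
    """On-the-fly computation: no upfront cost, every query aggregates raw data."""
    return cost_per_query * queries

def break_even_queries(build_cost: float, precomputed_query_cost: float,
                       on_the_fly_query_cost: float) -> int:
    """Smallest query count at which pre-computation is no more expensive.

    From build + c_pre * n <= c_fly * n it follows that n >= build / (c_fly - c_pre).
    """
    saving_per_query = on_the_fly_query_cost - precomputed_query_cost
    if saving_per_query <= 0:
        raise ValueError("pre-computation never pays off if stored queries are not cheaper")
    return math.ceil(build_cost / saving_per_query)

# Hypothetical cost units, chosen only to illustrate the trade-off.
BUILD, C_PRE, C_FLY = 500.0, 0.01, 1.0
n_star = break_even_queries(BUILD, C_PRE, C_FLY)
```

With these made-up numbers, pre-computation becomes the cheaper option from about 506 queries onward. A real comparison would also have to count rebuilds and storage, which this model leaves out.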
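The translated article singles out deduplication (count distinct) as a scenario where pre-calculation pays off. Distinct counts are not additive: the number of distinct users over two days is not the sum of the two daily counts. Pre-aggregating engines therefore commonly store a mergeable structure per cell, such as a bitmap for exact results or a HyperLogLog-style sketch for approximate ones, and merge those structures at query time. In the toy below, Python sets stand in for that structure. The `(day, user_id)` data shape and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Visit = Tuple[str, int]  # hypothetical (day, user_id) pair

def precompute_user_sets(visits: Iterable[Visit]) -> Dict[str, Set[int]]:
    """Build step: keep the set of distinct users per day.

    A Python set stands in for the bitmap or sketch a real engine would store.
    """
    per_day: Dict[str, Set[int]] = {}
    for day, user_id in visits:
        per_day.setdefault(day, set()).add(user_id)
    return per_day

def distinct_users(per_day: Dict[str, Set[int]], days: Iterable[str]) -> int:
    """Query step: roll up several days by set union, never by adding counts."""
    merged: Set[int] = set()
    for day in days:
        merged |= per_day.get(day, set())
    return len(merged)

# Hypothetical visits: user 2 appears on both days, user 3 twice on one day.
visits = [("mon", 1), ("mon", 2), ("tue", 2), ("tue", 3), ("tue", 3)]
cells = precompute_user_sets(visits)
```

Here `distinct_users(cells, ["mon", "tue"])` gives 3. Adding the stored daily counts (2 + 2) would wrongly give 4, because user 2 is counted twice.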
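Xiaoxiang describes NRT streaming as a long-running job that periodically does micro-batch aggregation and persistence. The plain-Python toy below pictures that loop. It deliberately does not use Spark Streaming, which the actual Kylin feature is said to be based on. The event shape, the `SegmentStore` class, and the function names are hypothetical; the point is only to show where the minute-level lag comes from.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int]  # hypothetical (dimension value, measure) pair

def aggregate_batch(events: Iterable[Event]) -> Dict[str, int]:
    """Aggregate one micro-batch: SUM the measure per dimension value."""
    totals: Counter = Counter()
    for key, value in events:
        totals[key] += value
    return dict(totals)

class SegmentStore:
    """Toy stand-in for persisted, pre-aggregated segments (one per batch)."""

    def __init__(self) -> None:
        self.segments: List[Dict[str, int]] = []

    def persist(self, segment: Dict[str, int]) -> None:
        self.segments.append(segment)

    def query_sum(self, key: str) -> int:
        # Only persisted batches are visible; events still waiting for the next
        # trigger are not, which is where the time lag comes from.
        return sum(seg.get(key, 0) for seg in self.segments)

def run_micro_batches(batches: Iterable[List[Event]], store: SegmentStore) -> None:
    """Each loop iteration stands in for one trigger of the long-running job."""
    for batch in batches:
        store.persist(aggregate_batch(batch))

# Hypothetical input: two trigger intervals' worth of events.
store = SegmentStore()
run_micro_batches([[("shop_a", 3), ("shop_b", 2)], [("shop_a", 4)]], store)
```

In this model the trigger interval bounds the lag: an event becomes queryable only once the batch containing it has been persisted.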
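Nam's proposed workaround, (time-lagged Kylin cube) + (real-time DB update), is essentially the batch-layer plus speed-layer split of a Lambda-style architecture. Below is a minimal sketch. It assumes that both sides return SUM aggregates keyed by the same Group By value, and that a single watermark timestamp marks where the cube's built segments end. `merged_sum`, the row shape, and the watermark convention are hypothetical and are not part of any Kylin API.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Row = Tuple[int, str, int]  # hypothetical (event_time, group_key, measure)

def sum_by_key(rows: Iterable[Row], start: int, end: int) -> Dict[str, int]:
    """Aggregate rows with start <= event_time < end (the real-time side)."""
    totals: Counter = Counter()
    for ts, key, value in rows:
        if start <= ts < end:
            totals[key] += value
    return dict(totals)

def merged_sum(cube_result: Dict[str, int], realtime_rows: Iterable[Row],
               watermark: int, now: int) -> Dict[str, int]:
    """Combine the cube's answer (everything before `watermark`) with a
    real-time store's answer for [watermark, now).

    Using the same watermark on both sides avoids double counting rows that
    were already built into a cube segment.
    """
    fresh = sum_by_key(realtime_rows, watermark, now)
    merged = Counter(cube_result)
    merged.update(fresh)  # Counter.update adds counts rather than replacing them
    return dict(merged)

# Hypothetical data: the cube covers event_time < 100, the query runs at now=110.
cube_totals = {"a": 10, "b": 5}
recent = [(95, "a", 1), (100, "a", 2), (105, "c", 3), (120, "b", 4)]
answer = merged_sum(cube_totals, recent, watermark=100, now=110)
```

The design choice that matters is the shared watermark. If the real-time side also returned rows older than the last built segment (like the row at time 95), they would be counted twice. If the cube lagged behind the watermark, rows would go missing.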
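Xiaoxiang notes that Kylin can keep a more exact-match/fine-grained index for queries with different `Group By dimensions`. A toy selection rule illustrates the point: among pre-aggregated indexes whose dimensions cover the query's Group By columns, answer from the one with the fewest rows. This is a simplified sketch of the general idea, not Kylin's actual query-matching logic; a real engine also has to consider filters, measures, and joins. The index names and row counts are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

@dataclass(frozen=True)
class AggIndex:
    name: str                   # hypothetical index name
    dimensions: FrozenSet[str]  # Group By columns the index was built on
    row_count: int              # rows stored after pre-aggregation

def choose_index(indexes: Iterable[AggIndex], group_by: Iterable[str]) -> Optional[AggIndex]:
    """Return the smallest index that still contains every Group By column.

    A covering index can answer the query by (at most) re-aggregating its rows;
    an exact match needs no re-aggregation, and fewer rows means less work.
    Returns None when no index covers the query.
    """
    wanted = frozenset(group_by)
    candidates = [ix for ix in indexes if wanted <= ix.dimensions]
    return min(candidates, key=lambda ix: ix.row_count, default=None)

# Hypothetical indexes on one model.
indexes = [
    AggIndex("by_day_city_product", frozenset({"day", "city", "product"}), 5_000_000),
    AggIndex("by_day_city", frozenset({"day", "city"}), 40_000),
    AggIndex("by_day", frozenset({"day"}), 365),
]
```

For example, `choose_index(indexes, ["city"])` picks `by_day_city`, which has 40,000 rows, instead of the 5,000,000-row index. A query on a column that no index covers returns None and would have to fall back to another execution path.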
