The default branch is for 4.X, which is a maintenance branch; the active development branch is kylin5. I will change the default branch to kylin5 later.
------------------------
With warm regard
Xiaoxiang Yu


On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang, Sirs / Madams
>
> Can you see the attached photo?
>
> My boss asked why Druid commits code regularly but Kylin has had no
> commits since July.
>
>
> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <x...@apache.org> wrote:
>
>> I think so.
>>
>> Response time is not the only factor in making a decision. Kylin could be
>> cheaper when the query pattern is suitable for the Kylin model, and Kylin
>> can guarantee reasonable query latency. ClickHouse will be quicker in an
>> ad hoc query scenario.
>>
>> By the way, Youzan and Kyligence combine them to provide
>> unified data analytics services for their customers.
>>
>> ------------------------
>> With warm regard
>> Xiaoxiang Yu
>>
>>
>>
>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>
>>> Hi Xiaoxiang, thank you
>>>
>>> In case my client uses a cloud computing service like GCP or AWS, which
>>> will cost more: the precalculation feature of Kylin, or ClickHouse? (In
>>> the case of Kylin, my thinking is that the query execution is done once
>>> and stored in the cube to be used many times, so Kylin uses less cloud
>>> computation. Is that true?)
>>>
>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>
>>> > The following text is part of an article
>>> > (https://zhuanlan.zhihu.com/p/343394287).
>>> >
>>> >
>>> > ===============================================================================
>>> >
>>> > Kylin is suitable for aggregation queries with fixed patterns because of
>>> > its precomputation technology, for example when the join, group by, and
>>> > where condition patterns in SQL are relatively fixed.
>>> > The larger the data volume is, the more obvious the advantages of using
>>> > Kylin are. Kylin is particularly advantageous for deduplication (count
>>> > distinct), Top N, and Percentile, and it is widely used in scenarios such
>>> > as dashboards, reports of all kinds, large-screen displays, traffic
>>> > statistics, and user behavior analysis. Meituan, Aurora, Shell Housing,
>>> > etc. use Kylin to build their data service platforms, serving millions to
>>> > tens of millions of queries per day, and most of the queries can be
>>> > completed within 2 - 3 seconds. There is no better alternative for such a
>>> > high-concurrency scenario.
>>> >
>>> > ClickHouse, because of its MPP architecture, has high computing power and
>>> > is more suitable when query requests are more flexible, or when there is
>>> > a need for detail-level queries with low concurrency. Scenarios include
>>> > user label filtering over very many columns with arbitrarily combined
>>> > where conditions, complex ad hoc queries with low concurrency, and so on.
>>> > If the amount of data and access is large, you need to deploy a
>>> > distributed ClickHouse cluster, which is a greater challenge for
>>> > operation and maintenance.
>>> >
>>> > If some queries are very flexible but infrequent, it is more
>>> > resource-efficient to compute them on the fly. Since the number of
>>> > queries is small, even if each query consumes a lot of computational
>>> > resources, it is still cost-effective overall.
>>> > If some queries have a fixed pattern and the query volume is large,
>>> > Kylin is the better fit: large computational resources are spent once
>>> > to precompute and save the results, and that upfront cost is amortized
>>> > over every query, so it is the most economical.
>>> >
>>> > --- Translated with DeepL.com (free version)
>>> >
>>> >
>>> > ------------------------
>>> > With warm regard
>>> > Xiaoxiang Yu
>>> >
>>> >
>>> >
>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>> >
>>> >> Thank you Xiaoxiang for the near real-time streaming feature. That's
>>> >> great.
>>> >>
>>> >> This morning there has been a new challenge to my team: ClickHouse
>>> >> offered us the speed of calculating 8 billion rows in milliseconds,
>>> >> which is faster than my demonstration (I used Kylin to calculate
>>> >> 1 billion rows in 2.9 seconds).
>>> >>
>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so that
>>> >> I can defend my demonstration?
>>> >>
>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>> >>
>>> >> > 1. "In this important scenario of realtime analytics, the reason here
>>> >> > is that kylin has lag time due to model update of new segment build,
>>> >> > is that correct?"
>>> >> >
>>> >> > You are correct.
>>> >> >
>>> >> > 2. "If that is true, then can you suggest a work-around of combination
>>> >> > of ... "
>>> >> >
>>> >> > Kylin is planning to introduce NRT streaming (coding is completed but
>>> >> > not released), which can reduce the time lag to about 3 minutes (that
>>> >> > is my estimate, but I am quite certain about it).
>>> >> > NRT stands for 'near real-time'; it will run a job that periodically
>>> >> > does micro-batch aggregation and persistence.
>>> >> > The price is that you need to run and monitor a long-running job.
>>> >> > This feature is based on Spark Streaming, so you need knowledge of it.
>>> >> >
>>> >> > I am curious: what is the maximum time lag your customers can
>>> >> > tolerate?
>>> >> > Personally, I guess minute-level time lag is OK for most cases.
>>> >> >
>>> >> > ------------------------
>>> >> > With warm regard
>>> >> > Xiaoxiang Yu
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>> >> >
>>> >> > > Druid is better in
>>> >> > > - Have a real-time datasource like Kafka etc.
>>> >> > >
>>> >> > > ==========================
>>> >> > >
>>> >> > > Hi Xiaoxiang, thank you for your response.
>>> >> > >
>>> >> > > In this important scenario of realtime analytics, the reason here is
>>> >> > > that Kylin has lag time due to model update of new segment build, is
>>> >> > > that correct?
>>> >> > >
>>> >> > > If that is true, then can you suggest a work-around combining:
>>> >> > >
>>> >> > > (time-lag Kylin cube) + (realtime DB update) to provide
>>> >> > > realtime capability?
>>> >> > >
>>> >> > > IMO, the point here is to find that (realtime DB update) and
>>> >> > > integrate it with (time-lag Kylin cube).
>>> >> > >
>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>> >> > >
>>> >> > > > I researched and tested Druid two years ago (I don't know much
>>> >> > > > about how Druid has changed in these two years. New features that
>>> >> > > > I know of are: new UI, fully on K8s, etc.).
>>> >> > > >
>>> >> > > > Here are some cases where you should consider using Druid rather
>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
>>> >> > > > which I used two years ago):
>>> >> > > >
>>> >> > > > - Have a real-time datasource like Kafka etc.
>>> >> > > > - Most queries are small (based on my test results, I think Druid
>>> >> > > > had better response time for small queries two years ago).
>>> >> > > > - Don't know how to optimize Spark/Hadoop, want to use the
>>> >> > > > K8S/public cloud platform as your deployment platform.
>>> >> > > >
>>> >> > > > But I do think there are many scenarios in which Kylin could be
>>> >> > > > better, like:
>>> >> > > >
>>> >> > > > - Better performance for complex/big queries. Kylin can have a more
>>> >> > > > exact-match/fine-grained index for queries containing different
>>> >> > > > `Group By dimensions`.
>>> >> > > > - User-friendly UI for modeling.
>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>>> >> > > > - ODBC driver for different BI tools (Druid's website did not show
>>> >> > > > that it supports ODBC well).
>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>>> >> > > >
>>> >> > > >
>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>>> >> > > > I hope this helps; feel free to share your opinion.
>>> >> > > >
>>> >> > > > ------------------------
>>> >> > > > With warm regard
>>> >> > > > Xiaoxiang Yu
>>> >> > > >
>>> >> > > >
>>> >> > > >
>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>> >> > > >
>>> >> > > >> Dear Xiaoxiang,
>>> >> > > >> Sirs/Madams,
>>> >> > > >>
>>> >> > > >> May I post my boss's question:
>>> >> > > >>
>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
>>> >> > > >> Pinot and Druid?
>>> >> > > >>
>>> >> > > >> Please kindly let me know
>>> >> > > >>
>>> >> > > >> Thank you very much and best regards