Thank you very much, Xiaoxiang. I already gave the presentation this morning, so there was no time for you to comment. Next time I will send it to you in advance. The outcome of the meeting was that we will implement both Druid and Kylin in the next couple of projects, Druid because of its real-time feature. I hope Kylin will have the same feature soon.
May I ask when you will release Kylin 5.0?

On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <x...@apache.org> wrote:

> Since 2018 there have been a lot of new features and code refactoring.
> If you like, you can share your slides with me privately, and maybe I can
> give some comments.
>
> Here are some references on the advantages of Kylin since 2018:
> - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> - https://kylin.apache.org/5.0/docs/development/roadmap
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
> On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and Druid
>> in my team.
>>
>> I found this article and would like you to update me on the advantages of
>> Kylin since 2018 (especially with version 5 about to be released):
>>
>> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
>> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
>>
>> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>
>> > Thank you very much for your prompt response. I still have several
>> > questions and will ask for your help later.
>> >
>> > Best regards and have a good day
>> >
>> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <x...@apache.org> wrote:
>> >
>> >> Done. The default GitHub branch has been changed to kylin5.
>> >>
>> >> ------------------------
>> >> With warm regard
>> >> Xiaoxiang Yu
>> >>
>> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <x...@apache.org> wrote:
>> >>
>> >> > A JIRA ticket has been opened and is waiting for INFRA:
>> >> > https://issues.apache.org/jira/browse/INFRA-25238
>> >> >
>> >> > ------------------------
>> >> > With warm regard
>> >> > Xiaoxiang Yu
>> >> >
>> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >> >
>> >> >> Thank you Xiaoxiang, please update me when you have changed your
>> >> >> default branch. In case people are impressed by the numbers, I hope
>> >> >> to turn this situation around.
>> >> >>
>> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <x...@apache.org> wrote:
>> >> >>
>> >> >>> The default branch is for 4.X, which is a maintenance branch; the
>> >> >>> active branch is kylin5.
>> >> >>> I will change the default branch to kylin5 later.
>> >> >>>
>> >> >>> ------------------------
>> >> >>> With warm regard
>> >> >>> Xiaoxiang Yu
>> >> >>>
>> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >> >>>
>> >> >>>> Hi Xiaoxiang, Sirs / Madams
>> >> >>>>
>> >> >>>> Can you see the attached photo?
>> >> >>>>
>> >> >>>> My boss asked why Druid commits code regularly but Kylin has had
>> >> >>>> no commits since July.
>> >> >>>>
>> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <x...@apache.org> wrote:
>> >> >>>>
>> >> >>>>> I think so.
>> >> >>>>>
>> >> >>>>> Response time is not the only factor in making a decision. Kylin
>> >> >>>>> could be cheaper when the query pattern is suitable for the Kylin
>> >> >>>>> model, and Kylin can guarantee reasonable query latency.
>> >> >>>>> ClickHouse will be quicker in an ad hoc query scenario.
>> >> >>>>>
>> >> >>>>> By the way, Youzan and Kyligence combine them to provide
>> >> >>>>> unified data analytics services for their customers.
>> >> >>>>>
>> >> >>>>> ------------------------
>> >> >>>>> With warm regard
>> >> >>>>> Xiaoxiang Yu
>> >> >>>>>
>> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >> >>>>>
>> >> >>>>>> Hi Xiaoxiang, thank you
>> >> >>>>>>
>> >> >>>>>> In case my client uses a cloud computing service like GCP or AWS,
>> >> >>>>>> which will cost more: the precalculation feature of Kylin, or
>> >> >>>>>> ClickHouse? (In the case of Kylin, my thought is that the query
>> >> >>>>>> execution is done once and stored in a cube to be used many times,
>> >> >>>>>> so Kylin uses less cloud computation. Is that true?)
>> >> >>>>>>
>> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <x...@apache.org> wrote:
>> >> >>>>>>
>> >> >>>>>> > The following text is part of an article
>> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>> >> >>>>>> >
>> >> >>>>>> > ===============================================================================
>> >> >>>>>> >
>> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns
>> >> >>>>>> > because of its pre-calculation technology, for example when the
>> >> >>>>>> > join, group by, and where condition patterns in SQL are
>> >> >>>>>> > relatively fixed. The larger the data volume is, the more obvious
>> >> >>>>>> > the advantages of using Kylin are. Kylin is particularly
>> >> >>>>>> > advantageous for deduplication (count distinct), Top N, and
>> >> >>>>>> > Percentile, and it is used in a large number of scenarios, such
>> >> >>>>>> > as dashboards, all kinds of reports, large-screen displays,
>> >> >>>>>> > traffic statistics, and user behavior analysis. Meituan, Aurora,
>> >> >>>>>> > Shell Housing, etc. use Kylin to build their data service
>> >> >>>>>> > platforms, serving millions to tens of millions of queries per
>> >> >>>>>> > day, and most of the queries complete within 2 - 3 seconds.
>> >> >>>>>> > There is no better alternative for such a high-concurrency
>> >> >>>>>> > scenario.
>> >> >>>>>> >
>> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing
>> >> >>>>>> > power and is more suitable when query requests are more flexible,
>> >> >>>>>> > or when there is a need for detail-level queries with low
>> >> >>>>>> > concurrency. Scenarios include user-label filtering over very
>> >> >>>>>> > many columns with arbitrarily combined where conditions, complex
>> >> >>>>>> > ad hoc queries without much concurrency, and so on. If the amount
>> >> >>>>>> > of data and access is large, you need to deploy a distributed
>> >> >>>>>> > ClickHouse cluster, which is a bigger challenge for operation and
>> >> >>>>>> > maintenance.
>> >> >>>>>> >
>> >> >>>>>> > If some queries are very flexible but infrequent, it is more
>> >> >>>>>> > resource-efficient to compute them on demand. Since the number of
>> >> >>>>>> > queries is small, even if each query consumes a lot of
>> >> >>>>>> > computational resources, it is still cost-effective overall.
>> >> >>>>>> > If some queries have a fixed pattern and the query volume is
>> >> >>>>>> > large, Kylin is more suitable: large computational resources are
>> >> >>>>>> > spent once to compute and save the results, and that upfront
>> >> >>>>>> > cost is amortized over every query, so it is the most economical.
>> >> >>>>>> >
>> >> >>>>>> > --- Translated with DeepL.com (free version)
>> >> >>>>>> >
>> >> >>>>>> > ------------------------
>> >> >>>>>> > With warm regard
>> >> >>>>>> > Xiaoxiang Yu
>> >> >>>>>> >
>> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >> >>>>>> >
>> >> >>>>>> >> Thank you Xiaoxiang for the near real time streaming feature.
>> >> >>>>>> >> That's great.
>> >> >>>>>> >>
>> >> >>>>>> >> This morning there has been a new challenge to my team:
>> >> >>>>>> >> ClickHouse offered us the speed of calculating 8 billion rows in
>> >> >>>>>> >> milliseconds, which is faster than my demonstration (I used Kylin
>> >> >>>>>> >> to calculate 1 billion rows in 2.9 seconds).
>> >> >>>>>> >>
>> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse
>> >> >>>>>> >> so that I can defend my demonstration?
>> >> >>>>>> >>
>> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <x...@apache.org> wrote:
>> >> >>>>>> >>
>> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the
>> >> >>>>>> >> > reason here is that kylin has lag time due to model update of
>> >> >>>>>> >> > new segment build, is that correct?"
>> >> >>>>>> >> >
>> >> >>>>>> >> > You are correct.
>> >> >>>>>> >> >
>> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
>> >> >>>>>> >> > combination of ... "
>> >> >>>>>> >> >
>> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is
>> >> >>>>>> >> > completed but not released), which can reduce the time lag to
>> >> >>>>>> >> > about 3 minutes (that is my estimate, but I am quite certain
>> >> >>>>>> >> > about it).
>> >> >>>>>> >> > NRT stands for 'near real-time'; it will run a job and do
>> >> >>>>>> >> > micro-batch aggregation and persistence periodically. The price
>> >> >>>>>> >> > is that you need to run and monitor a long-running job. This
>> >> >>>>>> >> > feature is based on Spark Streaming, so you need knowledge of
>> >> >>>>>> >> > it.
>> >> >>>>>> >> >
>> >> >>>>>> >> > I am curious: what is the maximum time lag your customers can
>> >> >>>>>> >> > tolerate?
>> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most
>> >> >>>>>> >> > cases.
>> >> >>>>>> >> >
>> >> >>>>>> >> > ------------------------
>> >> >>>>>> >> > With warm regard
>> >> >>>>>> >> > Xiaoxiang Yu
>> >> >>>>>> >> >
>> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >> >>>>>> >> >
>> >> >>>>>> >> > > Druid is better in
>> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > ==========================
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > In this important scenario of realtime analytics, the reason
>> >> >>>>>> >> > > here is that Kylin has lag time due to model update of new
>> >> >>>>>> >> > > segment build, is that correct?
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > If that is true, then can you suggest a work-around that
>> >> >>>>>> >> > > combines:
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > (time-lag Kylin cube) + (realtime DB update) to provide
>> >> >>>>>> >> > > realtime capability?
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
>> >> >>>>>> >> > > integrate it with (time-lag Kylin cube).
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <x...@apache.org> wrote:
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know
>> >> >>>>>> >> > > > much about how Druid has changed in these two years. The new
>> >> >>>>>> >> > > > features that I know of are: new UI, fully on K8s, etc.).
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > Here are some cases where you should consider using Druid
>> >> >>>>>> >> > > > rather than Kylin at the moment (comparing Kylin 5.0-beta
>> >> >>>>>> >> > > > with the Druid that I used two years ago):
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
>> >> >>>>>> >> > > > - Most queries are small (based on my test results, I think
>> >> >>>>>> >> > > >   Druid had better response time for small queries two years
>> >> >>>>>> >> > > >   ago).
>> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to use
>> >> >>>>>> >> > > >   K8s or a public cloud platform as your deployment platform.
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could
>> >> >>>>>> >> > > > be better, like:
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > - Better performance for complex/big queries.
>> >> >>>>>> >> > > >   Kylin can have a more exact-match/fine-grained index for
>> >> >>>>>> >> > > >   queries containing different `Group By` dimensions.
>> >> >>>>>> >> > > > - User-friendly UI for modeling.
>> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not
>> >> >>>>>> >> > > >   show that it supports ODBC well).
>> >> >>>>>> >> > > > - It looks like Kylin supports ANSI SQL better than Druid.
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>> >> >>>>>> >> > > > I hope this helps; you are also free to share your opinion.
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > ------------------------
>> >> >>>>>> >> > > > With warm regard
>> >> >>>>>> >> > > > Xiaoxiang Yu
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > >> Dear Xiaoxiang,
>> >> >>>>>> >> > > >> Sirs/Madams,
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >> May I post my boss's question:
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
>> >> >>>>>> >> > > >> compared to Pinot and Druid?
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >> Please kindly let me know
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >> Thank you very much and best regards