Thank you Xiaoxiang, please update me once you have changed the default branch. Since people are judging by those commit numbers, I hope this change will turn the impression around.
On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <x...@apache.org> wrote:

> The default branch is for 4.X, which is a maintained branch; the active
> branch is kylin5.
> I will change the default branch to kylin5 later.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Hi Xiaoxiang, Sirs / Madams,
>>
>> Can you see the attached photo?
>>
>> My boss asked why Druid gets commits regularly while Kylin has not had a
>> commit since July.
>>
>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <x...@apache.org> wrote:
>>
>>> I think so.
>>>
>>> Response time is not the only factor in the decision. Kylin could be
>>> cheaper when the query pattern suits the Kylin model, and Kylin can
>>> guarantee reasonable query latency. ClickHouse will be quicker in an
>>> ad hoc query scenario.
>>>
>>> By the way, Youzan and Kyligence combine them to provide unified data
>>> analytics services for their customers.
>>>
>>> ------------------------
>>> With warm regard
>>> Xiaoxiang Yu
>>>
>>>
>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>
>>>> Hi Xiaoxiang, thank you.
>>>>
>>>> In case my client uses a cloud computing service like GCP or AWS, which
>>>> will cost more: Kylin with its precalculation feature, or ClickHouse?
>>>> (In Kylin's case, my understanding is that the query computation is done
>>>> once and stored in the cube to be reused many times, so Kylin uses less
>>>> cloud computation. Is that true?)
>>>>
>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>
>>>>> The following text is part of an article
>>>>> (https://zhuanlan.zhihu.com/p/343394287).
>>>>>
>>>>> ===============================================================================
>>>>>
>>>>> Kylin is suitable for aggregation queries with fixed patterns because
>>>>> of its pre-calculation technology, for example when the join, group by,
>>>>> and where conditions in the SQL are relatively fixed. The larger the
>>>>> data volume, the more obvious Kylin's advantage; Kylin is especially
>>>>> strong in deduplication (count distinct), Top N, and percentile
>>>>> scenarios. It is used in a large number of scenarios such as
>>>>> dashboards, all kinds of reports, large-screen displays, traffic
>>>>> statistics, and user behavior analysis. Meituan, Aurora, and Shell
>>>>> Housing use Kylin to build their data service platforms, serving
>>>>> millions to tens of millions of queries per day, and most queries
>>>>> complete within 2 - 3 seconds. There is no better alternative for such
>>>>> high-concurrency scenarios.
>>>>>
>>>>> ClickHouse, because of its MPP architecture, has high computing power
>>>>> and is more suitable when queries are more flexible, or when
>>>>> detail-level queries with low concurrency are needed. Scenarios include
>>>>> user-label filtering over very many columns with arbitrarily combined
>>>>> where conditions, and complex ad hoc queries without heavy concurrency.
>>>>> If the amount of data and access is large, you need to deploy a
>>>>> distributed ClickHouse cluster, which is a bigger challenge for
>>>>> operations and maintenance.
>>>>>
>>>>> If some queries are very flexible but infrequent, it is more
>>>>> resource-efficient to compute them on the fly: since the number of
>>>>> queries is small, even if each query consumes a lot of computational
>>>>> resources, it is still cost-effective overall. If some queries have a
>>>>> fixed pattern and the query volume is large, Kylin is more suitable: by
>>>>> spending computational resources up front to save the results, the
>>>>> pre-calculation cost is amortized over every query, which is the most
>>>>> economical.
>>>>>
>>>>> --- Translated with DeepL.com (free version)
>>>>>
>>>>> ------------------------
>>>>> With warm regard
>>>>> Xiaoxiang Yu
>>>>>
>>>>>
>>>>> On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>
>>>>>> Thank you Xiaoxiang for the near real-time streaming feature. That's
>>>>>> great.
>>>>>>
>>>>>> This morning there was a new challenge for my team: ClickHouse offered
>>>>>> us the speed of calculating 8 billion rows in milliseconds, which is
>>>>>> faster than my demonstration (I used Kylin to calculate 1 billion rows
>>>>>> in 2.9 seconds).
>>>>>>
>>>>>> Can you briefly suggest the advantages of Kylin over ClickHouse so
>>>>>> that I can defend my demonstration?
>>>>>>
>>>>>> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>
>>>>>>> 1. "In this important scenario of real-time analytics, the reason
>>>>>>> here is that Kylin has lag time due to the model update of a new
>>>>>>> segment build, is that correct?"
>>>>>>>
>>>>>>> You are correct.
>>>>>>>
>>>>>>> 2. "If that is true, then can you suggest a work-around of a
>>>>>>> combination of ..."
>>>>>>>
>>>>>>> Kylin is planning to introduce NRT streaming (coding is completed but
>>>>>>> not released), which can bring the time lag down to about 3 minutes
>>>>>>> (that is my estimation, but I am quite certain about it).
>>>>>>> NRT stands for 'near real-time': it runs a job that does micro-batch
>>>>>>> aggregation and persistence periodically. The price is that you need
>>>>>>> to run and monitor a long-running job. This feature is based on Spark
>>>>>>> Streaming, so you need knowledge of it.
>>>>>>>
>>>>>>> I am curious what maximum time lag your customers can tolerate?
>>>>>>> Personally, I guess a minute-level time lag is OK for most cases.
>>>>>>>
>>>>>>> ------------------------
>>>>>>> With warm regard
>>>>>>> Xiaoxiang Yu
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>>>
>>>>>>>> Druid is better in
>>>>>>>> - Have a real-time datasource like Kafka etc.
>>>>>>>>
>>>>>>>> ==========================
>>>>>>>>
>>>>>>>> Hi Xiaoxiang, thank you for your response.
>>>>>>>>
>>>>>>>> In this important scenario of real-time analytics, the reason here
>>>>>>>> is that Kylin has lag time due to the model update of a new segment
>>>>>>>> build, is that correct?
>>>>>>>>
>>>>>>>> If that is true, then can you suggest a work-around combining:
>>>>>>>>
>>>>>>>> (time-lagged Kylin cube) + (real-time DB update) to provide
>>>>>>>> real-time capability?
>>>>>>>>
>>>>>>>> IMO, the point here is to find that (real-time DB update) and
>>>>>>>> integrate it with the (time-lagged Kylin cube).
>>>>>>>>
>>>>>>>> On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> I researched and tested Druid two years ago (I don't know much
>>>>>>>>> about how Druid has changed in these two years; new features that I
>>>>>>>>> know of are the new UI, running fully on K8s, etc.).
>>>>>>>>>
>>>>>>>>> Here are some cases where you should consider using Druid rather
>>>>>>>>> than Kylin at the moment (comparing Kylin 5.0-beta with the Druid I
>>>>>>>>> used two years ago):
>>>>>>>>>
>>>>>>>>> - You have a real-time datasource like Kafka etc.
>>>>>>>>> - Most queries are small (based on my test results, Druid had
>>>>>>>>> better response time for small queries two years ago).
>>>>>>>>> - You don't know how to optimize Spark/Hadoop and want to use K8s
>>>>>>>>> or a public cloud platform as your deployment platform.
>>>>>>>>>
>>>>>>>>> But I do think there are many scenarios in which Kylin could be
>>>>>>>>> better, like:
>>>>>>>>>
>>>>>>>>> - Better performance for complex/big queries. Kylin can have a more
>>>>>>>>> exact-match/fine-grained index for queries containing different
>>>>>>>>> `Group By dimensions`.
>>>>>>>>> - User-friendly UI for modeling.
>>>>>>>>> - Better support for 'Join'? (Not sure at the moment.)
>>>>>>>>> - ODBC drivers for different BI tools (Druid's website did not show
>>>>>>>>> that it supports ODBC well).
>>>>>>>>> - It looks like Kylin supports ANSI SQL better than Druid.
>>>>>>>>>
>>>>>>>>> I don't know Pinot, so I have nothing to say about it.
>>>>>>>>> Hope this helps, and you are free to share your opinion.
>>>>>>>>>
>>>>>>>>> ------------------------
>>>>>>>>> With warm regard
>>>>>>>>> Xiaoxiang Yu
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> Dear Xiaoxiang,
>>>>>>>>>> Sirs/Madams,
>>>>>>>>>>
>>>>>>>>>> May I post my boss's question:
>>>>>>>>>>
>>>>>>>>>> What are the pros and cons of the OLAP platform Kylin compared to
>>>>>>>>>> Pinot and Druid?
>>>>>>>>>>
>>>>>>>>>> Please kindly let me know.
>>>>>>>>>>
>>>>>>>>>> Thank you very much and best regards
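P.S. For anyone following the NRT streaming discussion above (a periodic job doing micro-batch aggregation and persistence on top of Spark Streaming), below is a minimal sketch of that micro-batch pattern, written here with PySpark Structured Streaming. This is only an illustration of the general idea, not Kylin's actual implementation; the Kafka broker, topic, column names, output paths, and trigger interval are all placeholder assumptions.

# A rough sketch of periodic micro-batch aggregation with Spark Structured
# Streaming -- the kind of pattern NRT streaming relies on. Not Kylin code;
# broker, topic, columns, and paths below are hypothetical placeholders.
# Requires the spark-sql-kafka connector on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("nrt-microbatch-sketch").getOrCreate()

# Read a real-time datasource (Kafka) as a stream.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder broker
    .option("subscribe", "txn_events")                # placeholder topic
    .load()
)

# Parse the message payload: keep an amount field plus the event timestamp.
parsed = events.select(
    F.get_json_object(F.col("value").cast("string"), "$.amount")
     .cast("double").alias("amount"),
    F.col("timestamp"),
)

# Aggregate per 1-minute window; events arriving more than 5 minutes late
# are dropped by the watermark.
agg = (
    parsed
    .withWatermark("timestamp", "5 minutes")
    .groupBy(F.window("timestamp", "1 minute"))
    .agg(F.count("*").alias("txn_count"), F.sum("amount").alias("txn_amount"))
)

# Persist each micro-batch periodically; the minute-level time lag comes from
# this periodic trigger plus the time to build and merge the results.
query = (
    agg.writeStream
    .outputMode("append")
    .format("parquet")
    .option("path", "/tmp/nrt_agg")                   # placeholder output
    .option("checkpointLocation", "/tmp/nrt_agg_ckpt")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()

The trade-off discussed in this thread shows up directly here: the aggregation is computed and persisted once per trigger interval and then reused by every query, but the price is a long-running job that has to be operated and monitored.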