Thank you Xiaoxiang for your reply

> Do you have any suggestions/wishes for kylin 5(except real-time feature)?

Yes: please answer to help me clear this headache:
1. Can Kylin access the existing star schema in an Oracle data warehouse? If not, is there any workaround?
2. My team is using Kerberos for authentication. Do you have any document/case study about integrating Kerberos with Kylin 4.x and Kylin 5.x?
3. Should we use Apache Ranger instead of Kerberos for authentication and for security purposes?

Thank you again

On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <x...@apache.org> wrote:

> I guess the release date should be 2024/01.
> Do you have any suggestions/wishes for kylin 5(except real-time feature)?
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
> On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Thank you very much xiaoxiang, I did the presentation this morning already so there is no time for you to comment. Next time I will send you in advance. The meeting result was that we will implement both druid and kylin in the next couple of projects because of its realtime feature. Hope that kylin will have same feature soon.
>>
>> May I ask when will you release kylin 5.0?
>>
>> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <x...@apache.org> wrote:
>>
>>> Since 2018 there are a lot of new features and code refactor.
>>> If you like, you can share your ppt to me privately, maybe I can give some comments.
>>>
>>> Here is the reference of advantages of Kylin since 2018:
>>> - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
>>> - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
>>> - https://kylin.apache.org/5.0/docs/development/roadmap
>>>
>>> ------------------------
>>> With warm regard
>>> Xiaoxiang Yu
>>>
>>> On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>
>>>> Hi Xiaoxiang, tomorrow is the main presentation between Kylin and Druid in my team.
>>>>
>>>> I found this article and would like you to update me the advantages of Kylin since 2018 until now (especially with version 5 to be released)
>>>>
>>>> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
>>>> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
>>>>
>>>> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>>>
>>>>> Thank you very much for your prompt response, I still have several questions to seek for your help later.
>>>>>
>>>>> Best regards and have a good day
>>>>>
>>>>> On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>
>>>>>> Done. Github branch changed to kylin5.
>>>>>>
>>>>>> ------------------------
>>>>>> With warm regard
>>>>>> Xiaoxiang Yu
>>>>>>
>>>>>> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>
>>>>>>> A JIRA ticket has been opened, waiting for INFRA :
>>>>>>> https://issues.apache.org/jira/browse/INFRA-25238 .
>>>>>>> ------------------------
>>>>>>> With warm regard
>>>>>>> Xiaoxiang Yu
>>>>>>>
>>>>>>> On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>>>
>>>>>>>> Thank you Xiaoxiang, please update me when you have changed your default branch. In case people are impressed by the numbers then I hope to turn this situation to reverse direction.
>>>>>>>>
>>>>>>>> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> The default branch is for 4.X which is a maintained branch, the active branch is kylin5.
>>>>>>>>> I will change the default branch to kylin5 later.
>>>>>>>>>
>>>>>>>>> ------------------------
>>>>>>>>> With warm regard
>>>>>>>>> Xiaoxiang Yu
>>>>>>>>>
>>>>>>>>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Xiaoxiang, Sirs / Madams
>>>>>>>>>>
>>>>>>>>>> Can you see the attached photo
>>>>>>>>>>
>>>>>>>>>> My boss asked why druid commits code regularly but kylin has had no commits since July
>>>>>>>>>>
>>>>>>>>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> I think so.
>>>>>>>>>>>
>>>>>>>>>>> Response time is not the only factor to make a decision. Kylin could be cheaper when the query pattern is suitable for the Kylin model, and Kylin can guarantee reasonable query latency. Clickhouse will be quicker in an ad hoc query scenario.
>>>>>>>>>>>
>>>>>>>>>>> By the way, Youzan and Kyligence combine them together to provide unified data analytics services for their customers.
>>>>>>>>>>>
>>>>>>>>>>> ------------------------
>>>>>>>>>>> With warm regard
>>>>>>>>>>> Xiaoxiang Yu
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Xiaoxiang, thank you
>>>>>>>>>>>>
>>>>>>>>>>>> In case my client uses cloud computing service like gcp or aws, which will cost more: precalculation feature of kylin or clickhouse (in case of kylin, I have a thought that the query execution has been done once and stored in cube to be used many times so kylin uses less cloud computation, is that true)?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Following text is part of an article (https://zhuanlan.zhihu.com/p/343394287).
>>>>>>>>>>>>>
>>>>>>>>>>>>> ===============================================================================
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kylin is suitable for aggregation queries with fixed patterns because of its precomputation technology, for example when the join, group by, and where-condition patterns in the SQL are relatively fixed. The larger the data volume is, the more obvious the advantages of using Kylin are; in particular, Kylin is particularly advantageous in the scenarios of deduplication (count distinct), Top N, and Percentile.
>>>>>>>>>>>>> It is used in a large number of scenarios, such as dashboards, all kinds of reports, large-screen displays, traffic statistics, and user behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to build their data service platforms, providing millions to tens of millions of queries per day, and most of the queries can be completed within 2 - 3 seconds. There is no better alternative for such a high concurrency scenario.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ClickHouse, because of its MPP architecture, has high computing power and is more suitable when the query request is more flexible, or when there is a need for detailed queries with low concurrency. Scenarios include: user label filtering where there are very many columns and the where conditions are arbitrarily combined, complex ad hoc queries without much concurrency, and so on. If the amount of data and access is large, you need to deploy a distributed ClickHouse cluster, which is a higher challenge for operation and maintenance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If some queries are very flexible but infrequent, it is more resource-efficient to compute them on the fly.
>>>>>>>>>>>>> Since the number of queries is small, even if each query consumes a lot of computational resources, it is still cost-effective overall. If some queries have a fixed pattern and the query volume is large, it is more suitable for Kylin, because the query volume is large, and by using large computational resources to save the results, the upfront computational cost can be amortized over each query, so it is the most economical.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --- Translated with DeepL.com (free version)
>>>>>>>>>>>>>
>>>>>>>>>>>>> ------------------------
>>>>>>>>>>>>> With warm regard
>>>>>>>>>>>>> Xiaoxiang Yu
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you Xiaoxiang for the near real time streaming feature. That's great.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This morning there has been a new challenge to my team: clickhouse offered us the speed of calculating 8 billion rows in millisecond which is faster than my demonstration (I used Kylin to do calculating 1 billion rows in 2.9 seconds)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you briefly suggest the advantages of kylin over clickhouse so that I can defend my demonstration.
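A note on the amortization argument in the quoted article: the claim that an upfront precomputation cost is spread across many queries can be checked with simple arithmetic. The sketch below is a hypothetical cost model, not a measurement of Kylin or ClickHouse; the function names and every cost figure are illustrative assumptions.

```python
import math


def total_cost(build_cost, per_query_cost, num_queries):
    """Total cost = one-off precomputation cost + cost of serving all queries."""
    return build_cost + per_query_cost * num_queries


def break_even_queries(build_cost, precomputed_query_cost, on_the_fly_query_cost):
    """Smallest query count at which precomputing is no more expensive
    than computing every query on the fly. Returns None if it never pays off."""
    saving_per_query = on_the_fly_query_cost - precomputed_query_cost
    if saving_per_query <= 0:
        return None  # each precomputed query is not cheaper, so no break-even
    return math.ceil(build_cost / saving_per_query)


if __name__ == "__main__":
    # Hypothetical cost units: building the index costs 1000,
    # a precomputed query costs 1, an on-the-fly query costs 5.
    n = break_even_queries(1000, 1, 5)
    print(n)                        # 250
    print(total_cost(1000, 1, n))   # 1250 (precomputed)
    print(total_cost(0, 5, n))      # 1250 (on the fly)
```

With these made-up numbers, precomputation breaks even at 250 queries. Below that, computing on the fly is cheaper, which matches the article's point about flexible but infrequent queries.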
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. "In this important scenario of realtime analytics, the reason here is that kylin has lag time due to model update of new segment build, is that correct?"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You are correct.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. "If that is true, then can you suggest a work-around of combination of ... "
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Kylin is planning to introduce NRT streaming (coding is completed but not released), which can reduce the time lag to about 3 minutes (that is my estimation but I am quite certain about it).
>>>>>>>>>>>>>>> NRT stands for 'near real-time'; it will run a job and do micro-batch aggregation and persistence periodically. The price is that you need to run and monitor a long-running job. This feature is based on Spark Streaming, so you need knowledge of it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am curious about what is the maximum time-lag your customers can tolerate?
>>>>>>>>>>>>>>> Personally, I guess minute level time-lag is ok for most cases.
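The micro-batch idea mentioned above can be illustrated without Spark. The following is a minimal sketch in plain Python, not Kylin's NRT implementation and not the Spark Streaming API: the function name, the event layout `(epoch_seconds, dimension, value)`, and the 180-second interval are all assumptions chosen for illustration.

```python
from collections import defaultdict


def micro_batch_aggregate(events, batch_seconds):
    """Group events into fixed-length batches and sum values per dimension.

    events: iterable of (epoch_seconds, dimension, value) tuples (hypothetical layout).
    Returns {batch_start_seconds: {dimension: summed_value}}.
    """
    batches = defaultdict(lambda: defaultdict(float))
    for ts, dim, value in events:
        # Align each event to the start of the batch window it falls into.
        batch_start = ts - (ts % batch_seconds)
        batches[batch_start][dim] += value
    # Convert nested defaultdicts to plain dicts, ordered by batch start.
    return {start: dict(sums) for start, sums in sorted(batches.items())}


if __name__ == "__main__":
    events = [(0, "a", 1.0), (100, "a", 2.0), (185, "b", 3.0)]
    print(micro_batch_aggregate(events, 180))
    # {0: {'a': 3.0}, 180: {'b': 3.0}}
```

In a scheme like this, a row only becomes queryable after its batch has closed and been persisted, so the worst-case lag is roughly the batch interval plus the processing time.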
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ------------------------
>>>>>>>>>>>>>>> With warm regard
>>>>>>>>>>>>>>> Xiaoxiang Yu
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Druid is better in
>>>>>>>>>>>>>>>> - Have a real-time datasource like Kafka etc.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ==========================
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Xiaoxiang, thank you for your response.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In this important scenario of realtime analytics, the reason here is that kylin has lag time due to model update of new segment build, is that correct?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If that is true, then can you suggest a work-around of combination of:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (time-lag kylin cube) + (realtime DB update) to provide realtime capability?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> IMO, the point here is to find that (realtime DB update) and integrate it with (time-lag kylin cube).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I researched and tested Druid two years ago (I don't know too much about the change of Druid in these two years. New features that I know are: new UI, fully on K8s etc).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here are some cases where you should consider using Druid rather than Kylin at the moment (comparing Kylin 5.0-beta with the Druid which I used two years ago):
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Have a real-time datasource like Kafka etc.
>>>>>>>>>>>>>>>>> - Most queries are small (Based on my test result, I think Druid had better response time for small queries two years ago.)
>>>>>>>>>>>>>>>>> - Don't know how to optimize Spark/Hadoop, want to use the K8S/public cloud platform as your deployment platform.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> But I do think there are many scenarios in which Kylin could be better, like:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Better performance for complex/big queries. Kylin can have a more exact-match/fine-grained Index for queries containing different `Group By dimensions`.
>>>>>>>>>>>>>>>>> - User-friendly UI for modeling.
>>>>>>>>>>>>>>>>> - Support 'Join' better? (Not sure at the moment)
>>>>>>>>>>>>>>>>> - ODBC driver for different BI. (its website did not show it supports ODBC well)
>>>>>>>>>>>>>>>>> - Looks like Kylin supports ANSI SQL better than Druid.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don't know Pinot, so I have nothing to say about it.
>>>>>>>>>>>>>>>>> Hope this helps, and feel free to share your opinion.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ------------------------
>>>>>>>>>>>>>>>>> With warm regard
>>>>>>>>>>>>>>>>> Xiaoxiang Yu
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Dear Xiaoxiang,
>>>>>>>>>>>>>>>>>> Sirs/Madams,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> May I post my boss's question:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What are the pros and cons of the OLAP platform Kylin compared to Pinot and Druid?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please kindly let me know
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank you very much and best regards