It is a good question. I can share some articles with you.

1. How to build a metric repository by Kylin to share among data teams (DA, DS, AI), is that the usage of measure in Kylin?

I think a metric repository (or metrics store) is exactly the kind of thing Kylin can help with. For example, Beike (ke.com) built an indicator/metrics platform whose backend is Kylin; that is, they created a metrics store on top of Kylin. The architecture looks like this:
https://mmbiz.qpic.cn/mmbiz_png/9xAoGyC249Kd9icMaNT1Gs7AlDAZic7PScYNCOkSQF8PqbuSLicoxhdk4w3kJtC0bms4FzW6iby08bNiaVsUzUkBPmg/640?wx_fmt=png&wxfrom=5&wx_lazy=1&wx_co=1

Here is a technical article about it, written in Chinese (I am sorry, it has not been translated):
https://mp.weixin.qq.com/s/hsGjuaYfEfParcgTimBLnw

2. How to use Kylin for the Customer segmentation of Marketing dept?

Here are some articles (sorry again, these are not translated):
https://kylin.apache.org/blog/2016/11/28/intersect-count/
https://zhuanlan.zhihu.com/p/100131550
https://cn.kyligence.io/blog/kylin-chinagreentown-user-portrait-2/
https://cn.kyligence.io/blog/apache-kylin-count-distinct-application-in-user-behavior-analysis/
https://www.infoq.cn/article/xZYe1DUopNA9CzLwau3O

You can send your presentation material to me if you are willing to share.

------------------------
With warm regard
Xiaoxiang Yu

On Wed, Nov 22, 2023 at 5:36 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Thank you Xiaoxiang, tomorrow noon is my presentation to the management
> about kylin so I am pending this issue to focus on following ones, can you
> please advise:
>
> 1. How to build a metric repository by Kylin to share among data teams (DA,
> DS, AI), is that the usage of measure in Kylin?
> 2. How to use Kylin for the Customer segmentation of Marketing dept?
>
>
> On Wed, Nov 22, 2023 at 2:10 PM Xiaoxiang Yu <x...@apache.org> wrote:
>
> > Before you try again, you can use spark-sql/spark-shell to check if the
> > data is loaded into your table successfully (or if your data is copied
> > to the right place).
> > Following is how to start a spark-sql/spark-shell in a container.
> >
> > export HADOOP_CONF_DIR=/opt/hadoop-3.2.1/etc/hadoop
> >
> > cd /home/kylin/apache-kylin-5.0.0-beta-bin/spark
> >
> > bin/spark-shell --executor-cores 1 --num-executors 1 --master yarn
> >
> >
> > The result of spark-sql/spark-shell should be the same as what you
> > saw in the Kylin Insight page. If there are different results for the
> > same query, which should not happen, please let me know.
> >
> > Hope you can fix your problem soon.
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> > On Wed, Nov 22, 2023 at 11:59 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >
> > > Thank you Xiaoxiang, I tried in my place and it worked for the ssb
> > > database but it didn't work for my own database.
> > >
> > > It only works if I restart kylin so I guess there might be some
> > > missing configuration on my end.
> > >
> > > Thank you very much anyway and will update next time.
> > >
> > > Have a good day.
> > >
> > > On Fri, Nov 17, 2023 at 5:34 PM Xiaoxiang Yu <x...@apache.org> wrote:
> > >
> > > > I did an easy test to verify if kylin has any bugs for the push down
> > > > function. And the push down function works as expected without any
> > > > mistakes. So I'm 99% certain that your step "I loaded the incremental
> > > > data into Hive already" does not work.
> > > >
> > > > Here are my steps (you can reproduce in a fresh Kylin5 docker
> > > > container in one minute):
> > > >
> > > > 1. Query `select count(*) from SSB.DATES` in project ssb without
> > > > building any index.
> > > > Query result (Answered By: HIVE) is: 2556
> > > >
> > > > 2. Duplicate the file of table `ssb.dates` by following command:
> > > > hadoop fs -cp /user/hive/warehouse/ssb.db/dates/SSB.DATES.csv /user/hive/warehouse/ssb.db/dates/SSB.DATES-2.csv
> > > >
> > > > 3. Re-query `select count(*) from SSB.DATES` in project ssb
> > > > Query result (Answered By: HIVE) is: 5112
> > > >
> > > > So it is clear that, in the second query, the incremental data is
> > > > found by the Kylin query engine.
> > > >
> > > > Finally, to make good use of Kylin in real use cases, good knowledge
> > > > of Apache Spark and Apache Hadoop is a must-have.
> > > >
> > > > ------------------------
> > > > With warm regard
> > > > Xiaoxiang Yu
> > > >
> > > >
> > > > On Fri, Nov 17, 2023 at 5:52 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> > > >
> > > > > Have a nice weekend Xiaoxiang, and thank you for helping me to
> > > > > become a kylin's fan
> > > > >
> > > > > You are right I am not familiar with Kylin enough and have little
> > > > > background of the hadoop system so I will double check here
> > > > > carefully before future questions. However I did understand the
> > > > > following mechanism in quotes.
> > > > >
> > > > > ============quoted====================
> > > > >
> > > > > If incremental data is not loaded into Kylin, Kylin can still
> > > > > answer such queries by reading the original hive table, but the
> > > > > query is not accelerated.
> > > > >
> > > > > If incremental data is loaded into Kylin, Kylin can answer queries
> > > > > by reading the special Index/Cuboid files, and the query will be
> > > > > accelerated.
> > > > >
> > > > > ============end====================
> > > > >
> > > > > I explain my previous question that was as follows:
> > > > >
> > > > > 1. I turned off this configuration kylin.query.cache-enabled (set = false)
> > > > > 2. Restart Kylin
> > > > > 3. I loaded the incremental data into Hive already
> > > > > 4. Turn on Pushdown option to query Hive not model
> > > > > 5. In Kylin Insights window, I still cannot get the incremental
> > > > > data (which has been in Hive already)
> > > > >
> > > > > That was the reason why I asked you: can I get the incremental
> > > > > result by above 5 steps (without model and index) or do I need to
> > > > > create model and index and segment then I can get the incremental
> > > > > result by creating a new segment according to incremental data?
> > > > >
> > > > > Hope you get my point or I will explain more
> > > > >
> > > > > Thank you very much again
> > > > >
> > > > >
> > > > > On Fri, 17 Nov 2023 at 16:00 Xiaoxiang Yu <x...@apache.org> wrote:
> > > > >
> > > > > > Unfortunately, I guess you are not asking good questions.
> > > > > > If the answer of a question can be searched on the Internet,
> > > > > > it is not recommended to ask it in the mailing list. I guess you
> > > > > > didn't know how Kylin works, so you need to search for documents
> > > > > > or some tutorials.
> > > > > >
> > > > > > What does 'get the incremental data from Hive into Kylin' mean?
> > > > > > Kylin fully relies on Apache Spark for execution.
> > > > > >
> > > > > > If incremental data is not loaded into Kylin, Kylin can still
> > > > > > answer such queries by reading the original hive table, but the
> > > > > > query is not accelerated.
> > > > > >
> > > > > > If incremental data is loaded into Kylin, Kylin can answer
> > > > > > queries by reading the special Index/Cuboid files, and the query
> > > > > > will be accelerated.
> > > > > >
> > > > > > ------------------------
> > > > > > With warm regard
> > > > > > Xiaoxiang Yu
> > > > > >
> > > > > >
> > > > > > On Fri, Nov 17, 2023 at 4:36 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> > > > > >
> > > > > > > Hi Xiaoxiang,
> > > > > > >
> > > > > > > Do I really need to create a model in order to get the
> > > > > > > incremental data from Hive into Kylin?
> > > > > > >
> > > > > > > Can I query the incremental data of a pure dim/fact table
> > > > > > > without a model?
> > > > > > >
> > > > > > > Thank you very much
> > > > > > >
> > > > > > > On Fri, Nov 17, 2023 at 9:05 AM Xiaoxiang Yu <x...@apache.org> wrote:
> > > > > > >
> > > > > > > > I am not really sure. But I think it is the query cache that
> > > > > > > > makes your query result unchanged.
> > > > > > > >
> > > > > > > > The config entry is kylin.query.cache-enabled, which is turned
> > > > > > > > on by default. The doc link is
> > > > > > > > https://kylin.apache.org/5.0/docs/configuration/query_cache
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Best wishes to you !
> > > > > > > > From :Xiaoxiang Yu
> > > > > > > >
> > > > > > > > At 2023-11-17 09:48:55, "Nam Đỗ Duy" <na...@vnpay.vn.INVALID> wrote:
> > > > > > > > >Hello Team, hello Xiaoxiang, can you please help me with this
> > > > > > > > >urgent issue...
> > > > > > > > >
> > > > > > > > >(this is public email group so in general I neglect your
> > > > > > > > >specific name from greeting of first email in the threads, but
> > > > > > > > >in fact most of time Xiaoxiang actively answers my issues,
> > > > > > > > >thank you very much)
> > > > > > > > >
> > > > > > > > >On Thu, Nov 16, 2023 at 2:59 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> > > > > > > > >
> > > > > > > > >> Dear Dev Team, please kindly advise this scenario
> > > > > > > > >>
> > > > > > > > >> 1. I have a fact table and I use Kylin insights window to
> > > > > > > > >> query it and get 5 million rows.
> > > > > > > > >>
> > > > > > > > >> 2. Then I use following command to load X rows (last hour
> > > > > > > > >> data) from parquet into Hive table
> > > > > > > > >>
> > > > > > > > >> LOAD DATA LOCAL INPATH
> > > > > > > > >> '/opt/LastHour/factUserEventDF_2023_11_16.parquet/14' INTO TABLE
> > > > > > > > >> factUserEvent;
> > > > > > > > >>
> > > > > > > > >> 3. Then I open Kylin insights window to query it but it still
> > > > > > > > >> returned previous number (5 million rows) not adding the last
> > > > > > > > >> hour data of X rows which I previously loaded from parquet
> > > > > > > > >> into hive in step 2)
> > > > > > > > >>
> > > > > > > > >> Can you advise the way to make table refresh and updated?
> > > > > > > > >>
> > > > > > > > >> Thank you very much
> > > > > > > > >>
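
For readers who want to script the three-step push-down check described in this thread (count the rows, add data, count again), here is a minimal Python sketch. It is only an illustration and not part of Kylin: `verify_incremental_load`, `count_rows`, and `load_increment` are hypothetical names, and the two callables stand in for whatever you actually use, for example a `select count(*)` issued through spark-sql or the Insight page, and a `hadoop fs -cp` or `LOAD DATA` step. The demo replaces the real table with an in-memory list sized after the SSB.DATES numbers quoted above.

```python
from typing import Callable


def verify_incremental_load(
    count_rows: Callable[[], int],
    load_increment: Callable[[], None],
    expected_new_rows: int,
) -> str:
    """Count rows, load more data, count again, and classify the outcome.

    Returns "ok" when exactly the expected rows appeared, "unchanged" when
    the count did not move, and "mismatch" for any other difference.
    """
    before = count_rows()
    load_increment()
    after = count_rows()
    delta = after - before
    if delta == expected_new_rows:
        return "ok"
    if delta == 0:
        return "unchanged"
    return "mismatch"


if __name__ == "__main__":
    # Hypothetical stand-in for a Hive table: 2556 rows, as in the
    # SSB.DATES example from the thread.
    table = list(range(2556))

    def count_rows() -> int:
        return len(table)

    def duplicate_file() -> None:
        # Mimics copying the data file a second time into the table directory.
        table.extend(list(table))

    print(verify_incremental_load(count_rows, duplicate_file, 2556))
```

An "unchanged" result would point at either the load step or a cached query result (kylin.query.cache-enabled), which are the two suspects raised in the thread.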