Thank you Xiaoxiang. I tried it on my side: it worked for the ssb database, but it didn't work for my own database.
It only works if I restart Kylin, so I guess some configuration is missing on
my end. Thank you very much anyway; I will update next time.

Have a good day.

On Fri, Nov 17, 2023 at 5:34 PM Xiaoxiang Yu <x...@apache.org> wrote:

> I did a simple test to verify whether Kylin has any bugs in the push down
> function, and the push down function works as expected. So I'm 99% certain
> that your step "I loaded the incremental data into Hive already" does not
> work.
>
> Here are my steps (you can reproduce them in a fresh Kylin5 docker
> container in one minute):
>
> 1. Query `select count(*) from SSB.DATES` in project ssb without building
>    any index.
>    Query result (Answered By: HIVE) is: 2556
>
> 2. Duplicate the file of table `ssb.dates` with the following command:
>    hadoop fs -cp /user/hive/warehouse/ssb.db/dates/SSB.DATES.csv \
>        /user/hive/warehouse/ssb.db/dates/SSB.DATES-2.csv
>
> 3. Re-query `select count(*) from SSB.DATES` in project ssb.
>    Query result (Answered By: HIVE) is: 5112
>
> So it is clear that, in the second query, the incremental data can be found
> by the Kylin query engine.
>
> Finally, to make good use of Kylin in real use cases, good knowledge of
> Apache Spark and Apache Hadoop is a must-have.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
> On Fri, Nov 17, 2023 at 5:52 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
> > Have a nice weekend Xiaoxiang, and thank you for helping me to become a
> > Kylin fan.
> >
> > You are right, I am not familiar enough with Kylin and have little
> > background in the Hadoop system, so I will double check carefully here
> > before asking future questions. However, I did understand the following
> > mechanism in quotes.
> >
> > ============quoted====================
> >
> > If incremental data is not loaded into Kylin, Kylin can still answer such
> > queries by reading the original hive table, but the query is not
> > accelerated.
> >
> > If incremental data is loaded into Kylin, Kylin can answer queries by
> > reading the special Index/Cuboid files, and the query will be accelerated.
> >
> > ============end====================
> >
> > To explain, my previous question was as follows:
> >
> > 1. I turned off the configuration kylin.query.cache-enabled (set it to
> >    false)
> > 2. Restarted Kylin
> > 3. I loaded the incremental data into Hive already
> > 4. Turned on the Pushdown option to query Hive, not the model
> > 5. In the Kylin Insights window, I still cannot get the incremental data
> >    (which is already in Hive)
> >
> > That was the reason why I asked you: can I get the incremental result with
> > the above 5 steps (without model and index), or do I need to create a
> > model, index and segment, and then get the incremental result by creating
> > a new segment for the incremental data?
> >
> > I hope you get my point; otherwise I will explain more.
> >
> > Thank you very much again
> >
> >
> > On Fri, 17 Nov 2023 at 16:00 Xiaoxiang Yu <x...@apache.org> wrote:
> >
> > > Unfortunately, I guess you are not asking good questions.
> > > If the answer to a question can be found on the Internet,
> > > it is not recommended to ask it on the mailing list. I guess you
> > > don't know how Kylin works, so you need to search for documents
> > > or some tutorials.
> > >
> > > What does 'get the incremental data from Hive into Kylin' mean? Kylin
> > > fully relies on Apache Spark for execution.
> > >
> > > If incremental data is not loaded into Kylin, Kylin can still answer
> > > such queries by reading the original hive table, but the query is not
> > > accelerated.
> > >
> > > If incremental data is loaded into Kylin, Kylin can answer queries by
> > > reading the special Index/Cuboid files, and the query will be
> > > accelerated.
> > >
> > >
> > > ------------------------
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > > On Fri, Nov 17, 2023 at 4:36 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> > >
> > > > Hi Xiaoxiang,
> > > >
> > > > Do I really need to create a model in order to get the incremental
> > > > data from Hive into Kylin?
> > > >
> > > > Can I query the incremental data of a pure dim/fact table without a
> > > > model?
> > > >
> > > > Thank you very much
> > > >
> > > > On Fri, Nov 17, 2023 at 9:05 AM Xiaoxiang Yu <x...@apache.org> wrote:
> > > >
> > > > > I am not really sure, but I think it is the query cache that keeps
> > > > > your query result unchanged.
> > > > >
> > > > > The config entry is kylin.query.cache-enabled, which is turned on by
> > > > > default. The doc link is
> > > > > https://kylin.apache.org/5.0/docs/configuration/query_cache
> > > > >
> > > > > --
> > > > >
> > > > > Best wishes to you !
> > > > > From :Xiaoxiang Yu
> > > > >
> > > > >
> > > > > At 2023-11-17 09:48:55, "Nam Đỗ Duy" <na...@vnpay.vn.INVALID> wrote:
> > > > > >Hello Team, hello Xiaoxiang, can you please help me with this urgent
> > > > > >issue...
> > > > > >
> > > > > >(this is a public email group, so in general I leave your specific name
> > > > > >out of the greeting of the first email in a thread, but in fact most of
> > > > > >the time Xiaoxiang actively answers my issues, thank you very much)
> > > > > >
> > > > > >On Thu, Nov 16, 2023 at 2:59 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> > > > > >
> > > > > >> Dear Dev Team, please kindly advise on this scenario:
> > > > > >>
> > > > > >> 1. I have a fact table and I use the Kylin Insights window to query
> > > > > >> it and get 5 million rows.
> > > > > >>
> > > > > >> 2. Then I use the following command to load X rows (last hour data)
> > > > > >> from parquet into the Hive table:
> > > > > >>
> > > > > >> LOAD DATA LOCAL INPATH
> > > > > >> '/opt/LastHour/factUserEventDF_2023_11_16.parquet/14' INTO TABLE
> > > > > >> factUserEvent;
> > > > > >>
> > > > > >> 3. Then I open the Kylin Insights window to query it, but it still
> > > > > >> returns the previous number (5 million rows), without the X rows of
> > > > > >> last hour data which I loaded from parquet into Hive in step 2.
> > > > > >>
> > > > > >> Can you advise a way to refresh the table so it is updated?
> > > > > >>
> > > > > >> Thank you very much
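
The scenario in the original question, like the SSB.DATES test earlier in the thread, comes down to a before/after row count. Below is a minimal Python sketch of that check, not anything taken from Kylin itself. `run_count` and `load_data` are hypothetical callables supplied by the caller, standing in for however you run `select count(*)` and load the new file. The returned labels are only illustrative.

```python
from typing import Callable


def check_incremental_load(
    run_count: Callable[[], int],
    load_data: Callable[[], None],
    rows_loaded: int,
) -> str:
    """Count rows, load new data, count again, and classify the outcome.

    run_count and load_data are hypothetical placeholders supplied by the
    caller; nothing here talks to Kylin or Hive directly.
    """
    before = run_count()
    load_data()
    after = run_count()

    if after == before + rows_loaded:
        return "fresh"       # the new rows are visible to the query
    if after == before:
        return "stale"       # same symptom as in the original question
    return "unexpected"      # some other change; inspect the table by hand


if __name__ == "__main__":
    # Simulate the SSB.DATES test: the table has 2556 rows, and copying its
    # only file adds another 2556 rows.
    table = {"rows": 2556}
    result = check_incremental_load(
        run_count=lambda: table["rows"],
        load_data=lambda: table.update(rows=table["rows"] + 2556),
        rows_loaded=2556,
    )
    print(result, table["rows"])  # fresh 5112
```

A "stale" result with the query cache disabled and pushdown enabled matches what was reported at the top of the thread, where the new rows only appeared after restarting Kylin.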