Re: How to reflect last hour data into Hive and Kylin Insights query window

Nam Đỗ Duy Tue, 21 Nov 2023 20:00:10 -0800

Thank you, Xiaoxiang. I tried it on my side and it worked for the ssb database,
but it didn't work for my own database.

It only works after I restart Kylin, so I guess some configuration is missing
on my end.

Thank you very much anyway; I will post an update next time.

Have a good day.

On Fri, Nov 17, 2023 at 5:34 PM Xiaoxiang Yu <x...@apache.org> wrote:

> I did a simple test to check whether Kylin's pushdown function has any bugs,
> and pushdown works as expected. So I'm 99% certain that your step
> "I loaded the incremental data into Hive already" did not work.
>
> Here are my steps (you can reproduce them in a fresh Kylin 5 docker container
> in one minute):
>
> 1. Query `select count(*) from SSB.DATES` in project ssb without building any
>     index.
>     Query result (Answered By: HIVE): 2556
>
> 2. Duplicate the data file of table `ssb.dates` with the following command:
>     hadoop fs -cp /user/hive/warehouse/ssb.db/dates/SSB.DATES.csv \
>         /user/hive/warehouse/ssb.db/dates/SSB.DATES-2.csv
>
> 3. Re-query `select count(*) from SSB.DATES` in project ssb.
>     Query result (Answered By: HIVE): 5112
>
> So the second query clearly finds the incremental data: it is visible to the
> Kylin query engine.
>
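The reproduction above can be scripted roughly as follows. This is a minimal sketch, assuming the default SSB sample paths of the Kylin 5 docker image; `HADOOP_BIN`, `duplicate_dates_file` and `count_doubled` are hypothetical names, not part of Kylin or Hadoop. The two `count(*)` queries themselves are still run in the Insights window.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps (assumption: default SSB sample layout
# of the Kylin 5 docker image). HADOOP_BIN is a hypothetical override
# that defaults to the `hadoop` CLI on PATH; the function names below are
# hypothetical helpers, not Kylin tools.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"
DATES_DIR="/user/hive/warehouse/ssb.db/dates"

# Step 2: copy the table's data file inside the table directory, so every
# row of SSB.DATES appears twice when Hive (or pushdown) scans the table.
duplicate_dates_file() {
  "$HADOOP_BIN" fs -cp "$DATES_DIR/SSB.DATES.csv" "$DATES_DIR/SSB.DATES-2.csv"
}

# Step 3 check: a pushed-down count(*) taken after the copy should be
# exactly twice the count from step 1. Returns 0 (success) if it is.
count_doubled() {
  local before="$1" after="$2"
  [ "$after" -eq $(( before * 2 )) ]
}
```

With the numbers reported above, `count_doubled 2556 5112` succeeds, which is what a working pushdown should show after the copy.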
> Finally, to make good use of Kylin in real use cases, good knowledge of
> Apache Spark and Apache Hadoop is a must-have.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Fri, Nov 17, 2023 at 5:52 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
> > Have a nice weekend, Xiaoxiang, and thank you for helping me become a
> > Kylin fan.
> >
> > You are right: I am not familiar enough with Kylin and have little
> > background in the Hadoop ecosystem, so I will double-check carefully on my
> > side before asking future questions. However, I did understand the
> > mechanism quoted below.
> >
> > ============quoted====================
> >
> > If incremental data is not loaded into Kylin, Kylin can still answer such
> > queries by reading the original Hive table, but the query is not
> > accelerated.
> >
> > If incremental data is loaded into Kylin, Kylin can answer queries by
> > reading the special Index/Cuboid files, and the query will be accelerated.
> >
> > ============end====================
> >
> > To explain, my previous question was as follows:
> >
> > 1. I turned off the configuration kylin.query.cache-enabled (set = false).
> > 2. I restarted Kylin.
> > 3. I loaded the incremental data into Hive already.
> > 4. I turned on the Pushdown option to query Hive, not the model.
> > 5. In the Kylin Insights window, I still could not get the incremental data
> >    (which was already in Hive).
> > That is why I asked you: can I get the incremental result with the 5 steps
> > above (without a model and index), or do I need to create a model, index,
> > and segment, and then get the incremental result by building a new segment
> > for the incremental data?
> >
> > I hope you get my point; otherwise I will explain more.
> >
> > Thank you very much again
> >
> >
> > On Fri, 17 Nov 2023 at 16:00 Xiaoxiang Yu <x...@apache.org> wrote:
> >
> > > Unfortunately, I think you are not asking good questions.
> > > If the answer to a question can be found on the Internet,
> > > it is not recommended to ask it on the mailing list. I guess you
> > > don't know how Kylin works yet, so you should look for the documentation
> > > or some tutorials.
> > >
> > > What does 'get the incremental data from Hive into Kylin' mean? Kylin
> > > relies entirely on Apache Spark for execution.
> > >
> > > If incremental data is not loaded into Kylin, Kylin can still answer such
> > > queries by reading the original Hive table, but the query is not
> > > accelerated.
> > >
> > > If incremental data is loaded into Kylin, Kylin can answer queries by
> > > reading the special Index/Cuboid files, and the query will be accelerated.
> > >
> > >
> > > ------------------------
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > >
> > > On Fri, Nov 17, 2023 at 4:36 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> > >
> > > > Hi Xiaoxiang,
> > > >
> > > > Do I really need to create a model in order to get the incremental data
> > > > from Hive into Kylin?
> > > >
> > > > Can I query the incremental data of a pure dim/fact table without a model?
> > > >
> > > > Thank you very much
> > > >
> > > > On Fri, Nov 17, 2023 at 9:05 AM Xiaoxiang Yu <x...@apache.org> wrote:
> > > >
> > > > > I am not really sure, but I think it is the query cache that keeps your
> > > > > query result unchanged.
> > > > >
> > > > > The config entry is kylin.query.cache-enabled, which is turned on by
> > > > > default. The documentation is at:
> > > > > https://kylin.apache.org/5.0/docs/configuration/query_cache
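To try the suggestion above, the entry has to be set to `false` in Kylin's configuration and Kylin restarted. A minimal sketch, assuming the instance reads `$KYLIN_HOME/conf/kylin.properties`; `set_prop` is a hypothetical helper, not a Kylin tool, and it only handles plain `key=value` lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kylin): set key=value in a .properties
# file, replacing an existing "key=..." line or appending one if absent.
# Limitation: assumes no spaces around '=' and a value without '|' or '&'.
set_prop() {
  local file="$1" key="$2" value="$3"
  local key_re
  # Escape dots so the key is matched literally in the grep/sed regexes.
  key_re="$(printf '%s' "$key" | sed 's/[.]/\\./g')"
  if grep -q "^${key_re}=" "$file"; then
    # -i.bak works with both GNU and BSD sed; remove the backup afterwards.
    sed -i.bak "s|^${key_re}=.*|${key}=${value}|" "$file" && rm -f "${file}.bak"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, `set_prop "$KYLIN_HOME/conf/kylin.properties" kylin.query.cache-enabled false`, followed by `$KYLIN_HOME/bin/kylin.sh stop` and `$KYLIN_HOME/bin/kylin.sh start` so that the new value is picked up.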
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best wishes to you!
> > > > > From: Xiaoxiang Yu
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > At 2023-11-17 09:48:55, "Nam Đỗ Duy" <na...@vnpay.vn.INVALID> wrote:
> > > > > >Hello team, hello Xiaoxiang, can you please help me with this urgent
> > > > > >issue?
> > > > > >
> > > > > >(This is a public mailing list, so I usually leave specific names out of
> > > > > >the greeting in the first email of a thread, but in fact Xiaoxiang
> > > > > >actively answers most of my questions. Thank you very much.)
> > > > > >
> > > > > >On Thu, Nov 16, 2023 at 2:59 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> > > > > >
> > > > > >> Dear Dev Team, please kindly advise on this scenario:
> > > > > >>
> > > > > >> 1. I have a fact table; when I query it in the Kylin Insights window, I
> > > > > >> get 5 million rows.
> > > > > >>
> > > > > >> 2. Then I use the following command to load X rows (the last hour's data)
> > > > > >> from parquet into the Hive table:
> > > > > >>
> > > > > >> LOAD DATA LOCAL INPATH '/opt/LastHour/factUserEventDF_2023_11_16.parquet/14'
> > > > > >> INTO TABLE factUserEvent;
> > > > > >>
> > > > > >> 3. Then I open the Kylin Insights window and query it again, but it still
> > > > > >> returns the previous number (5 million rows), without the X rows of the
> > > > > >> last hour that I loaded from parquet into Hive in step 2.
> > > > > >>
> > > > > >> Can you advise how to make the table refresh and show the updated data?
> > > > > >>
> > > > > >> Thank you very much
> > > > > >>
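Before looking at Kylin, it can help to confirm that the load in step 2 actually changed the table as Hive itself sees it. Hive documents `LOAD DATA` as a plain copy/move of files into the table's location, so in general it does not convert a parquet file for a table stored in another format. A minimal sketch, assuming HiveServer2 is reachable with `beeline`; `HIVE_JDBC_URL`, `BEELINE_BIN` and `hive_row_count` are hypothetical names to adjust for your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a table's row count as seen by Hive itself,
# to compare before and after the LOAD DATA step.
# HIVE_JDBC_URL and BEELINE_BIN are assumptions; adjust for your cluster.
HIVE_JDBC_URL="${HIVE_JDBC_URL:-jdbc:hive2://localhost:10000/default}"
BEELINE_BIN="${BEELINE_BIN:-beeline}"

hive_row_count() {
  local table="$1"
  # Silent mode, no header and tsv2 output, so stdout is essentially
  # just the count.
  "$BEELINE_BIN" -u "$HIVE_JDBC_URL" --silent=true --showHeader=false \
    --outputformat=tsv2 -e "SELECT COUNT(*) FROM ${table};"
}
```

Run `hive_row_count factUserEvent` before and after the `LOAD DATA` statement; if Hive's own count does not grow by X, the problem lies in the load rather than in Kylin's pushdown.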
> > > > >
> > > >
> > >
> >
>
