Re: How to reflect last hour data into Hive and Kylin Insights query window

Xiaoxiang Yu Fri, 17 Nov 2023 02:33:34 -0800

I ran a quick test to check whether Kylin has any bugs in the push-down
function, and it works as expected. So I'm 99% certain that your step
"I loaded the incremental data into Hive already" did not actually work.


Here are my steps (you can reproduce them in a fresh Kylin 5 docker container
in one minute):

1. Query `select count(*) from SSB.DATES` in project ssb without building any
index.
    Query result (Answered By: HIVE): 2556

2. Duplicate the file of table `ssb.dates` with the following command:
    hadoop fs -cp /user/hive/warehouse/ssb.db/dates/SSB.DATES.csv \
        /user/hive/warehouse/ssb.db/dates/SSB.DATES-2.csv

3. Re-query `select count(*) from SSB.DATES` in project ssb.
    Query result (Answered By: HIVE): 5112

So it is clear that the second query sees the newly added data, which means
the Kylin query engine picks up incremental data through push-down.
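
To check the Hive side of your own table directly, something like the
following may help. It is only a sketch: it assumes the `hadoop` and `hive`
command-line tools are on the PATH, and `TABLE_DIR`, `TABLE`, `DRY_RUN`, `run`
and `check_hive_side` are illustrative names, not part of Kylin or Hive. The
defaults point at the SSB sample table used above.

```shell
# Sketch only: TABLE_DIR, TABLE, DRY_RUN, run and check_hive_side are
# illustrative names, not part of Kylin or Hive.
TABLE_DIR="${TABLE_DIR:-/user/hive/warehouse/ssb.db/dates}"
TABLE="${TABLE:-ssb.dates}"
DRY_RUN="${DRY_RUN:-0}"   # set DRY_RUN=1 to print the commands without running them

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

check_hive_side() {
    # The newly added file should be listed in the table directory.
    run hadoop fs -ls "$TABLE_DIR"
    # Hive itself should already return the larger row count.
    run hive -e "select count(*) from $TABLE"
}
```

Calling `check_hive_side` before and after the load step should show a new
file and a larger count. If the count from Hive does not grow, the problem is
most likely in the load step rather than in Kylin.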

Finally, to make good use of Kylin in real use cases, a good knowledge of
Apache Spark and Apache Hadoop is a must.

------------------------
With warm regard
Xiaoxiang Yu



On Fri, Nov 17, 2023 at 5:52 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Have a nice weekend Xiaoxiang, and thank you for helping me to become a
> kylin's fan
>
> You are right, I am not familiar enough with Kylin and have little
> background in the Hadoop system, so I will double-check carefully before
> asking future questions. However, I did understand the mechanism quoted
> below.
>
> ============quoted====================
>
> If incremental data is not loaded into Kylin, Kylin can still answer such
> queries by
> reading the original hive table, but the query is not accelerated.
>
> If incremental data is loaded into Kylin, Kylin can answer queries by
> reading the special Index/Cuboid files, and the query will be accelerated.
>
> ============end====================
>
> To explain my previous question, here is what I did:
>
> 1. I turned off the configuration kylin.query.cache-enabled (set to false)
> 2. I restarted Kylin
> 3. I loaded the incremental data into Hive already
> 4. I turned on the Pushdown option to query Hive instead of a model
> 5. In the Kylin Insights window, I still could not get the incremental data
> (which was already in Hive)
>
> That is why I asked: can I get the incremental result with the 5 steps
> above (without a model and index), or do I need to create a model, index,
> and segment, and then get the incremental result by building a new segment
> for the incremental data?
>
> Hope you get my point or I will explain more
>
> Thank you very much again
>
>
> On Fri, 17 Nov 2023 at 16:00 Xiaoxiang Yu <x...@apache.org> wrote:
>
> > Unfortunately, I guess you are not asking good questions.
> > If the answer to a question can be found by searching the Internet,
> > it is better not to ask it on the mailing list. I guess you
> > don't yet know how Kylin works, so you need to look for documentation
> > or some tutorials.
> >
> > What does 'get the incremental data from Hive into Kylin' mean? Kylin
> > fully relies on Apache Spark for execution.
> >
> > If incremental data is not loaded into Kylin, Kylin can still answer such
> > queries by
> > reading the original hive table, but the query is not accelerated.
> >
> > If incremental data is loaded into Kylin, Kylin can answer queries by
> > reading the special Index/Cuboid files, and the query will be accelerated.
> >
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Fri, Nov 17, 2023 at 4:36 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >
> > > Hi Xiaoxiang,
> > >
> > > Do I really need to create a model in order to get the incremental data
> > > from Hive into Kylin?
> > >
> > > Can I query the incremental data of a pure dim/fact table without a
> > > model?
> > >
> > > Thank you very much
> > >
> > > On Fri, Nov 17, 2023 at 9:05 AM Xiaoxiang Yu <x...@apache.org> wrote:
> > >
> > > > I am not really sure, but I think it is the query cache that keeps
> > > > your query result unchanged.
> > > >
> > > >
> > > > The config entry is kylin.query.cache-enabled, which is turned on by
> > > > default. The doc link is
> > > > https://kylin.apache.org/5.0/docs/configuration/query_cache
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best wishes to you !
> > > > From ：Xiaoxiang Yu
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > At 2023-11-17 09:48:55, "Nam Đỗ Duy" <na...@vnpay.vn.INVALID> wrote:
> > > > >Hello Team, hello Xiaoxiang, can you please help me with this urgent
> > > > >issue...
> > > > >
> > > > >(this is a public email group, so in general I omit your name from
> > > > >the greeting of the first email in a thread, but in fact most of the
> > > > >time Xiaoxiang actively answers my issues, thank you very much)
> > > > >
> > > > >On Thu, Nov 16, 2023 at 2:59 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> > > > >
> > > > >> Dear Dev Team, please kindly advise this scenario
> > > > >>
> > > > >> 1. I have a fact table, and when I query it in the Kylin Insights
> > > > >> window I get 5 million rows.
> > > > >>
> > > > >> 2. Then I use the following command to load X rows (last hour data)
> > > > >> from parquet into the Hive table
> > > > >>
> > > > >> LOAD DATA LOCAL INPATH
> > > > >> '/opt/LastHour/factUserEventDF_2023_11_16.parquet/14' INTO TABLE
> > > > >> factUserEvent;
> > > > >>
> > > > >> 3. Then I open the Kylin Insights window to query it, but it still
> > > > >> returns the previous number (5 million rows), without the X rows of
> > > > >> last-hour data that I loaded from parquet into Hive in step 2.
> > > > >>
> > > > >> Can you advise how to make the table refresh and update?
> > > > >>
> > > > >> Thank you very much
> > > > >>
> > > >
> > >
> >
>
