Re: How to reflect last hour data into Hive and Kylin Insights query window

Xiaoxiang Yu Tue, 21 Nov 2023 23:10:47 -0800

Before you try again, you can use spark-sql or spark-shell to check whether
the data was loaded into your table successfully (or whether your data files
were copied to the right place). The following shows how to start spark-sql
or spark-shell in the container.


export HADOOP_CONF_DIR=/opt/hadoop-3.2.1/etc/hadoop

cd /home/kylin/apache-kylin-5.0.0-beta-bin/spark

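bin/spark-shell --executor-cores 1 --num-executors 1 --master yarn

# or, to get a SQL prompt instead:
bin/spark-sql --executor-cores 1 --num-executors 1 --master yarn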


The result from spark-sql/spark-shell should be the same as what you saw on
the Kylin Insight page. If the same query returns different results, which
should not happen, please let me know.
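A minimal sketch of that comparison, assuming it is run from the Kylin spark
directory above with HADOOP_CONF_DIR exported. The helpers `spark_count` and
`counts_match` are hypothetical (not part of Kylin); `-S` (silent) and `-e`
(run one query) are standard Spark SQL CLI options, and `factUserEvent` is the
table from the original question.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Kylin): count rows with spark-sql and
# compare the result with the number shown on the Kylin Insight page.
# Assumes the current directory is the Kylin spark directory, e.g.
# /home/kylin/apache-kylin-5.0.0-beta-bin/spark, and HADOOP_CONF_DIR is set.

# Print the row count of a Hive table, e.g. spark_count factUserEvent
spark_count() {
  local table="$1"
  # -S: silent mode, -e: run a single query; Spark logs go to stderr.
  bin/spark-sql --master yarn -S -e "SELECT COUNT(*) FROM ${table}" 2>/dev/null | tail -n 1
}

# Return 0 if the two counts are equal, otherwise report the difference.
counts_match() {
  local kylin_count="$1" spark_value="$2"
  if [ "$kylin_count" = "$spark_value" ]; then
    return 0
  fi
  echo "mismatch: Kylin=${kylin_count} spark-sql=${spark_value}" >&2
  return 1
}
```

For example, `counts_match 5000000 "$(spark_count factUserEvent)"`. If spark-sql
also returns the old count, the new rows never reached the table, so the
problem is in the load step rather than in Kylin.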

Hope you can fix your problem soon.

------------------------
With warm regard
Xiaoxiang Yu



On Wed, Nov 22, 2023 at 11:59 AM Nam Đỗ Duy <[email protected]> wrote:

> Thank you, Xiaoxiang. I tried it on my side and it worked for the ssb
> database, but it didn't work for my own database.
>
> It only works if I restart Kylin, so I guess some configuration may be
> missing on my end.
>
> Thank you very much anyway; I will post an update next time.
>
> Have a good day.
>
> On Fri, Nov 17, 2023 at 5:34 PM Xiaoxiang Yu <[email protected]> wrote:
>
> > I did a simple test to check whether Kylin has any bugs in the pushdown
> > function, and pushdown works as expected without any mistakes. So I'm 99%
> > certain that your step "I loaded the incremental data into Hive already"
> > did not work.
> >
> > Here are my steps (you can reproduce them in a fresh Kylin 5 Docker
> > container in one minute):
> >
> > 1. Query `select count(*) from SSB.DATES` in project ssb without building
> >    any index.
> >    Query result (Answered By: HIVE): 2556
> >
> > 2. Duplicate the data file of table `ssb.dates` with the following command:
> >    hadoop fs -cp /user/hive/warehouse/ssb.db/dates/SSB.DATES.csv \
> >      /user/hive/warehouse/ssb.db/dates/SSB.DATES-2.csv
> >
> > 3. Re-query `select count(*) from SSB.DATES` in project ssb.
> >    Query result (Answered By: HIVE): 5112
> >
> > So it is clear that in the second query the incremental data can be found
> > by the Kylin query engine.
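Step 2 and the arithmetic behind step 3 can be sketched as follows, assuming
the same fresh Kylin 5 Docker container with `hadoop` on the PATH. The
function names are hypothetical, and steps 1 and 3 are still run on the Kylin
query page. Because the table reads every file in its directory, one identical
copy should double the count: 2 × 2556 = 5112.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching step 2 of the reproduction above.
# Assumes a fresh Kylin 5 Docker container where `hadoop` is on the PATH.

TABLE_DIR=/user/hive/warehouse/ssb.db/dates

# Step 2: add a second, identical data file to the table directory,
# then list the directory to confirm both files are there.
duplicate_dates_file() {
  hadoop fs -cp "${TABLE_DIR}/SSB.DATES.csv" "${TABLE_DIR}/SSB.DATES-2.csv" &&
    hadoop fs -ls "${TABLE_DIR}"
}

# Expected row count when the table directory holds COPIES identical
# files of ROWS rows each: ROWS * COPIES.
expected_count() {
  local rows="$1" copies="$2"
  echo $(( rows * copies ))
}
```

After running `duplicate_dates_file`, the step 3 result should equal
`expected_count 2556 2`, i.e. 5112.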
> >
> > Finally, to make good use of Kylin in real use cases, good knowledge of
> > Apache Spark and Apache Hadoop is a must-have.
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Fri, Nov 17, 2023 at 5:52 PM Nam Đỗ Duy <[email protected]> wrote:
> >
> > > Have a nice weekend, Xiaoxiang, and thank you for helping me become a
> > > Kylin fan.
> > >
> > > You are right: I am not familiar enough with Kylin and have little
> > > background in the Hadoop ecosystem, so I will double-check carefully
> > > before asking future questions. However, I did understand the mechanism
> > > quoted below.
> > >
> > > ============quoted====================
> > >
> > > If incremental data is not loaded into Kylin, Kylin can still answer
> > > such queries by reading the original Hive table, but the query is not
> > > accelerated.
> > >
> > > If incremental data is loaded into Kylin, Kylin can answer queries by
> > > reading the special Index/Cuboid files, and the query will be
> > > accelerated.
> > >
> > > ============end====================
> > >
> > > To clarify my previous question, my steps were as follows:
> > >
> > > 1. I turned off the configuration kylin.query.cache-enabled (set to false).
> > > 2. I restarted Kylin.
> > > 3. I loaded the incremental data into Hive already.
> > > 4. I turned on the Pushdown option to query Hive, not the model.
> > > 5. In the Kylin Insights window, I still cannot get the incremental data
> > >    (which is already in Hive).
> > >
> > > That is why I asked you: can I get the incremental result with the five
> > > steps above (without a model and index), or do I need to create a model,
> > > an index and a segment, and then get the incremental result by building a
> > > new segment for the incremental data?
> > >
> > > I hope you get my point; otherwise I can explain more.
> > >
> > > Thank you very much again
> > >
> > >
> > > On Fri, 17 Nov 2023 at 16:00 Xiaoxiang Yu <[email protected]> wrote:
> > >
> > > > Unfortunately, I think you are not asking good questions. If the answer
> > > > to a question can be found by searching the Internet, it is better not
> > > > to ask it on the mailing list. I guess you don't know how Kylin works
> > > > yet, so you should look for documentation or tutorials first.
> > > >
> > > > What does 'get the incremental data from Hive into Kylin' mean? Kylin
> > > > fully relies on Apache Spark for execution.
> > > >
> > > > If incremental data is not loaded into Kylin, Kylin can still answer
> > > > such queries by reading the original Hive table, but the query is not
> > > > accelerated.
> > > >
> > > > If incremental data is loaded into Kylin, Kylin can answer queries by
> > > > reading the special Index/Cuboid files, and the query will be
> > > > accelerated.
> > > >
> > > >
> > > > ------------------------
> > > > With warm regard
> > > > Xiaoxiang Yu
> > > >
> > > >
> > > >
> > > > On Fri, Nov 17, 2023 at 4:36 PM Nam Đỗ Duy <[email protected]> wrote:
> > > >
> > > > > Hi Xiaoxiang,
> > > > >
> > > > > Do I really need to create a model in order to get the incremental data
> > > > > from Hive into Kylin?
> > > > >
> > > > > Can I query the incremental data of a pure dim/fact table without a model?
> > > > >
> > > > > Thank you very much
> > > > >
> > > > > On Fri, Nov 17, 2023 at 9:05 AM Xiaoxiang Yu <[email protected]> wrote:
> > > > >
> > > > > > I am not really sure, but I think it is the query cache that keeps
> > > > > > your query result unchanged.
> > > > > >
> > > > > > The config entry is kylin.query.cache-enabled, which is turned on by
> > > > > > default. The documentation is at
> > > > > > https://kylin.apache.org/5.0/docs/configuration/query_cache
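As a sketch, assuming the standard `$KYLIN_HOME/conf/kylin.properties`
location, the entry can be set to false like this. The helper name
`set_query_cache` is hypothetical; it only rewrites the properties file, and
Kylin still has to be restarted for the change to take effect.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kylin): set kylin.query.cache-enabled
# in a kylin.properties file. Kylin must be restarted to pick up the change.
# Usage: set_query_cache false [path/to/kylin.properties]
# The default path $KYLIN_HOME/conf/kylin.properties is an assumption.
set_query_cache() {
  local value="$1"
  local conf="${2:-$KYLIN_HOME/conf/kylin.properties}"
  local tmp
  if [ ! -f "$conf" ]; then
    echo "properties file not found: $conf" >&2
    return 1
  fi
  tmp="$(mktemp)" || return 1
  # Drop any existing entry (spaces around '=' allowed), keep everything else.
  grep -v -E '^[[:space:]]*kylin\.query\.cache-enabled[[:space:]]*=' "$conf" > "$tmp"
  # Append the new value, then write back in place to keep file permissions.
  echo "kylin.query.cache-enabled=${value}" >> "$tmp"
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}
```

For example, `set_query_cache false`, then restart Kylin (running
`bin/kylin.sh stop` followed by `bin/kylin.sh start` from `$KYLIN_HOME` is one way).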
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best wishes to you!
> > > > > > From: Xiaoxiang Yu
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > At 2023-11-17 09:48:55, "Nam Đỗ Duy" <[email protected]> wrote:
> > > > > > >Hello Team, hello Xiaoxiang, can you please help me with this
> > > > > > >urgent issue...
> > > > > > >
> > > > > > >(This is a public mailing list, so I usually leave specific names out
> > > > > > >of the greeting in the first email of a thread, but in fact Xiaoxiang
> > > > > > >answers my questions most of the time. Thank you very much.)
> > > > > > >
> > > > > > >On Thu, Nov 16, 2023 at 2:59 PM Nam Đỗ Duy <[email protected]> wrote:
> > > > > > >
> > > > > > >> Dear Dev Team, please kindly advise on this scenario:
> > > > > > >>
> > > > > > >> 1. I have a fact table; I use the Kylin Insights window to query it
> > > > > > >>    and get 5 million rows.
> > > > > > >>
> > > > > > >> 2. Then I use the following command to load X rows (last-hour data)
> > > > > > >>    from Parquet into the Hive table:
> > > > > > >>
> > > > > > >>    LOAD DATA LOCAL INPATH
> > > > > > >>    '/opt/LastHour/factUserEventDF_2023_11_16.parquet/14'
> > > > > > >>    INTO TABLE factUserEvent;
> > > > > > >>
> > > > > > >> 3. Then I open the Kylin Insights window to query it, but it still
> > > > > > >>    returns the previous number (5 million rows) instead of adding the
> > > > > > >>    X rows of last-hour data that I loaded from Parquet into Hive in
> > > > > > >>    step 2.
> > > > > > >>
> > > > > > >> Can you advise how to make the table refresh and show the updated data?
> > > > > > >>
> > > > > > >> Thank you very much
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
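For the `LOAD DATA LOCAL INPATH` step in the original question, one way to
follow the advice at the top of this thread is to check the table's storage
location and format. As the Hive documentation describes it, LOAD DATA mostly
copies or moves files into the table's directory rather than converting them,
so the table's location and input format should match the Parquet files being
loaded. A minimal sketch, assuming the Kylin spark directory and
HADOOP_CONF_DIR from the top of this thread; the helper names are
hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Kylin or Hive) for checking where a Hive
# table reads its data from and which file format it expects.
# Assumes the Kylin spark directory and HADOOP_CONF_DIR from the top of this
# thread; factUserEvent is the table from the original question.

# Filter DESCRIBE FORMATTED output (stdin) down to the storage-related rows.
# Spark prints "Location"; the Hive CLI prints "Location:", so ':' is optional.
storage_rows() {
  grep -E '^(Location|InputFormat|Serde Library|Provider):?[[:space:]]'
}

# Show where a table lives and how it is stored, e.g. show_storage factUserEvent
show_storage() {
  bin/spark-sql --master yarn -S -e "DESCRIBE FORMATTED $1" 2>/dev/null | storage_rows
}

# Recursively list the files under a table location printed by show_storage.
list_table_files() {
  hadoop fs -ls -R "$1"
}
```

For example, run `show_storage factUserEvent`, then `list_table_files` on the
printed location. If the last-hour Parquet files are not there, or the
InputFormat is not a Parquet one (such as
`org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat`), the loaded
rows may not be read as intended.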
