Re: How to reflect last hour data into Hive and Kylin Insights query window

Nam Đỗ Duy Wed, 22 Nov 2023 01:36:57 -0800

Thank you, Xiaoxiang. I am presenting Kylin to management tomorrow at noon,
so I am putting this issue on hold to focus on the following questions.
Could you please advise?


1. How can I build a metric repository with Kylin to share among data teams
(DA, DS, AI)? Is that what measures in Kylin are for?
2. How can I use Kylin for customer segmentation in the Marketing department?


On Wed, Nov 22, 2023 at 2:10 PM Xiaoxiang Yu <[email protected]> wrote:

> Before you try again, you can use spark-sql/spark-shell to check whether
> the data was loaded into your table successfully (or whether your data was
> copied to the right place).
> The following shows how to start spark-sql/spark-shell in a container.
>
> export HADOOP_CONF_DIR=/opt/hadoop-3.2.1/etc/hadoop
>
> cd /home/kylin/apache-kylin-5.0.0-beta-bin/spark
>
> bin/spark-shell --executor-cores 1 --num-executors 1 --master yarn
>
>
> The result from spark-sql/spark-shell should be the same as what you
> saw on the Kylin Insight page. If the same query returns different
> results, which should not happen, please let me know.
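
A minimal sketch of that check as a shell helper, assuming the paths quoted above and that the bundled Spark ships `bin/spark-sql` with its standard `-e` option. The helper names `build_count_query` and `count_rows` are hypothetical, and `factUserEvent` is the table from the original question.

```shell
#!/usr/bin/env bash
# Sketch only: paths are the ones quoted in this thread; adjust to your container.
export HADOOP_CONF_DIR=/opt/hadoop-3.2.1/etc/hadoop
SPARK_DIR=/home/kylin/apache-kylin-5.0.0-beta-bin/spark

# Hypothetical helper: build the row-count query for a table name.
build_count_query() {
  printf 'SELECT COUNT(*) FROM %s' "$1"
}

# Hypothetical helper: run the count through spark-sql on YARN.
# Assumes bin/spark-sql exists in SPARK_DIR.
count_rows() {
  "$SPARK_DIR/bin/spark-sql" --master yarn --num-executors 1 --executor-cores 1 \
    -e "$(build_count_query "$1")"
}
```

Running `count_rows factUserEvent` before and after the load should show whether the new rows reached the table; the same query can then be compared with the Kylin Insight page.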
>
> Hope you can fix your problem soon.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Wed, Nov 22, 2023 at 11:59 AM Nam Đỗ Duy <[email protected]> wrote:
>
> > Thank you Xiaoxiang, I tried it on my side and it worked for the ssb
> > database, but it didn't work for my own database.
> >
> > It only works if I restart Kylin, so I guess some configuration is
> > missing on my end.
> >
> > Thank you very much anyway; I will post an update next time.
> >
> > Have a good day.
> >
> > On Fri, Nov 17, 2023 at 5:34 PM Xiaoxiang Yu <[email protected]> wrote:
> >
> > > I ran a simple test to check whether Kylin has any bugs in the push-down
> > > function, and push-down works as expected. So I'm 99% certain that
> > > your step "I loaded the incremental data into Hive already" did not
> > > work.
> > >
> > > Here are my steps (you can reproduce them in a fresh Kylin5 docker
> > > container in one minute):
> > >
> > > 1. Query `select count(*) from SSB.DATES` in project ssb without
> > > building any index.
> > >     Query result(Answered By: HIVE) is :   2556
> > >
> > > 2. Duplicate the file of table `ssb.dates` with the following command:
> > >     hadoop fs -cp /user/hive/warehouse/ssb.db/dates/SSB.DATES.csv \
> > >         /user/hive/warehouse/ssb.db/dates/SSB.DATES-2.csv
> > >
> > > 3. Re-query `select count(*) from SSB.DATES` in project ssb
> > >     Query result(Answered By: HIVE) is :  5112
> > >
> > > So it is clear that the Kylin query engine picks up the incremental
> > > data in the second query.
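
The file-copy step and the arithmetic behind the expected result can be sketched in shell as follows. The function names are hypothetical; the paths are those from step 2, and `hadoop fs -test -e` is used only so that a rerun does not fail on an existing copy.

```shell
#!/usr/bin/env bash
# Sketch only: paths are the ones given in step 2 of this message.
SRC=/user/hive/warehouse/ssb.db/dates/SSB.DATES.csv
DST=/user/hive/warehouse/ssb.db/dates/SSB.DATES-2.csv

# Hypothetical helper: copy the table's data file once.
duplicate_dates_file() {
  if hadoop fs -test -e "$DST"; then
    echo "already present: $DST"
  else
    hadoop fs -cp "$SRC" "$DST"
  fi
}

# Hypothetical helper: rows expected when the same file appears N times.
# $1 = rows in one copy of the file, $2 = number of copies in the directory
expected_count() {
  echo $(( $1 * $2 ))
}
```

`expected_count 2556 2` gives 5112, which matches the result of the re-query in step 3.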
> > >
> > > Finally, to make good use of Kylin in real use cases, good knowledge of
> > > Apache Spark and Apache Hadoop is a must.
> > >
> > > ------------------------
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > >
> > > On Fri, Nov 17, 2023 at 5:52 PM Nam Đỗ Duy <[email protected]> wrote:
> > >
> > > > Have a nice weekend Xiaoxiang, and thank you for helping me become a
> > > > Kylin fan.
> > > >
> > > > You are right: I am not yet familiar enough with Kylin and have little
> > > > background in the Hadoop ecosystem, so I will double-check carefully
> > > > before asking further questions. However, I did understand the
> > > > mechanism quoted below.
> > > >
> > > > ============quoted====================
> > > >
> > > > If incremental data is not loaded into Kylin, Kylin can still answer
> > > > such queries by reading the original hive table, but the query is not
> > > > accelerated.
> > > >
> > > > If incremental data is loaded into Kylin, Kylin can answer queries by
> > > > reading the special Index/Cuboid files, and the query will be
> > > > accelerated.
> > > >
> > > > ============end====================
> > > >
> > > > To explain my previous question, my steps were as follows:
> > > >
> > > > 1. I turned off the configuration kylin.query.cache-enabled (set to
> > > > false)
> > > > 2. I restarted Kylin
> > > > 3. I loaded the incremental data into Hive already
> > > > 4. I turned on the Pushdown option to query Hive instead of the model
> > > > 5. In the Kylin Insights window, I still could not get the incremental
> > > > data (which was already in Hive)
> > > >
> > > > That is why I asked: can I get the incremental result with the 5
> > > > steps above (without a model or index), or do I need to create a
> > > > model, index and segment, and then build a new segment for the
> > > > incremental data to get the incremental result?
> > > >
> > > > I hope that makes my point clear; otherwise I will explain further.
> > > >
> > > > Thank you very much again.
> > > >
> > > >
> > > > On Fri, 17 Nov 2023 at 16:00 Xiaoxiang Yu <[email protected]> wrote:
> > > >
> > > > > Unfortunately, I guess you are not asking good questions.
> > > > > If the answer to a question can be found by searching the Internet,
> > > > > it is not recommended to ask it on the mailing list. I guess you
> > > > > don't know how Kylin works, so you need to search for documents
> > > > > or tutorials.
> > > > >
> > > > > What does 'get the incremental data from Hive into Kylin' mean?
> > > > > Kylin fully relies on Apache Spark for execution.
> > > > >
> > > > > If incremental data is not loaded into Kylin, Kylin can still
> > > > > answer such queries by reading the original hive table, but the
> > > > > query is not accelerated.
> > > > >
> > > > > If incremental data is loaded into Kylin, Kylin can answer queries
> > > > > by reading the special Index/Cuboid files, and the query will be
> > > > > accelerated.
> > > > >
> > > > >
> > > > > ------------------------
> > > > > With warm regard
> > > > > Xiaoxiang Yu
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Nov 17, 2023 at 4:36 PM Nam Đỗ Duy <[email protected]> wrote:
> > > > >
> > > > > > Hi Xiaoxiang,
> > > > > >
> > > > > > Do I really need to create a model in order to get the
> > > > > > incremental data from Hive into Kylin?
> > > > > >
> > > > > > Can I query the incremental data of a pure dim/fact table without
> > > > > > a model?
> > > > > >
> > > > > > Thank you very much
> > > > > >
> > > > > > On Fri, Nov 17, 2023 at 9:05 AM Xiaoxiang Yu <[email protected]> wrote:
> > > > > >
> > > > > > > I am not really sure, but I think it is the query cache that
> > > > > > > keeps your query result unchanged.
> > > > > > >
> > > > > > >
> > > > > > > The config entry is kylin.query.cache-enabled, which is turned on
> > > > > > > by default. The doc link is
> > > > > > > https://kylin.apache.org/5.0/docs/configuration/query_cache
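
To confirm what the cache switch is actually set to, a small shell helper can read it from a properties file. This is a sketch: the function name `cache_enabled` is hypothetical, the file location (for example `$KYLIN_HOME/conf/kylin.properties`) is an assumption about your install, and the fallback of `true` reflects the default stated above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last value of kylin.query.cache-enabled
# found in the given properties file, or "true" (the default) if it is unset.
# Commented-out lines (starting with '#') are ignored.
cache_enabled() {
  awk -F= '
    /^[ \t]*kylin\.query\.cache-enabled[ \t]*=/ { v = $2 }
    END { gsub(/[ \t\r]/, "", v); print (v == "" ? "true" : v) }
  ' "$1"
}
```

For example, `cache_enabled "$KYLIN_HOME/conf/kylin.properties"` should print `false` once the cache has been turned off in that file.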
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Best wishes to you !
> > > > > > > From: Xiaoxiang Yu
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > At 2023-11-17 09:48:55, "Nam Đỗ Duy" <[email protected]> wrote:
> > > > > > > >Hello Team, hello Xiaoxiang, can you please help me with this
> > > > > > > >urgent issue...
> > > > > > > >
> > > > > > > >(This is a public mailing list, so I generally leave your name
> > > > > > > >out of the greeting in the first email of a thread, but in fact
> > > > > > > >Xiaoxiang actively answers most of my questions. Thank you very
> > > > > > > >much.)
> > > > > > > >
> > > > > > > >On Thu, Nov 16, 2023 at 2:59 PM Nam Đỗ Duy <[email protected]> wrote:
> > > > > > > >
> > > > > > > >> Dear Dev Team, please kindly advise on this scenario.
> > > > > > > >>
> > > > > > > >> 1. I have a fact table, and when I query it in the Kylin
> > > > > > > >> Insights window I get 5 million rows.
> > > > > > > >>
> > > > > > > >> 2. Then I use the following command to load X rows (last hour
> > > > > > > >> data) from parquet into the Hive table:
> > > > > > > >>
> > > > > > > >> LOAD DATA LOCAL INPATH
> > > > > > > >> '/opt/LastHour/factUserEventDF_2023_11_16.parquet/14' INTO TABLE
> > > > > > > >> factUserEvent;
> > > > > > > >>
> > > > > > > >> 3. Then I open the Kylin Insights window and query it again, but
> > > > > > > >> it still returns the previous number (5 million rows), without
> > > > > > > >> the X rows of last hour data that I loaded from parquet into
> > > > > > > >> Hive in step 2.
> > > > > > > >>
> > > > > > > >> Can you advise how to make the table refresh and show the
> > > > > > > >> updated data?
> > > > > > > >>
> > > > > > > >> Thank you very much
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
