Hi Xiaoxiang,

Thank you very much. I already have a clearer picture of Kylin thanks to
your explanation.

Now back to the SSB sample project in the attached photo: when I run this
query with the push_down option OFF, why does the OLAP error appear, and in
that case, how do I create a new cube for this query?

[image: image.png]

On Wed, Nov 1, 2023 at 3:49 PM Xiaoxiang Yu <x...@apache.org> wrote:

> Here is some of my explanation; it may not be perfect.
> A Segment in Kylin is a part of the model/cube pre-computed data, in most
> cases divided by a date column.
>
> Here are some differences between Segment and Snapshot.
> Segment, whose source data comes from one fact table joined with some
> dimension tables over a 'specific date range', is 'precomputed' and will
> accelerate complex queries.
> Snapshot, whose source data comes from one specific dimension table
> without a specific date range, is "not precomputed" and can join with
> segments at runtime.
>
> - https://kylin.apache.org/5.0/docs/snapshot/snapshot_management
> - https://kylin.apache.org/5.0/docs/modeling/load_data/segment_operation_settings/intro
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
> On Wed, Nov 1, 2023 at 3:53 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>
>> Thank you again, it is very smart of Kylin to automatically select the
>> cube for a certain query. Sorry if I ask too much: is the concept of
>> Segment in a Kylin model similar to the Slice-and-Dice concept of a Cube,
>> and what is the difference between a Kylin Segment and a Kylin Snapshot?
>>
>> PS. I sent you the log files for your help in investigating why my cube
>> has not been used.
>>
>> On Wed, Nov 1, 2023 at 2:36 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>
>>> I guess there is a misunderstanding in your sentences.
>>>
>>> -- 'I need to select Cube from a combo box below the query window'
>>> It is not right to use 'need'; that combo box is for some specific
>>> cases (for example, Kylin did not choose the most efficient cube),
>>> not for most cases.
>>> In most cases (both for Kylin 4 and Kylin 5), you don't need to select
>>> a Cube in the combo box; Kylin will make the choice for you.
>>>
>>> ------------------------
>>> With warm regard
>>> Xiaoxiang Yu
>>>
>>> On Wed, Nov 1, 2023 at 3:24 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>
>>>> Hi Xiaoxiang, sorry if I confused you (anyway, it is just a
>>>> beginner's question).
>>>>
>>>> "obviously" means "clearly"
>>>>
>>>> because I need to select Cube from a combo box below the query window
>>>>
>>>> Thank you very much
>>>>
>>>> On Wed, Nov 1, 2023 at 2:20 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>
>>>>> From my side, I cannot understand why you say Kylin 4 is 'very
>>>>> obviously'. Can you give an example?
>>>>> From the source code, the basic logic of choosing the right
>>>>> cube/model is similar in both versions.
>>>>> ------------------------
>>>>> With warm regard
>>>>> Xiaoxiang Yu
>>>>>
>>>>> On Wed, Nov 1, 2023 at 3:01 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>>>>
>>>>>> Thank you for your kind reply, please answer one more question
>>>>>> about version 5:
>>>>>>
>>>>>> In version 4.x we run a query against a Cube very obviously, but in
>>>>>> version 5 the cube usage is implicit, so can you advise: for a given
>>>>>> query, which model will be used, and which index (cube) will be used
>>>>>> for this query?
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> On Wed, Nov 1, 2023 at 1:42 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>
>>>>>>> 1. How do I measure the size of the index (cube) in version 5?
>>>>>>> You can check the storage of specific Indexes from the Index page.
>>>>>>>
>>>>>>> https://kylin.apache.org/5.0/docs/modeling/model_design/aggregation_group#view-aggregate-index
>>>>>>> or
>>>>>>> https://kylin.apache.org/5.0/assets/images/index_1-6ad3f55183d4ed61962359d9408ba192.png
>>>>>>>
>>>>>>> 2. How to create the cardinality for each column?
>>>>>>> You should check this link:
>>>>>>> https://kylin.apache.org/5.0/docs/datasource/data_sampling/
>>>>>>>
>>>>>>> 3. In your default project sample named SSB project, you have only
>>>>>>> 4 simple aggregate group indexes and no table index, as in the
>>>>>>> attached file, so what is the best strategy to select an index for
>>>>>>> our OLAP?
>>>>>>>    1. There does exist a 'Base Table Index' by default, actually;
>>>>>>>       its id is 20000000001.
>>>>>>>    2. I think it is a good question, and Kylin 5 lacks such a guide
>>>>>>>       for better modeling. You are free to ask your question on the
>>>>>>>       mailing list and I will try to reply.
>>>>>>>
>>>>>>> ------------------------
>>>>>>> With warm regard
>>>>>>> Xiaoxiang Yu
>>>>>>>
>>>>>>> On Wed, Nov 1, 2023 at 2:12 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>>
>>>>>>>> OK, I didn't read all the mail history, so I misunderstood the
>>>>>>>> situation. It looks like you need to analyse why the query didn't
>>>>>>>> hit the cube correctly.
>>>>>>>>
>>>>>>>> Please generate a query diagnosis package and send it to me
>>>>>>>> privately. I will analyse the query log.
>>>>>>>> You can refer to the following steps in the screenshots.
>>>>>>>>
>>>>>>>> [image: image.png]
>>>>>>>>
>>>>>>>> If the screenshots are not displaying correctly, please read this
>>>>>>>> guide:
>>>>>>>>
>>>>>>>> https://kylin.apache.org/5.0/docs/operations/system-operation/diagnosis/#generate-query-diagnosis-package-in-web-ui
>>>>>>>>
>>>>>>>> By the way, you need to analyse the cause by reading
>>>>>>>> kylin.query.log, not kylin.log; refer to
>>>>>>>> https://kylin.apache.org/5.0/docs/operations/logs/system_log
>>>>>>>>
>>>>>>>> ------------------------
>>>>>>>> With warm regard
>>>>>>>> Xiaoxiang Yu
>>>>>>>>
>>>>>>>> On Wed, Nov 1, 2023 at 12:18 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>>>>>>>
>>>>>>>>> Thank you Xiaoxiang for your advice.
>>>>>>>>> As my email title shows, I guessed that the OLAP functionality
>>>>>>>>> has not been correctly set up on my computer.
>>>>>>>>>
>>>>>>>>> The evidence is this: when I disable the Pushdown option box to
>>>>>>>>> use only the precomputed cube, it shows the following error.
>>>>>>>>> Please kindly advise how to properly build the OLAP.
>>>>>>>>>
>>>>>>>>> LIMIT 500": No realization found for OLAPContext, MODEL_UNMATCHED_JOIN, rel#2240:KapTableScan.OLAP.[](table=[VNEVENT_HIVE_DWH_400MILLION_ROWS, FACTUSEREVENT],ctx=0@null,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
>>>>>>>>>
>>>>>>>>> On Wed, Nov 1, 2023 at 10:40 AM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Yesterday, I tried to see if query pushdown works well in the
>>>>>>>>>> Kylin5 docker, and all of my queries returned proper responses.
>>>>>>>>>> After checking your logs from Shaofeng, I found these error
>>>>>>>>>> messages repeated many times:
>>>>>>>>>> 1. 'java.io.IOException: All datanodes DatanodeInfoWithStorage[127.0.0.1:9866,DS-5093899b-06c7-4386-95d5-6fc271d92b52,DISK] are bad. Aborting...'
>>>>>>>>>> 2. 'curator.ConnectionState : Connection timed out for connection string (localhost:2181) and timeout (15000) / elapsed (41794)
>>>>>>>>>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss'
>>>>>>>>>>
>>>>>>>>>> I guess the root cause is that the container did not have enough
>>>>>>>>>> resources. I found that you queried a table called
>>>>>>>>>> 'XXX_hive_dwh_400million_rows'; it looks like you ran a complex
>>>>>>>>>> query on a table which contains 400 million rows?
>>>>>>>>>>
>>>>>>>>>> Since I am the uploader of the kylin5 docker image, I want to
>>>>>>>>>> give some explanation.
>>>>>>>>>> The Kylin5 docker is not a place for performance benchmarks; it
>>>>>>>>>> is only for demonstration. It is allocated very few resources
>>>>>>>>>> (8G memory) if you use the default command from the Docker Hub
>>>>>>>>>> page. Before I uploaded my image, I only tested it with the ssb
>>>>>>>>>> dataset, in which the biggest table only contains about 60k
>>>>>>>>>> rows. If you are using a larger dataset and more complex
>>>>>>>>>> queries, you have to scale the resources properly. With the
>>>>>>>>>> default settings, try querying tables which contain no more
>>>>>>>>>> than 100k rows.
>>>>>>>>>>
>>>>>>>>>> Here are some tips which may help you check whether the daemon
>>>>>>>>>> services are healthy and resources (particularly disk space)
>>>>>>>>>> are configured properly.
>>>>>>>>>>
>>>>>>>>>> 1. Check the HDFS web UI
>>>>>>>>>>    ( http://localhost:9870/dfshealth.html#tab-datanode ) to
>>>>>>>>>>    confirm whether the HDFS service is in 'In service' status.
>>>>>>>>>> 2. Check the Datanode log in
>>>>>>>>>>    `/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log`
>>>>>>>>>>    for any error messages. For example:
>>>>>>>>>>    cat /opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log | grep ERROR | wc -l
>>>>>>>>>> 3. Check that your docker engine is configured with enough disk
>>>>>>>>>>    space. If you are using Docker Desktop like me, please go to
>>>>>>>>>>    "Settings" - "Resources" - "Advanced" and make sure you have
>>>>>>>>>>    allocated 40GB+ of disk space to the docker container.
>>>>>>>>>> 4. Check the available disk space of your container with
>>>>>>>>>>    `df -h`; make sure the 'Use%' of 'overlay' is less than 60%.
>>>>>>>>>> 5. Check the load average / cpu usage / jvm gc. Make sure these
>>>>>>>>>>    metrics are not really high when you send a query.
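
Tips 2 and 4 above can be scripted. The following is only a minimal Python
sketch, not part of Kylin or of the docker image: the function names
(count_error_lines, overlay_use_percent, disk_ok) are hypothetical helpers
introduced here, and the 60% limit is simply the figure from tip 4. The
functions work on text you have already collected, so they can be tried
outside the container.

```python
import re
from typing import Optional


def count_error_lines(log_text: str) -> int:
    """Count lines containing 'ERROR', like `grep ERROR | wc -l`."""
    return sum(1 for line in log_text.splitlines() if "ERROR" in line)


def overlay_use_percent(df_output: str) -> Optional[int]:
    """Return the Use% of the 'overlay' filesystem from `df -h` output.

    Returns None when no 'overlay' row (or no percentage column) is found.
    """
    for line in df_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "overlay":
            # The Use% column is the only field of the form '<digits>%'.
            for field in fields[1:]:
                if re.fullmatch(r"\d+%", field):
                    return int(field[:-1])
    return None


def disk_ok(df_output: str, limit: int = 60) -> bool:
    """True when the overlay Use% is known and below `limit` (tip 4)."""
    pct = overlay_use_percent(df_output)
    return pct is not None and pct < limit
```

Inside the container, the input for disk_ok could come from
subprocess.run(["df", "-h"], capture_output=True, text=True).stdout, and
the input for count_error_lines from reading the Datanode log file named
in tip 2.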
>>>>>>>>>> ------------------------
>>>>>>>>>> With warm regard
>>>>>>>>>> Xiaoxiang Yu
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 31, 2023 at 5:13 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi ShaoFeng
>>>>>>>>>>>
>>>>>>>>>>> Thank you very much for your valuable feedback.
>>>>>>>>>>>
>>>>>>>>>>> I saw the application there (if I see it right), as in the
>>>>>>>>>>> attached photo. Kindly advise so that I can run this query on
>>>>>>>>>>> OLAP.
>>>>>>>>>>>
>>>>>>>>>>> PS. I sent you the log file in private.
>>>>>>>>>>>
>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 31, 2023 at 3:11 PM ShaoFeng Shi <shaofeng...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Can you provide the messages in logs/kylin.log when executing
>>>>>>>>>>>> the SQL? You can also check the Spark UI from the yarn
>>>>>>>>>>>> resource manager (there should be one running application
>>>>>>>>>>>> called Sparder, which is Kylin's backend spark application).
>>>>>>>>>>>> If the application is not there, it may indicate that yarn
>>>>>>>>>>>> doesn't have the resources to start it up.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Shaofeng Shi 史少锋
>>>>>>>>>>>> Apache Kylin PMC,
>>>>>>>>>>>> Apache Incubator PMC,
>>>>>>>>>>>> Email: shaofeng...@apache.org
>>>>>>>>>>>>
>>>>>>>>>>>> Apache Kylin FAQ:
>>>>>>>>>>>> https://kylin.apache.org/docs/gettingstarted/faq.html
>>>>>>>>>>>> Join Kylin user mail group: user-subscr...@kylin.apache.org
>>>>>>>>>>>> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 31, 2023 at 10:35 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Dear Sir/Madam,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a fact table with 500 million rows, and I built the
>>>>>>>>>>>>> model and index according to the website help.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I chose full incremental because this is the first time I
>>>>>>>>>>>>> load data.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I created both index types, aggregate group index and table
>>>>>>>>>>>>> index, as in the attached photo.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But the query always fails after the timeout of 300 seconds
>>>>>>>>>>>>> (I run in docker). I don't want to increase the value of 300
>>>>>>>>>>>>> seconds because I wish the OLAP could run within 1 minute
>>>>>>>>>>>>> (is that possible?).
>>>>>>>>>>>>>
>>>>>>>>>>>>> It seems that the OLAP indexing is not working to speed up
>>>>>>>>>>>>> the query with the precomputed cube.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you advise how to check whether the index really worked?
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is quite an urgent task for me, so a prompt response is
>>>>>>>>>>>>> highly appreciated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you very much