I think there is a misunderstanding in what you wrote:

'I need to select Cube from a combo box below the query window'

'Need' is not the right word here. That combo box is for specific cases (for example, when Kylin did not choose the most efficient cube), not for most cases. In most cases (for both Kylin 4 and Kylin 5), you don't need to select a cube in the combo box; Kylin makes the choice for you.
------------------------
With warm regard
Xiaoxiang Yu

On Wed, Nov 1, 2023 at 3:24 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang, sorry if I confused you (anyway, it is just a question
> from a beginner)
>
> "obviously" means "clearly"
>
> because I need to select Cube from a combo box below the query window
>
> Thank you very much
>
> On Wed, Nov 1, 2023 at 2:20 PM Xiaoxiang Yu <x...@apache.org> wrote:
>
>> From my side, I cannot understand why you say Kylin 4 is 'very obviously'.
>> Can you give an example?
>> From the source code, the basic logic of choosing the right cube/model
>> is similar.
>> ------------------------
>> With warm regard
>> Xiaoxiang Yu
>>
>> On Wed, Nov 1, 2023 at 3:01 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>
>>> Thank you for your kind reply, please answer 1 more question about
>>> version 5:
>>>
>>> In version 4.x we run a query against a Cube very obviously, but in
>>> version 5 the cube usage is implicit, so can you advise: for a given
>>> query, which model will be used, and which index (cube) will be used
>>> for this query?
>>>
>>> Thank you
>>>
>>> On Wed, Nov 1, 2023 at 1:42 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>
>>>> 1. How do I measure the size of the index (cube) in version 5?
>>>> You can check the storage of specific indexes from the Index page.
>>>>
>>>> https://kylin.apache.org/5.0/docs/modeling/model_design/aggregation_group#view-aggregate-index
>>>> or
>>>> https://kylin.apache.org/5.0/assets/images/index_1-6ad3f55183d4ed61962359d9408ba192.png
>>>>
>>>> 2. How to create the cardinality for each column?
>>>> You should check this link:
>>>> https://kylin.apache.org/5.0/docs/datasource/data_sampling/ .
>>>>
>>>> 3. In your default project sample named SSB project, you have only 4
>>>> simple aggregate group indexes and no table index as in the attached
>>>> file, so what is the best strategy to select an index for our OLAP?
>>>>    1. There actually is a 'Base Table Index' by default; its id is
>>>>       20000000001.
>>>>    2. I think it is a good question, and Kylin 5 lacks such a guide for
>>>>       better modeling. You are free to ask your question on the
>>>>       mailing list and I will try to reply.
>>>>
>>>> ------------------------
>>>> With warm regard
>>>> Xiaoxiang Yu
>>>>
>>>> On Wed, Nov 1, 2023 at 2:12 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>
>>>>> OK, I didn't read all the mail history, so I misunderstood the
>>>>> situation. It looks like you need to analyse why the query didn't
>>>>> hit the cube correctly.
>>>>>
>>>>> Please generate a query diagnosis package and send it to me
>>>>> privately. I will analyse the query log.
>>>>> You can refer to the following steps in the screenshots.
>>>>>
>>>>> [image: image.png]
>>>>>
>>>>> If the screenshots are not displaying correctly, please read this
>>>>> guide:
>>>>>
>>>>> https://kylin.apache.org/5.0/docs/operations/system-operation/diagnosis/#generate-query-diagnosis-package-in-web-ui
>>>>>
>>>>> By the way, you need to analyse the cause by reading kylin.query.log,
>>>>> not kylin.log; refer to
>>>>> https://kylin.apache.org/5.0/docs/operations/logs/system_log
>>>>>
>>>>> ------------------------
>>>>> With warm regard
>>>>> Xiaoxiang Yu
>>>>>
>>>>> On Wed, Nov 1, 2023 at 12:18 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>>>>
>>>>>> Thank you Xiaoxiang for your advice. As my email title shows, I
>>>>>> guessed that the OLAP functionality has not been correctly set up
>>>>>> on my computer.
>>>>>>
>>>>>> The evidence is that when I disable the Pushdown option box to use
>>>>>> only the precomputed cube, it shows the following error. Please
>>>>>> kindly advise how to properly build the OLAP
>>>>>>
>>>>>> LIMIT 500": No realization found for OLAPContext, MODEL_UNMATCHED_JOIN,
>>>>>> rel#2240:KapTableScan.OLAP.[](table=[VNEVENT_HIVE_DWH_400MILLION_ROWS,
>>>>>> FACTUSEREVENT],ctx=0@null,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
>>>>>> 12, 13, 14, 15, 16, 17, 18, 19, 20])
>>>>>>
>>>>>> On Wed, Nov 1, 2023 at 10:40 AM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Yesterday, I tried to see if the query pushdown functions work
>>>>>>> well in the Kylin 5 docker, and all of my queries returned proper
>>>>>>> responses.
>>>>>>> After checking your logs from Shaofeng, I found these error
>>>>>>> messages repeated many times:
>>>>>>> 1. 'java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>>>>> 127.0.0.1:9866,DS-5093899b-06c7-4386-95d5-6fc271d92b52,DISK] are
>>>>>>> bad. Aborting...'
>>>>>>> 2. 'curator.ConnectionState : Connection timed out for
>>>>>>> connection string (localhost:2181) and timeout (15000) / elapsed (41794)
>>>>>>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
>>>>>>> ConnectionLoss'
>>>>>>>
>>>>>>> I guess the root cause is that the container did not have enough
>>>>>>> resources. I found your query on a table called
>>>>>>> 'XXX_hive_dwh_400million_rows'; it looks like you ran a complex
>>>>>>> query on a table which contains 400 million rows?
>>>>>>>
>>>>>>> Since I am the uploader of the Kylin 5 docker image, I want to
>>>>>>> give some explanation. The Kylin 5 docker is not a place for
>>>>>>> performance benchmarks; it is only for demonstration. It is
>>>>>>> allocated very few resources (8G memory) if you are using the
>>>>>>> default command from the Docker Hub page. Before I uploaded my
>>>>>>> image, I only tested it using the ssb dataset, whose biggest table
>>>>>>> contains only about 60k rows. If you are using a larger dataset
>>>>>>> and more complex queries, you have to scale the resources
>>>>>>> properly. Try querying tables which contain no more than 100k rows
>>>>>>> by default.
>>>>>>>
>>>>>>> Here are some tips which may help you check whether the daemon
>>>>>>> services are healthy and resources (particularly disk space) are
>>>>>>> configured properly.
>>>>>>>
>>>>>>> 1. Check the HDFS web UI
>>>>>>>    ( http://localhost:9870/dfshealth.html#tab-datanode ) to confirm
>>>>>>>    whether the HDFS service is in 'In service' status.
>>>>>>> 2. Check the Datanode log in
>>>>>>>    `/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log`
>>>>>>>    for any error message. Like:
>>>>>>>    cat /opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log | grep ERROR | wc -l
>>>>>>> 3. Check that your docker engine is configured with enough disk
>>>>>>>    space. If you are using Docker Desktop like me, please go to
>>>>>>>    "Settings" - "Resources" - "Advanced" and make sure you have
>>>>>>>    allocated 40GB+ disk space to the docker container.
>>>>>>> 4. Check the available disk space of your container with `df -h`,
>>>>>>>    and make sure the 'Use%' of 'overlay' is less than 60%.
>>>>>>> 5. Check the load average / CPU usage / JVM GC. Make sure these
>>>>>>>    metrics are not really high when you send a query.
>>>>>>> ------------------------
>>>>>>> With warm regard
>>>>>>> Xiaoxiang Yu
>>>>>>>
>>>>>>> On Tue, Oct 31, 2023 at 5:13 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi ShaoFeng
>>>>>>>>
>>>>>>>> Thank you very much for your valuable feedback
>>>>>>>>
>>>>>>>> I saw the application there (if I see it right), as in the
>>>>>>>> attached photo. Kindly advise so that I can run this query on OLAP.
>>>>>>>>
>>>>>>>> PS. I sent you the log file in private.
>>>>>>>>
>>>>>>>> [image: image.png]
>>>>>>>>
>>>>>>>> On Tue, Oct 31, 2023 at 3:11 PM ShaoFeng Shi <
>>>>>>>> shaofeng...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Can you provide the messages in logs/kylin.log when executing
>>>>>>>>> the SQL? You can also check the Spark UI from the YARN resource
>>>>>>>>> manager (there should be one running application called Sparder,
>>>>>>>>> which is Kylin's backend Spark application). If the application
>>>>>>>>> is not there, it may indicate that YARN doesn't have the
>>>>>>>>> resources to start it.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> Shaofeng Shi 史少锋
>>>>>>>>> Apache Kylin PMC,
>>>>>>>>> Apache Incubator PMC,
>>>>>>>>> Email: shaofeng...@apache.org
>>>>>>>>>
>>>>>>>>> Apache Kylin FAQ:
>>>>>>>>> https://kylin.apache.org/docs/gettingstarted/faq.html
>>>>>>>>> Join Kylin user mail group: user-subscr...@kylin.apache.org
>>>>>>>>> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>>>>>>>>>
>>>>>>>>> On Tue, Oct 31, 2023 at 10:35, Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>>>>>>>>
>>>>>>>>>> Dear Sir/Madam,
>>>>>>>>>>
>>>>>>>>>> I have a fact table with 500 million rows, then I built the
>>>>>>>>>> model and index according to the website help.
>>>>>>>>>>
>>>>>>>>>> I chose full incremental because this is the first time I load
>>>>>>>>>> data
>>>>>>>>>>
>>>>>>>>>> I created both index types (aggregate group index and table
>>>>>>>>>> index) as in the photo attached.
>>>>>>>>>>
>>>>>>>>>> But the query always fails after the timeout of 300 seconds (I
>>>>>>>>>> run in docker). I don't want to increase the value of 300
>>>>>>>>>> seconds because I wish the OLAP could run within 1 minute (is
>>>>>>>>>> that possible?)
>>>>>>>>>>
>>>>>>>>>> It seems that the OLAP function in indexing is not working to
>>>>>>>>>> speed up the query with the precomputed cube.
>>>>>>>>>>
>>>>>>>>>> Can you advise how to check whether the index really worked?
>>>>>>>>>>
>>>>>>>>>> It is quite an urgent task for me, so a prompt response is
>>>>>>>>>> highly appreciated.
>>>>>>>>>>
>>>>>>>>>> Thank you very much
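
Two of the health checks suggested earlier in this thread (counting ERROR lines in the Datanode log, and confirming that the 'Use%' of 'overlay' reported by `df -h` is below 60%) could be scripted roughly as follows. This is only a minimal sketch: the function names are made up for illustration, the log path and the 60% threshold are the ones quoted in the thread, and it assumes `df -h` prints its usual column layout inside the container.

```python
import re
import subprocess

# Path quoted in the thread; adjust it if your container differs.
DATANODE_LOG = "/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log"


def count_error_lines(log_text):
    """Count lines containing 'ERROR', like `grep ERROR | wc -l`."""
    return sum(1 for line in log_text.splitlines() if "ERROR" in line)


def overlay_use_percent(df_output):
    """Return the Use% of the 'overlay' filesystem from `df -h` output.

    Returns None when no 'overlay' row is present.
    """
    for line in df_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "overlay":
            for field in fields:
                match = re.fullmatch(r"(\d+)%", field)
                if match:
                    return int(match.group(1))
    return None


def disk_ok(df_output, threshold=60):
    """True when the overlay filesystem is found and used below threshold."""
    used = overlay_use_percent(df_output)
    return used is not None and used < threshold


def run_checks(log_path=DATANODE_LOG):
    """Hypothetical helper: run both checks inside the container."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        errors = count_error_lines(handle.read())
    df_output = subprocess.run(
        ["df", "-h"], capture_output=True, text=True, check=True
    ).stdout
    print("ERROR lines in Datanode log:", errors)
    print("overlay Use%:", overlay_use_percent(df_output))
    print("disk below 60%:", disk_ok(df_output))
```

Calling `run_checks()` from a Python shell inside the container prints the three values; the parsing functions take plain text, so they can also be fed output copied from a terminal.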