Hi,

    Yesterday, I tried to verify that query pushdown works in the Kylin 5
docker image, and all of my queries returned proper responses.
    After checking the logs you shared with Shaofeng, I found these error
messages repeated many times:
    1. 'java.io.IOException: All datanodes DatanodeInfoWithStorage[
127.0.0.1:9866,DS-5093899b-06c7-4386-95d5-6fc271d92b52,DISK] are bad.
Aborting...'
    2. 'curator.ConnectionState : Connection timed out for connection
string (localhost:2181) and timeout (15000) / elapsed (41794)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
ConnectionLoss'

    I guess the root cause is that the container did not have enough
resources. I noticed you queried a table called
'XXX_hive_dwh_400million_rows'; it looks like you ran a complex query
against a table containing 400 million rows.

    Since I am the uploader of the Kylin 5 Docker image, I want to give
some explanation. The Kylin 5 docker image is not meant for performance
benchmarks; it is only for demonstration. It is allocated very few
resources (8 GB of memory) if you use the default command from the Docker
Hub page. Before uploading the image, I only tested it with the SSB
dataset, whose biggest table contains only about 60k rows. If you use a
larger dataset and more complex queries, you have to scale the resources
accordingly. By default, try querying tables that contain no more than
100k rows.
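    If you do need more headroom, you can raise the container's memory
limit when starting it. This is only a sketch: the image name, tag, and
port below are illustrative, so please check the actual run command on the
Docker Hub page before using it.

```shell
# Start the Kylin 5 standalone container with a 16 GB memory limit instead
# of the default. Image name/tag and port mapping are illustrative only;
# use the exact command from the Docker Hub page.
docker run -d \
  --name kylin5 \
  -m 16g \
  -p 7070:7070 \
  apachekylin/apache-kylin-standalone:5.0-beta
```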

    Here are some tips which may help you check whether the daemon
services are healthy and the resources (particularly disk space) are
configured properly.

    1. Check HDFS's web UI (
http://localhost:9870/dfshealth.html#tab-datanode ) to confirm that the
HDFS service is in the 'In service' state.
    2. Check the DataNode's log in
`/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log` for error
messages, e.g.:
grep -c ERROR /opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log
    3. Check that your Docker engine is configured with enough disk space.
If you are using Docker Desktop like me, please go to "Settings" -
"Resources" - "Advanced" and make sure you have allocated 40 GB+ of disk
space to Docker.
    4. Check the available disk space inside your container with `df -h`;
make sure the 'Use%' of 'overlay' is less than 60%.
    5. Check the load average, CPU usage, and JVM GC. Make sure these
metrics are not very high when you send a query.
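    Check 2 above can be sketched as a one-liner. The log file below is
synthetic sample data created just for illustration; inside the container
the real log is the DataNode log path mentioned in tip 2.

```shell
#!/bin/sh
# Create a synthetic datanode log to illustrate the error-count check (the
# real log lives under /opt/hadoop-3.2.1/logs/ inside the container).
cat > /tmp/sample-datanode.log <<'EOF'
2023-10-31 10:00:01,000 INFO  DataNode: heartbeat sent
2023-10-31 10:00:05,000 ERROR DataXceiver: connection reset by peer
2023-10-31 10:00:09,000 ERROR DataXceiver: no space left on device
EOF

# grep -c prints the number of matching lines (equivalent to grep | wc -l);
# a non-zero count means the datanode has logged errors worth reading.
grep -c ERROR /tmp/sample-datanode.log
```

Here the count printed is 2, so you would then open the log and read the
ERROR lines themselves.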
------------------------
With warm regards,
Xiaoxiang Yu



On Tue, Oct 31, 2023 at 5:13 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi ShaoFeng
>
> Thank you very much for your valuable feedback
>
> I saw the application to be there (if I see it right) as in the attachment
> photo. Kindly advise so that I can run this query on OLAP.
>
> PS. I sent you the log file in private.
>
> [image: image.png]
>
> On Tue, Oct 31, 2023 at 3:11 PM ShaoFeng Shi <shaofeng...@apache.org>
> wrote:
>
>> Can you provide the messages in logs/kylin.log when executing the SQL?
>> and you can also check the Spark UI from yarn resource manager (there
>> should be one running application called Spardar, which is Kylin's backend
>> spark application). If the application is not there, it may indicate that
>> YARN doesn't have enough resources to start it up.
>>
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>> Apache Kylin PMC,
>> Apache Incubator PMC,
>> Email: shaofeng...@apache.org
>>
>> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>> Join Kylin user mail group: user-subscr...@kylin.apache.org
>> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>>
>>
>>
>>
>> Nam Đỗ Duy <na...@vnpay.vn> 于2023年10月31日周二 10:35写道:
>>
>>> Dear Sir/Madam,
>>>
>>> I have a fact with 500million rows then I build model, index according
>>> to the website help.
>>>
>>> I chose full incremental because this is the first time I load data.
>>>
>>> I create both index types Aggregate group index, table index as photo
>>> attached.
>>>
>>> But the query always failed after a timeout of 300 seconds (I run in
>>> docker). I don't want to increase the 300-second value because I wish
>>> the OLAP can run within 1 minute (is that possible?)
>>>
>>> It seems that the OLAP indexing is not working to speed up the
>>> query via the precomputed cube.
>>>
>>> Can you advise to check whether the index did really work?
>>>
>>> It is quite urgent task for me so prompt response is highly appreciated.
>>>
>>> Thank you very much
>>>
>>
