[ 
https://issues.apache.org/jira/browse/HIVE-29081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004152#comment-18004152
 ] 

Stamatis Zampetakis commented on HIVE-29081:
--------------------------------------------

I gathered some disk usage statistics from 
[PR-5946|https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5946/2/pipeline], 
which is currently running and where split-12 failed due to the storage limit, 
using the following script:

{code:bash}
node=hive-precommit-pr-5946-2-nlm1g-nckm6-tp8xd
# Every 30s, log per-directory disk usage of the workspace and of / in the hdb container
while true; do
  date
  kubectl exec "$node" -c hdb -- du /home/jenkins/agent/workspace/hive-precommit_PR-5946 -h --max-depth=1
  kubectl exec "$node" -c hdb -- du / -h --max-depth=1
  sleep 30s
done > pr5946-du.log
{code}

As can be seen from [^pr5946-du.log], just before the failure the hdb 
container was consuming 24G of space in total:

{noformat}
1.5G    /home/jenkins/agent/workspace/hive-precommit_PR-5946/packaging
3.8G    /home/jenkins/agent/workspace/hive-precommit_PR-5946/.m2
7.3G    /home/jenkins/agent/workspace/hive-precommit_PR-5946/itests
15G     /home/jenkins/agent/workspace/hive-precommit_PR-5946
15G     /home
4.4G    /usr
4.1G    /work
24G     /
{noformat}
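
To find the peak usage of / without reading the whole log by eye, something like the following sketch could be used. It assumes the log layout produced by the loop above (date lines interleaved with two-column du lines) and a sort that supports -h (GNU coreutils); the function name peak_root_usage is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the largest size recorded for "/" in a du log.
# Assumes du lines have exactly two fields, "<size> <path>", while date
# lines have more fields and are therefore skipped.
peak_root_usage() {
  local log=$1
  awk 'NF == 2 && $2 == "/" { print $1 }' "$log" | sort -h | tail -n 1
}
```

For example, peak_root_usage pr5946-du.log prints the highest human-readable total seen for / across all samples.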

I am fairly confident that everything under the hive-precommit_PR-5946 directory 
(i.e., 15G) counts toward ephemeral storage. I am less sure whether /work 
and /usr count toward it as well, but based on the current observations we 
definitely have to bump up the request/limit fields.

For the rest, it would be useful to understand what in the itests folder amounts 
to 7.3G (whether it is normal or a regression), but we can do this after we stabilize CI.
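
As a possible starting point for that investigation, the sketch below lists the largest entries up to two levels below a directory. It assumes GNU du and sort (for --max-depth and -h); the function name largest_dirs and the default of 10 entries are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the n largest entries (up to two levels deep)
# under a directory, smallest first, with the directory total on the last line.
largest_dirs() {
  local dir=$1 n=${2:-10}
  du -h --max-depth=2 "$dir" | sort -h | tail -n "$n"
}
```

Inside the pod, the same pipeline could be fed from the container, for example: kubectl exec "$node" -c hdb -- du -h --max-depth=2 /home/jenkins/agent/workspace/hive-precommit_PR-5946/itests | sort -h | tail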

> CI fails intermittently cause some runs exceed ephemeral storage limits
> -----------------------------------------------------------------------
>
>                 Key: HIVE-29081
>                 URL: https://issues.apache.org/jira/browse/HIVE-29081
>             Project: Hive
>          Issue Type: Bug
>          Components: Testing Infrastructure
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>         Attachments: pr5946-du.log
>
>
> The CI fails intermittently because some runs/splits exceed the 20Gi limit on 
> ephemeral local storage that was set recently (HIVE-28954), and the respective 
> pods are being evicted.
> * https://ci.hive.apache.org/job/hive-precommit/job/master/2603/
> * https://ci.hive.apache.org/job/hive-precommit/job/master/2602/
> * https://ci.hive.apache.org/job/hive-precommit/job/master/2600/
> {noformat}
> Unable to create live FilePath for 
> hive-precommit-master-2603-3bz8w-bkgst-7wgqt; 
> hive-precommit-master-2603-3bz8w-bkgst-7wgqt was marked offline: Pod failed 
> (Reason: Evicted, Message: Pod ephemeral local storage usage exceeds the 
> total limit of containers 20Gi. )
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
