[ 
https://issues.apache.org/jira/browse/YUNIKORN-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit resolved YUNIKORN-2785.
------------------------------------
    Resolution: Not A Bug

> App summary resource usage is inaccurate if Yunikorn restarts
> -------------------------------------------------------------
>
>                 Key: YUNIKORN-2785
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2785
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Colin
>            Priority: Minor
>              Labels: events, resource
>
> My team needs to accurately track the resources used by Spark jobs. We 
> currently rely on YuniKorn's app summary log, emitted by the scheduler when a 
> job completes. However, this log is inaccurate if YuniKorn restarts while the 
> job is running, since YuniKorn tracks app resources in memory only. To address 
> this, we created a sidecar pod that connects to YuniKorn's streaming event 
> endpoint and persists the events to a Kafka topic, so that we can calculate 
> our own app summaries even if YuniKorn crashes.
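> For context, the sidecar is essentially a small consumer loop. Here is a 
> minimal sketch of the idea (not our production code): it tails the 
> scheduler's streaming event endpoint and hands each JSON event to a 
> persistence callback. The endpoint URL is an assumption for illustration, and 
> the Kafka producer is left as a placeholder.
> {code:python}
> import json
> import urllib.request
> 
> # Assumed endpoint for illustration; adjust host/port/path per deployment.
> EVENT_STREAM_URL = "http://yunikorn-service:9080/ws/v1/events/stream"
> 
> def parse_event(line: bytes):
>     """Decode one newline-delimited JSON event; return None for blank lines."""
>     line = line.strip()
>     if not line:
>         return None
>     return json.loads(line)
> 
> def tail_events(url: str, persist):
>     """Stream events from the scheduler and pass each one to `persist`
>     (in our case, a Kafka producer's send(); omitted here)."""
>     with urllib.request.urlopen(url) as resp:
>         for raw in resp:
>             event = parse_event(raw)
>             if event is not None:
>                 persist(event)
> 
> if __name__ == "__main__":
>     tail_events(EVENT_STREAM_URL, lambda e: print(json.dumps(e)))
> {code}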
> However, we have noticed that if executor pods complete while YuniKorn is 
> down, YuniKorn never emits an allocation cancellation event. Thus, we cannot 
> determine when the executor pod stopped using resources. Using the last event 
> timestamp from the job provides an upper bound on the executor's resource 
> usage, and ignoring the executor entirely, as YuniKorn seems to do, provides 
> a lower bound.
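> To make the two bounds concrete, here is a simplified sketch of how we 
> derive them from the event log (field names and the single scalar `amount` 
> are hypothetical; real events carry a per-resource map):
> {code:python}
> def usage_bounds(allocations, last_event_ts):
>     """Aggregate resource-seconds from (alloc_ts, cancel_ts, amount) tuples.
>     cancel_ts is None when no cancellation event was ever emitted
>     (e.g. the pod finished while the scheduler was down).
> 
>     Returns (lower, upper):
>       lower -- ignore allocations with no cancellation (what YuniKorn
>                appears to do),
>       upper -- charge them until the job's last event timestamp.
>     """
>     lower = upper = 0.0
>     for alloc_ts, cancel_ts, amount in allocations:
>         if cancel_ts is not None:
>             used = (cancel_ts - alloc_ts) * amount
>             lower += used
>             upper += used
>         else:
>             # No cancel event: the lower bound drops the allocation; the
>             # upper bound clamps its end time to the job's last event.
>             upper += (last_event_ts - alloc_ts) * amount
>     return lower, upper
> {code}
> For example, two 1-pod allocations at t=0, one cancelled at t=10 and one 
> never cancelled, with a last event at t=30, give lower=10 and upper=40 
> pod-seconds.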
> Below are the results from my testing:
> h3. Test without YuniKorn Restart
> I ran a job for about 5 minutes with a driver pod creating roughly 100 
> executor pods. The first execution was without restarting YuniKorn.
> *My results calculated using the events in the Kafka topic:*
> {code:java}
> Total aggregated resources usage:
> memory: 126643967751900.53
> pods: 9102.207251182002
> vcore: 35909809.17691601 {code}
> *YuniKorn's App Summary Log:*
> {code:java}
> 2024-07-30T23:04:58.526Z INFO core.scheduler.application.usage 
> objects/application_summary.go:60 YK_APP_SUMMARY: {ResourceUsage: 
> TrackedResource{UNKNOWN:pods=9048,UNKNOWN:vcore=35694000,UNKNOWN:memory=125880530632704},
>  PreemptedResource: TrackedResource{}, PlaceholderResource: 
> TrackedResource{}} {code}
> *The difference (my value - YuniKorn app summary value):*
> {code:java}
> memory: 126643967751900.53 - 125880530632704 = 763437119196.53 (my value is 
> 0.60647% greater)
> pods: 9102.207251182002 - 9048 = 54.207251182   (my value is 0.599% greater)
> vcore: 35909809.17691601 - 35694000 = 215809.176916  (my value is 0.6046% 
> greater) {code}
> My values differ slightly, most likely because I'm using the event timestamps 
> rather than the resource timestamps (if you think it's something else, please 
> share).
>  
> h3. Test with YuniKorn Restart
> I then ran the same job, but shut YuniKorn down for about 30 seconds after 
> resources had been allocated to the driver and all executors, while the 
> executors were nearing completion. Then I restarted YuniKorn.
>  
> +_Ignoring pods without cancellation events_+
> *My results calculated using the events in the Kafka topic:*
> {code:java}
> Total aggregated resources usage:
> memory: 13299125469337.467
> pods: 945.3453441859999
> vcore: 3760461.7715400006{code}
> *YuniKorn's App Summary Log:*
> {code:java}
> 2024-07-30T23:48:41.044Z INFO core.scheduler.application.usage 
> objects/application_summary.go:60 YK_APP_SUMMARY: {ResourceUsage: 
> TrackedResource{UNKNOWN:memory=12561602838528,UNKNOWN:vcore=3552000,UNKNOWN:pods=893},
>  PreemptedResource: TrackedResource{}, PlaceholderResource: 
> TrackedResource{}}{code}
> *The difference (my value - YuniKorn app summary value):*
> {code:java}
> memory: 13299125469337.467 - 12561602838528 = 737522630809 (my value is 
> 5.87124% greater)
> pods: 945.3453441859999 - 893 = 52.345344186   (my value is 5.8617% greater)
> vcore: 3760461.7715400006 - 3552000 = 208461.77154  (my value is 5.8688% 
> greater){code}
> There's a larger discrepancy this time. Notably, the number of pods shows a 
> significant drop. In typical runs without restarting YuniKorn, the job's pod 
> resource usage hovers around 9k.
>  
> h3. Using Last Event Timestamp as a Replacement
> When using the last event timestamp instead of the allocation cancellation 
> event to calculate resource usage, the results align closer to expectations 
> but remain significantly higher than YuniKorn's summary log, likely 
> representing an overestimate.
>  
> *My results calculated using the events in the Kafka topic:*
> {code:java}
> Number of allocations without matching cancels: 101
> Total aggregated resources usage:
> memory: 159366239582373.12
> pods: 11375.528615858
> vcore: 45109670.74353799{code}
> *The difference (my value - YuniKorn app summary value):*
> {code:java}
> memory: 159366239582373.12 - 12561602838528 = 1.4680464e+14 (my value is 
> 1168.6776% greater)
> pods: 11375.528615858 - 893 = 10482.5286159   (my value is 1173.85538% 
> greater)
> vcore: 45109670.74353799 - 3552000 = 41557670.7435  (my value is 1169.9794% 
> greater){code}
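> A toy calculation (made-up numbers, not from the test run) shows why this 
> clamping overestimates so badly: an executor that actually stopped shortly 
> after allocation gets charged until a last event much later, multiplying its 
> true contribution.
> {code:python}
> def clamped_usage(alloc_ts, last_event_ts, amount):
>     # Charge an allocation with no cancellation event until the job's
>     # last event timestamp (the upper-bound strategy).
>     return (last_event_ts - alloc_ts) * amount
> 
> # Hypothetical pod: truly ran 20 s, but the job's last event is 300 s later.
> true_usage = 20 * 1.0                    # 20 pod-seconds actually consumed
> charged = clamped_usage(0, 300, 1.0)     # 300 pod-seconds attributed
> print(charged / true_usage)              # 15x overcount for this one pod
> {code}
> With on the order of 100 unmatched allocations, the aggregate can easily be 
> inflated by an order of magnitude.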
>  
> h3. Conclusion and Inquiry
> Is this a bug in YuniKorn? Besides logging events to a Kafka topic, are there 
> other strategies my team can employ to improve resource usage tracking?
> Any insights or recommendations would be greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
