Colin created YUNIKORN-2785:
-------------------------------

             Summary: App summary resource usage is inaccurate if Yunikorn 
restarts
                 Key: YUNIKORN-2785
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2785
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: core - scheduler
            Reporter: Colin


My team needs to accurately track the resources used by Spark jobs. We 
currently rely on the app summary log that the YuniKorn scheduler emits when a 
job completes. However, this log is inaccurate if YuniKorn restarts while the 
job is running, since YuniKorn tracks app resources in memory only. To work 
around this, we created a sidecar pod that connects to YuniKorn's streaming 
event endpoint and persists the events to a Kafka topic, so that we can 
calculate our own app summaries even if YuniKorn crashes.
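
A minimal sketch of the sidecar's core loop, assuming the streaming endpoint 
returns one JSON event per line and that a {{send}} callable publishes to 
Kafka (the endpoint path, event fields, and names here are illustrative, not 
YuniKorn's exact API):
{code:python}
import json

def forward_events(lines, send):
    """Parse newline-delimited JSON events from a streaming HTTP response
    and forward each one to a sink, e.g. a Kafka producer's send()."""
    count = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip keep-alive blank lines
        send(json.loads(raw))
        count += 1
    return count

# In the real sidecar, `lines` would come from a streaming HTTP GET against
# YuniKorn's event stream endpoint, e.g.
#   requests.get(base_url + "/ws/v1/events/stream", stream=True).iter_lines()
# and `send` would wrap something like producer.send("yunikorn-events", event).
{code}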

However, we have noticed that if executor pods complete while YuniKorn is 
down, YuniKorn never emits an allocation cancellation event for them, so we 
cannot determine when those executor pods stopped using resources. Using the 
job's last event timestamp as the end time gives an upper bound on an 
executor's resource usage, while ignoring the executor entirely, as YuniKorn 
appears to do, gives a lower bound.
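
The two bounds can be sketched as follows (the event shapes are assumptions on 
my part: each allocation carries a start timestamp and per-resource amounts, 
cancellations map allocation IDs to end timestamps, and usage is amount 
multiplied by duration):
{code:python}
def aggregate_usage(allocations, cancels, last_event_ts):
    """Bound total resource usage (amount x seconds) per resource type.

    allocations:   {alloc_id: (start_ts, {resource: amount})}
    cancels:       {alloc_id: end_ts} -- entries may be missing if
                   YuniKorn was down when the pod completed
    last_event_ts: timestamp of the job's last observed event

    The lower bound ignores allocations with no cancellation event (as
    YuniKorn appears to); the upper bound charges them up to last_event_ts.
    """
    lower, upper = {}, {}
    for alloc_id, (start, resources) in allocations.items():
        end = cancels.get(alloc_id)
        for res, amount in resources.items():
            if end is not None:
                lower[res] = lower.get(res, 0) + amount * (end - start)
            stop = end if end is not None else last_event_ts
            upper[res] = upper.get(res, 0) + amount * (stop - start)
    return lower, upper
{code}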

Below are the results from my testing:
h3. Test without YuniKorn Restart

I ran a job for about 5 minutes with a driver pod creating roughly 100 executor 
pods. The first execution was without restarting YuniKorn.


*My results calculated using the events in the Kafka topic:*
{code:java}
Total aggregated resources usage:
memory: 126643967751900.53
pods: 9102.207251182002
vcore: 35909809.17691601 {code}
*YuniKorn's App Summary Log:*
{code:java}
2024-07-30T23:04:58.526Z INFO core.scheduler.application.usage 
objects/application_summary.go:60 YK_APP_SUMMARY: {ResourceUsage: 
TrackedResource{UNKNOWN:pods=9048,UNKNOWN:vcore=35694000,UNKNOWN:memory=125880530632704},
 PreemptedResource: TrackedResource{}, PlaceholderResource: TrackedResource{}} 
{code}
*The difference (my value - YuniKorn app summary value):*
{code:java}
memory: 126643967751900.53 - 125880530632704 = 763437119196.53 (my value is 
0.60647% greater)

pods: 9102.207251182002 - 9048 = 54.207251182   (my value is 0.599% greater)

vcore: 35909809.17691601 - 35694000 = 215809.176916  (my value is 0.6046% 
greater) {code}
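As a sanity check, the relative differences can be recomputed directly from 
the raw totals; for example, for memory in this run:
{code:python}
def pct_greater(mine, yunikorn):
    """How much greater my value is than YuniKorn's, in percent."""
    return (mine - yunikorn) / yunikorn * 100

diff = 126643967751900.53 - 125880530632704  # absolute memory gap
pct = pct_greater(126643967751900.53, 125880530632704)  # ~0.6065
{code}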
My values are slightly different, likely because I'm using the event 
timestamps rather than the resource timestamps (if you think the cause is 
something else, please share).

 
h3. Test with YuniKorn Restart

I then ran the same job, but shut YuniKorn down for about 30 seconds after 
resources had been allocated to the driver and all executors, while the 
executors were nearing completion. Then, I restarted YuniKorn.

 

+_Ignoring pods without cancellation events_+

*My results calculated using the events in the Kafka topic:*
{code:java}
Total aggregated resources usage:
memory: 13299125469337.467
pods: 945.3453441859999
vcore: 3760461.7715400006{code}
*YuniKorn's App Summary Log:*
{code:java}
2024-07-30T23:48:41.044Z INFO core.scheduler.application.usage 
objects/application_summary.go:60 YK_APP_SUMMARY: {ResourceUsage: 
TrackedResource{UNKNOWN:memory=12561602838528,UNKNOWN:vcore=3552000,UNKNOWN:pods=893},
 PreemptedResource: TrackedResource{}, PlaceholderResource: 
TrackedResource{}}{code}
*The difference (my value - YuniKorn app summary value):*
{code:java}
memory: 13299125469337.467 - 12561602838528 = 737522630809 (my value is 
5.87124% greater)

pods: 945.3453441859999 - 893 = 52.345344186   (my value is 5.8617% greater)

vcore: 3760461.7715400006 - 3552000 = 208461.77154  (my value is 5.8688% 
greater){code}
There's a larger discrepancy this time. Notably, the number of pods shows a 
significant drop. In typical runs without restarting YuniKorn, the job's pod 
resource usage hovers around 9k.

 
h3. Using Last Event Timestamp as a Replacement

When using the last event timestamp in place of the missing allocation 
cancellation events to calculate resource usage, the results align more 
closely with expectations but remain significantly higher than YuniKorn's 
summary log, and likely represent an overestimate.

 

*My results calculated using the events in the Kafka topic:*
{code:java}
Number of allocations without matching cancels: 101
Total aggregated resources usage:
memory: 159366239582373.12
pods: 11375.528615858
vcore: 45109670.74353799{code}
*The difference (my value - YuniKorn app summary value):*
{code:java}
memory: 159366239582373.12 - 12561602838528 = 1.4680464e+14 (my value is 
1168.6776% greater)

pods: 11375.528615858 - 893 = 10482.5286159   (my value is 1173.85538% greater)

vcore: 45109670.74353799 - 3552000 = 41557670.7435  (my value is 1169.9794% 
greater){code}
 
h3. Conclusion and Inquiry

Is this a bug in YuniKorn? Besides logging events to a Kafka topic, are there 
other strategies my team can employ to improve resource usage tracking?

Any insights or recommendations would be greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
