Colin created YUNIKORN-2785:
-------------------------------
Summary: App summary resource usage is inaccurate if Yunikorn
restarts
Key: YUNIKORN-2785
URL: https://issues.apache.org/jira/browse/YUNIKORN-2785
Project: Apache YuniKorn
Issue Type: Bug
Components: core - scheduler
Reporter: Colin
My team needs to accurately track the resources used by Spark jobs. We
currently rely on the app summary log that the YuniKorn scheduler emits when a
job completes. However, we know that this log is inaccurate if YuniKorn
restarts while the job is running, since YuniKorn keeps track of app resources
in memory only. To address this, we created a sidecar pod that connects to
YuniKorn's streaming event endpoint and saves the events to a Kafka topic, so
that they survive a YuniKorn crash and we can calculate our own app
summaries.
However, we have noticed that if executor pods complete while YuniKorn is down,
YuniKorn never emits an allocation cancellation event. Thus, we cannot
determine when the executor pod stopped using resources. Using the last event
timestamp from the job provides an upper bound on the executor's resource
usage, and ignoring the executor entirely, as YuniKorn seems to do, provides a
lower bound.
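A minimal sketch of how these two bounds can be computed. It assumes usage is accumulated as quantity multiplied by seconds held (which is consistent with the magnitude of the numbers reported here) and that each allocation event has already been paired with its cancellation event, if one exists. The `Allocation` record and its field names are hypothetical and are not YuniKorn's event schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Allocation:
    """Hypothetical record: one allocation event plus its cancellation, if seen."""
    resources: Dict[str, float]  # e.g. {"pods": 1, "vcore": 1000}
    start: float                 # allocation event timestamp, in seconds
    end: Optional[float]         # cancellation event timestamp, None if missing


def usage_bounds(
    allocations: Iterable[Allocation], last_event_ts: float
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (lower, upper) resource-second totals per resource name."""
    lower: Dict[str, float] = {}
    upper: Dict[str, float] = {}
    for alloc in allocations:
        if alloc.end is not None:
            # Cancellation seen: the duration is known exactly.
            low = high = alloc.end - alloc.start
        else:
            # No cancellation: ignore the allocation for the lower bound,
            # and assume it ran until the job's last event for the upper bound.
            low = 0.0
            high = max(0.0, last_event_ts - alloc.start)
        for name, quantity in alloc.resources.items():
            lower[name] = lower.get(name, 0.0) + quantity * low
            upper[name] = upper.get(name, 0.0) + quantity * high
    return lower, upper
```

The true usage lies somewhere between the two totals; the gap between them grows with the number of allocations that have no matching cancellation event.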
Below are the results from my testing:
h3. Test without YuniKorn Restart
I ran a job for about 5 minutes with a driver pod creating roughly 100 executor
pods. The first execution was without restarting YuniKorn.
*My results calculated using the events in the Kafka topic:*
{code:java}
Total aggregated resources usage:
memory: 126643967751900.53
pods: 9102.207251182002
vcore: 35909809.17691601 {code}
*YuniKorn's App Summary Log:*
{code:java}
2024-07-30T23:04:58.526Z INFO core.scheduler.application.usage
objects/application_summary.go:60 YK_APP_SUMMARY: {ResourceUsage:
TrackedResource{UNKNOWN:pods=9048,UNKNOWN:vcore=35694000,UNKNOWN:memory=125880530632704},
PreemptedResource: TrackedResource{}, PlaceholderResource: TrackedResource{}}
{code}
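For comparing against our own totals, the `ResourceUsage` values can be pulled out of a summary line such as the one above. This is a rough sketch that assumes the line keeps the `TrackedResource{TYPE:name=value,...}` layout shown in this log; `parse_tracked_resource` is a hypothetical helper, not part of YuniKorn.

```python
import re
from typing import Dict


def parse_tracked_resource(line: str, field: str = "ResourceUsage") -> Dict[str, int]:
    """Sum the values of one TrackedResource{...} field by resource name."""
    match = re.search(re.escape(field) + r":\s*TrackedResource\{([^}]*)\}", line)
    if match is None:
        return {}
    usage: Dict[str, int] = {}
    for item in match.group(1).split(","):
        item = item.strip()
        if not item:
            continue  # empty TrackedResource{}
        key, value = item.split("=")
        # The key looks like "UNKNOWN:pods"; the prefix is dropped and
        # values with the same resource name are added together.
        _, resource = key.rsplit(":", 1)
        usage[resource] = usage.get(resource, 0) + int(value)
    return usage
```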
*The difference (my value - YuniKorn app summary value):*
{code:java}
memory: 126643967751900.53 - 125880530632704 = 763437119196.53 (my value is
0.60647% greater)
pods: 9102.207251182002 - 9048 = 54.207251182 (my value is 0.599% greater)
vcore: 35909809.17691601 - 35694000 = 215809.176916 (my value is 0.6046%
greater) {code}
My value is slightly different because I'm using the event timestamps and not
the resource timestamps (if you think it's something else then please share).
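The differences in this report are the absolute gap between my value and YuniKorn's value, and that gap as a percentage of YuniKorn's value. A small sketch of the calculation, using the figures from the test without a restart; the `compare` helper is only illustrative.

```python
from typing import Tuple


def compare(mine: float, summary: float) -> Tuple[float, float]:
    """Return (absolute difference, percent difference relative to summary)."""
    diff = mine - summary
    return diff, 100.0 * diff / summary


# (my value from the Kafka events, value from YuniKorn's app summary log)
no_restart = {
    "memory": (126643967751900.53, 125880530632704),
    "pods": (9102.207251182002, 9048),
    "vcore": (35909809.17691601, 35694000),
}

for name, (mine, summary) in no_restart.items():
    diff, pct = compare(mine, summary)
    print(f"{name}: difference={diff:.3f} ({pct:.4f}% greater)")
```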
h3. Test with YuniKorn Restart
I then ran the same job but shut YuniKorn down for about 30 seconds once
resources had been allocated to the driver and all executors and the executors
were nearing completion. Then, I restarted YuniKorn.
+_Ignoring pods without cancellation events_+
*My results calculated using the events in the Kafka topic:*
{code:java}
Total aggregated resources usage:
memory: 13299125469337.467
pods: 945.3453441859999
vcore: 3760461.7715400006{code}
*YuniKorn's App Summary Log:*
{code:java}
2024-07-30T23:48:41.044Z INFO core.scheduler.application.usage
objects/application_summary.go:60 YK_APP_SUMMARY: {ResourceUsage:
TrackedResource{UNKNOWN:memory=12561602838528,UNKNOWN:vcore=3552000,UNKNOWN:pods=893},
PreemptedResource: TrackedResource{}, PlaceholderResource:
TrackedResource{}}{code}
*The difference (my value - YuniKorn app summary value):*
{code:java}
memory: 13299125469337.467 - 12561602838528 = 737522630809 (my value is
5.87124% greater)
pods: 945.3453441859999 - 893 = 52.345344186 (my value is 5.8617% greater)
vcore: 3760461.7715400006 - 3552000 = 208461.77154 (my value is 5.8688%
greater){code}
There's a larger discrepancy this time. More notably, the pod resource usage
itself drops sharply: both values are below 1k, whereas in typical runs without
restarting YuniKorn the job's pod resource usage hovers around 9k.
h3. Using Last Event Timestamp as a Replacement
When using the last event timestamp instead of the allocation cancellation
event to calculate resource usage, the results align closer to expectations but
remain significantly higher than YuniKorn's summary log, likely representing an
overestimate.
*My results calculated using the events in the Kafka topic:*
{code:java}
Number of allocations without matching cancels: 101
Total aggregated resources usage:
memory: 159366239582373.12
pods: 11375.528615858
vcore: 45109670.74353799{code}
*The difference (my value - YuniKorn app summary value):*
{code:java}
memory: 159366239582373.12 - 12561602838528 = 1.4680464e+14 (my value is
1168.6776% greater)
pods: 11375.528615858 - 893 = 10482.5286159 (my value is 1173.85538% greater)
vcore: 45109670.74353799 - 3552000 = 41557670.7435 (my value is 1169.9794%
greater){code}
h3. Conclusion and Inquiry
Is this a bug in YuniKorn? Besides logging events to a Kafka topic, are there
other strategies my team can employ to improve resource usage tracking?
Any insights or recommendations would be greatly appreciated.