[
https://issues.apache.org/jira/browse/YUNIKORN-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833983#comment-17833983
]
Yongjun Zhang commented on YUNIKORN-2532:
-----------------------------------------
Hi [~ccondit],
Thanks much for the feedback.
I looked at the event stream approach and I agree that it is a better approach
because it survives YuniKorn restarts, among other benefits. I think our company
can adapt to it in the longer term. It's also good to know that using Any() is
not as efficient.
That said, given that adopting the event stream approach will take us time and
our ingestion pipeline is broken by this change, I think a short-term approach
is to change the output format back to be consistent with the earlier version,
while avoiding Any(). Would you be OK with going with this approach for now?
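To make the short-term idea concrete, here is a minimal sketch, not the actual
YuniKorn code. It assumes a hypothetical appSummary struct whose JSON tags mirror
the keys of the old output, and a hypothetical summaryLine helper that renders
the whole message as a plain string, so the logger only needs to write a string
and no Any() field is involved. Note that encoding/json emits compact JSON (no
space after colons or commas), which is equivalent JSON but not byte-identical
to the old output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// appSummary is a hypothetical stand-in for the scheduler's summary object.
// The JSON tags mirror the keys seen in the old YK_APP_SUMMARY output.
type appSummary struct {
	AppID             string                      `json:"appID"`
	SubmissionTime    int64                       `json:"submissionTime"`
	StartTime         int64                       `json:"startTime"`
	FinishTime        int64                       `json:"finishTime"`
	User              string                      `json:"user"`
	Queue             string                      `json:"queue"`
	State             string                      `json:"state"`
	RmID              string                      `json:"rmID"`
	ResourceUsage     map[string]map[string]int64 `json:"resourceUsage"`
	PreemptedResource map[string]map[string]int64 `json:"preemptedResource"`
}

// summaryLine (hypothetical helper) renders the summary as one string of the
// form "YK_APP_SUMMARY: {json}", which can be logged as a plain message.
func summaryLine(s appSummary) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return "YK_APP_SUMMARY: " + string(b), nil
}

func main() {
	s := appSummary{
		AppID:          "adf53ee0-experiment-organicad-94520240-1-1",
		SubmissionTime: 1712169262131,
		StartTime:      1712169264134,
		FinishTime:     1712173619983,
		User:           "system:serviceaccount:spark-operator-02:spark-operator",
		Queue:          "root.queue-large",
		State:          "Completed",
		RmID:           "test-cluster",
		// Outer key is the instance type, inner key is the resource name.
		ResourceUsage: map[string]map[string]int64{
			"def": {"memory": 113789789798400, "pods": 1413, "vcore": 4239000},
		},
		// An empty (non-nil) map marshals as {} rather than null.
		PreemptedResource: map[string]map[string]int64{},
	}
	line, err := summaryLine(s)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(line)
}
```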
BTW, one thing I want to point out is that the old YK_APP_SUMMARY was in JSON
format. The 1.5 version makes it no longer JSON, and makes the instance type
string appear multiple times. (See the two examples in the summary section to
see the diff.)
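As an illustration of why the JSON property matters, here is a minimal sketch of
how a consumer might read the message. It is an assumption about what a pipeline
could look like, not a description of ours; parseSummary and the shortened sample
lines are hypothetical. A line in the old style decodes cleanly, while a line in
the 1.5 style is rejected by the JSON decoder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// marker is the header that precedes the summary payload in the log line.
const marker = "YK_APP_SUMMARY: "

// parseSummary (hypothetical helper) locates the marker in a log line and
// decodes everything after it as a JSON object.
func parseSummary(logLine string) (map[string]any, error) {
	idx := strings.Index(logLine, marker)
	if idx < 0 {
		return nil, fmt.Errorf("no %q marker in line", marker)
	}
	var out map[string]any
	if err := json.Unmarshal([]byte(logLine[idx+len(marker):]), &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// Shortened samples in the style of the two formats from the issue.
	oldLine := `YK_APP_SUMMARY: {"appID": "app-1", "state": "Completed", "resourceUsage": {"abc":{"pods":1}}, "preemptedResource": {}}`
	newLine := `YK_APP_SUMMARY: {ApplicationID: app-1, State: Completed, ResourceUsage: TrackedResource{UNKNOWN:pods=1}}`

	if m, err := parseSummary(oldLine); err == nil {
		fmt.Println("old format parsed, state =", m["state"])
	}
	if _, err := parseSummary(newLine); err != nil {
		fmt.Println("new format is not valid JSON:", err)
	}
}
```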
One question about the general YuniKorn coding guidelines:
Though changing a log message format is generally allowed and not considered an
incompatible change, I added the YK_APP_SUMMARY header to this log to mark it as
a special message whose format we could keep fairly stable. My question is, do
we want to make exceptions and enforce the format of certain messages like this
one, or do we not want any exceptions at all?
Thanks. FYI [~wilfreds] [~pbacsko]
> Resource usage report has an incompatible format change
> -------------------------------------------------------
>
> Key: YUNIKORN-2532
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2532
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Yongjun Zhang
> Priority: Major
>
> There is some recent change that caused the application resource usage report
> to have a new format:
> Prior to the change, the format was:
> {code:java}
> YK_APP_SUMMARY: {"appID": "adf53ee0-experiment-organicad-94520240-1-1",
> "submissionTime": 1712169262131, "startTime": 1712169264134, "finishTime":
> 1712173619983, "user":
> "system:serviceaccount:spark-operator-02:spark-operator", "queue":
> "root.queue-large", "state": "Completed", "rmID": "test-cluster",
> "resourceUsage":
> {"abc":{"memory":139178200478515200,"pods":1729129,"vcore":5183062000},"def":{"memory":113789789798400,"pods":1413,"vcore":4239000}},
> "preemptedResource": {}}
> {code}
> With the change, the new format is:
> {code:java}
> 2024-04-04T00:33:08.532Z INFO core.scheduler.application.usage
> objects/application_summary.go:60 YK_APP_SUMMARY: {ApplicationID:
> afa303d0-test-trino-sparksql--20240404-2-1, SubmissionTime: 1712190615461,
> StartTime: 1712190617496, FinishTime: 1712190788532, User:
> system:serviceaccount:spark-operator-01:spark-operator, Queue:
> root.queue-large, State: Completed, RmID: test-cluster, ResourceUsage:
> TrackedResource{UNKNOWN:pods=177,UNKNOWN:vcore=354000,UNKNOWN:memory=1431454089216},
> PreemptedResource: TrackedResource{}, PlaceholderResource:
> TrackedResource{}}{code}
> There are several incompatibilities:
> 1. the class name TrackedResource was not there before, now it is.
> 2. the instance type was outside the resource part before, now it's embedded
> 3. the instance type was reported correctly before the change, now it's
> UNKNOWN
> #3 may be a different issue, but it's observed by us at the same time.
> I think we should change the format back to the original one, as this is an
> incompatible change. What do you think [~wilfreds], [~pbacsko], [~ccondit]?
> Thanks.