Re: [DISCUSS] FLIP-241: Completed Jobs Information Enhancement

Yangze Guo Thu, 16 Jun 2022 23:04:22 -0700

Thanks for the input, Jiangang.

I think it's a valid demand to distinguish completed jobs with the same name.
- If they are different jobs, I think users should give them
different, meaningful names.
- If they are runs of exactly the same job, IIUC, what you need is to
figure out their order. The ApplicationId in Yarn might help, but in this
case you can just sort them by start time.
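
As a rough sketch of the sort-by-start-time idea (not part of the FLIP): the
record fields below are assumptions modeled on the `jid`, `name`, and
`start-time` fields of the REST `/jobs/overview` response, and
`group_runs_by_name` is a hypothetical helper, not a Flink API.

```python
from collections import defaultdict


def group_runs_by_name(jobs):
    """Group job records by name and order each group by start time.

    `jobs` is a list of dicts with "jid", "name", and "start-time"
    (epoch milliseconds). The field names are an assumption for illustration.
    """
    runs = defaultdict(list)
    for job in jobs:
        runs[job["name"]].append(job)
    for same_name_runs in runs.values():
        # Earliest run first, so the position in the list gives the run order.
        same_name_runs.sort(key=lambda job: job["start-time"])
    return dict(runs)


# Hypothetical completed jobs; two of them share the same name.
jobs = [
    {"jid": "c", "name": "daily-etl", "start-time": 1655400000000},
    {"jid": "a", "name": "daily-etl", "start-time": 1655300000000},
    {"jid": "b", "name": "report", "start-time": 1655350000000},
]
ordered = group_runs_by_name(jobs)
```

Here `ordered["daily-etl"]` lists the two runs oldest first, which is enough
to tell them apart without any extra identifier.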


Best,
Yangze Guo

On Fri, Jun 17, 2022 at 12:13 PM Jiangang Liu <[email protected]> wrote:
>
> Thanks for the FLIP. It is helpful to track detailed information for
> completed jobs.
>
> I want to ask another question. In our environment, it is sometimes hard to
> distinguish jobs, since the same job name may appear multiple times among
> the completed jobs: a job may run multiple times, or different jobs may have
> the same name. I wonder whether we can enhance the completed jobs
> display with more information, such as the applicationId and application
> name in Yarn. Identifying a job may work differently in k8s.
>
> Best
> Jiangang Liu
>
> On Fri, Jun 17, 2022 at 11:40 AM Yangze Guo <[email protected]> wrote:
>
> > Thanks for the feedback, Aitozi and Jing.
> >
> > > Will each attempt of the TaskManager or JobManager pods (if a failure
> > > occurs) be shown in the UI?
> >
> > The info of the prior execution attempts will be archived; you can
> > refer to `ArchivedExecutionVertex$priorExecutions`.
> >
> > > It seems that most of these metrics are more interesting to batch jobs.
> > > Does it make sense to calculate them for pure streaming jobs too?
> >
> > All the proposed metrics will be calculated no matter what the job type is.
> >
> > > Why does the FLIP say that "duration is less interesting"?
> >
> > As a first step, we mainly focus on the most interesting statuses during
> > the job lifecycle. The duration of final states like FINISHED and
> > CANCELED is meaningless, while abnormal conditions like CANCELING will
> > not be included at the moment.
> >
> > > Could you share your thoughts on "accumulated-busy-time"? It should
> > > describe the time while the task is working as expected, i.e. the happy
> > > path. When do we need it for analytics or diagnosis?
> >
> > A task could be busy or idle while it is working. Users may adjust the
> > parallelism or the partition key according to the ratio between them.
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Jun 17, 2022 at 5:08 AM Jing Ge <[email protected]> wrote:
> > >
> > > Hi Junhan
> > >
> > > > This is must-have information for batch processing. Thanks for
> > > > bringing it up.
> > >
> > > I have some comments:
> > >
> > > > 1. It seems that most of these metrics are more interesting to batch
> > > > jobs. Does it make sense to calculate them for pure streaming jobs too?
> > > > 2. Why does the FLIP say that "duration is less interesting"?
> > > 3. Could you share your thoughts on "accumulated-busy-time"? It should
> > > describe the time while the task is working as expected, i.e. the happy
> > > path. When do we need it for analytics or diagnosis?
> > >
> > > > BTW, you might want to tidy up the formatting of the FLIP. Some text
> > > > runs past the right border of the wiki page.
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Thu, Jun 16, 2022 at 4:40 PM Aitozi <[email protected]> wrote:
> > >
> > > > > Thanks Junhan for driving this. It is a great improvement for batch
> > > > > jobs. I'm looking forward to this feature in our internal use case.
> > > > > +1 for it.
> > > >
> > > > One more question:
> > > >
> > > > > Will each attempt of the TaskManager or JobManager pods (if a failure
> > > > > occurs) be shown in the UI?
> > > >
> > > > Best,
> > > > Aitozi.
> > > >
> > > > > On Thu, Jun 16, 2022 at 7:10 PM Yang Wang <[email protected]> wrote:
> > > >
> > > > > Thanks Xintong for the explanation.
> > > > >
> > > > > It makes sense to leave the discussion about job result store in a
> > > > > dedicated thread.
> > > > >
> > > > >
> > > > > Best,
> > > > > Yang
> > > > >
> > > > > > On Thu, Jun 16, 2022 at 1:40 PM Xintong Song <[email protected]> wrote:
> > > > >
> > > > > > > My impression of JobResultStore is more about fault tolerance and
> > > > > > > high availability. Using it for providing information to users
> > > > > > > sounds worth exploring. We probably need more time to think it
> > > > > > > through.
> > > > > >
> > > > > > > Given that it doesn't conflict with what we have proposed in this
> > > > > > > FLIP, I'd suggest treating it as a separate thread and excluding it
> > > > > > > from the scope of this one.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Xintong
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Thu, Jun 16, 2022 at 11:43 AM Yang Wang <[email protected]> wrote:
> > > > > >
> > > > > > > > This is a very useful feature for both finished streaming and
> > > > > > > > batch jobs.
> > > > > > > >
> > > > > > > > Besides the WebUI & REST API improvements, I am curious whether
> > > > > > > > we could also integrate some critical information (e.g. the
> > > > > > > > latest checkpoint) into the job result store [1].
> > > > > > > > I feel this is also somewhat related to "Completed Jobs
> > > > > > > > Information Enhancement".
> > > > > > > > And I think the history server is not necessary in all scenarios,
> > > > > > > > especially when users only want to check the job execution
> > > > > > > > result.
> > > > > > > >
> > > > > > > [1].
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+Introduce+the+JobResultStore
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Yang
> > > > > > >
> > > > > > > > On Wed, Jun 15, 2022 at 3:37 PM Xintong Song <[email protected]> wrote:
> > > > > > >
> > > > > > > > Thanks Junhan,
> > > > > > > >
> > > > > > > > +1 for the proposed improvements.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Xintong
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Wed, Jun 15, 2022 at 3:16 PM Yangze Guo <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Thanks for driving this, Junhan.
> > > > > > > > >
> > > > > > > > > > I think it's a valuable usability improvement for both
> > > > > > > > > > streaming and batch users. Looking forward to the community
> > > > > > > > > > feedback.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yangze Guo
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On Wed, Jun 15, 2022 at 3:10 PM junhan yang <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > > I would like to open a discussion on FLIP-241: Completed
> > > > > > > > > > > Jobs Information Enhancement.
> > > > > > > > > > >
> > > > > > > > > > > As far as we can tell, streaming and batch users have
> > > > > > > > > > > different interests in probing a job. As Flink grows into a
> > > > > > > > > > > unified streaming & batch processor and is adopted by more
> > > > > > > > > > > and more batch users, the user experience of inspecting
> > > > > > > > > > > completed jobs has become more and more important. After
> > > > > > > > > > > doing some market research, we have spotted several
> > > > > > > > > > > potential improvements.
> > > > > > > > > > >
> > > > > > > > > > > The main reason for this FLIP is that the improvements
> > > > > > > > > > > involve WebUI & REST API changes, which should be openly
> > > > > > > > > > > discussed and voted on as FLIPs.
> > > > > > > > > > >
> > > > > > > > > > > You can find more details in the FLIP-241 document [1].
> > > > > > > > > > > Looking forward to your feedback.
> > > > > > > > > >
> > > > > > > > > > [1] https://cwiki.apache.org/confluence/x/dRD1D
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > Junhan
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
