I've updated the FLIP. All of the newly introduced REST APIs will now
apply to both the JobManager and the HistoryServer.
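
As a rough illustration (a sketch only, not taken from the FLIP), a client
could then build the same job-scoped path against either server. The host
names and the job ID are made up, and the path template follows the
/jobs/:jobid/jobmanager/environment form mentioned further down in this
thread, which may not match the final FLIP. 8081 and 8082 are the default
REST ports of the JobManager and the HistoryServer.

```python
from urllib.parse import urljoin

# Hypothetical base URLs: the hosts are invented, the ports are the defaults.
JOBMANAGER = "http://jobmanager.example:8081"
HISTORY_SERVER = "http://historyserver.example:8082"


def job_scoped_url(base_url: str, job_id: str, resource: str) -> str:
    """Build a /jobs/:jobid/<resource> URL that is identical on both servers."""
    return urljoin(base_url + "/", f"jobs/{job_id}/{resource}")


job_id = "0123456789abcdef0123456789abcdef"  # placeholder job ID
resource = "jobmanager/environment"  # assumed path, see the note above

jm_url = job_scoped_url(JOBMANAGER, job_id, resource)
hs_url = job_scoped_url(HISTORY_SERVER, job_id, resource)
```

Only the base URL differs between the two calls; the path is the same.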

@Chesnay Schepler @Xintong Song Please take another look at your convenience.

Best,
Yangze Guo


On Fri, Jun 24, 2022 at 5:02 PM junhan yang <yangjunhan1...@gmail.com> wrote:
>
> Distinguishing the APIs through the naming of URLs can be a way to prevent
> confusion. I think we should reconsider our API design based on the earlier
> insight and come up with a thorough explanation, or perhaps a better plan,
> for this.
>
> Best regards,
> Junhan
>
> On Fri, Jun 24, 2022 at 4:27 PM Xintong Song <tonysong...@gmail.com> wrote:
>
> > I see. So you are suggesting that the jobmanager support both /foo/bar and
> > /jobs/:jobid/foo/bar, while the history server only supports the latter.
> >
> > I initially thought that having two APIs in the jobmanager serving the
> > exact same purpose was a bit tricky. Now I think it's a good point that
> > these two APIs, despite returning the same results now, could return
> > different things in the future.
> >
> > Junhan & Yangze, WDYT?
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Fri, Jun 24, 2022 at 3:10 PM Chesnay Schepler <ches...@apache.org> wrote:
> >
> > > This is pretty simple to explain.
> > >
> > > "I want to know the environment the job ran in." ->
> > > /jobs/:jobid/environment
> > > "I want to know the environment the JM ran in." ->
> > > /jobmanager/environment
> > >
> > > It's less about the JobID being a parameter, and more of a way for users
> > > to better model the resource they are interested in.
> > >
> > > In the future we could consider having the job environment endpoint return
> > > not just the JM environment, but also those from the CLI/TMs.
> > >
> > > On 24/06/2022 06:37, Xintong Song wrote:
> > > > Whether the job ID is actually used in the end isn't visible after all.
> > > >
> > > > I'm not sure about this. E.g., for an empty session cluster, users have
> > > > to understand they don't need to provide an actual jobid for requesting
> > > > jobmanager information via REST.
> > > >
> > > > I believe both ways work. I think this is a trade-off between a)
> > > > explaining to history server REST API users how the URLs differ from
> > > > the jobmanager's and b) explaining to jobmanager REST API users why we
> > > > need an unused jobid in some cases. I'm leaning toward the current
> > > > approach, because I'd expect the set of history server REST API users
> > > > to be smaller than (or even a subset of) that of the jobmanager.
> > > >
> > > > The plan is to document which URLs differ from the jobmanager's, and
> > > > how, on the history server page [1].
> > > >
> > > > Compatibility testing should indeed be considered. Thanks for pointing
> > > > it out. Currently the compatibility of the history server REST API is
> > > > guaranteed by the compatibility of the jobmanager REST API. I think the
> > > > only thing we need is to make sure /foo/bar of the jobmanager is
> > > > identical to /jobs/:jobid/foo/bar of the history server. We can
> > > > introduce an interface, as a subtype of JsonArchivist, that archives
> > > > the JSON with a path that includes the jobid. Then we can test against
> > > > all relevant handlers as implementations of this interface.
> > > >
> > > > WDYT?
> > > >
> > > > Best,
> > > >
> > > > Xintong
> > > >
> > > >
> > > > [1]
> > > > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/advanced/historyserver/#available-requests
> > > >
> > > >
> > > >
> > > > On Thu, Jun 23, 2022 at 5:07 PM Chesnay Schepler <ches...@apache.org> wrote:
> > > >
> > > > >> The addition of the /jobs/:jobid/jobmanager/config / environment
> > > > >> exclusively to the HS is a bit of a strange workaround.
> > > > >> How do you intend to document those (and test compatibility)?
> > > > >>
> > > > >> Why not just add a general /jobs/:jobid/environment endpoint that
> > > > >> works just like jobmanager/environment?
> > > > >> To me that seems like a cleaner solution.
> > > > >> It is somewhat mentioned as an alternative in the FLIP, but I don't
> > > > >> understand what is supposed to be confusing about it.
> > > > >> Whether the job ID is actually used in the end isn't visible after
> > > > >> all.
> > > > >>
> > > > >> /jobmanager/config could be integrated into /jobs/:jobid/config.
> > > > >>
> > > > >> The same approach could maybe be used for logs; not really sure yet
> > > > >> (not a fan of displaying logs in the HS in the first place).
> > > >>
> > > >> On 23/06/2022 06:55, junhan yang wrote:
> > > >>> Hi all,
> > > >>>
> > > > >>> Thank you all for your feedback. As far as I can see, the discussion
> > > > >>> on this FLIP has converged.
> > > >>>
> > > >>> I will start a new vote thread now.
> > > >>>
> > > >>> Best regards,
> > > >>> Junhan
> > > >>>
> > > > >>> On Fri, Jun 17, 2022 at 2:05 PM Yangze Guo <karma...@gmail.com> wrote:
> > > >>>
> > > >>>> Thanks for the input, Jiangang.
> > > >>>>
> > > > >>>> I think it's a valid demand to distinguish completed jobs with the
> > > > >>>> same name.
> > > > >>>> - If they are different jobs, I think users need to give them
> > > > >>>> different, meaningful names.
> > > > >>>> - If they are exactly the same job, IIUC, what you need is to figure
> > > > >>>> out the order. ApplicationId in Yarn might help. But in this case,
> > > > >>>> you can just sort them by start time.
> > > >>>>
> > > >>>> Best,
> > > >>>> Yangze Guo
> > > >>>>
> > > > >>>> On Fri, Jun 17, 2022 at 12:13 PM Jiangang Liu <liujiangangp...@gmail.com> wrote:
> > > > >>>>> Thanks for the FLIP. It is helpful to track detailed information for
> > > > >>>>> completed jobs.
> > > > >>>>> I want to ask another question. In our environment, it is sometimes
> > > > >>>>> hard to distinguish jobs, since the same job name may appear multiple
> > > > >>>>> times among the completed jobs, either because a job ran multiple
> > > > >>>>> times or because different jobs have the same name. I wonder whether
> > > > >>>>> we can enhance the completed jobs display with more information, such
> > > > >>>>> as the applicationId and application name in Yarn. Identifying a job
> > > > >>>>> may work differently in K8s.
> > > >>>>>
> > > >>>>> Best
> > > >>>>> Jiangang Liu
> > > >>>>>
> > > > >>>>> On Fri, Jun 17, 2022 at 11:40 AM Yangze Guo <karma...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Thanks for the feedback, Aitozi and Jing.
> > > >>>>>>
> > > > >>>>>>> Will each attempt of the TaskManager or JobManager pods (if a failure
> > > > >>>>>>> occurs) be shown in the UI?
> > > >>>>>>
> > > > >>>>>> The info of the prior execution attempts will be archived; you can
> > > > >>>>>> refer to `ArchivedExecutionVertex$priorExecutions`.
> > > >>>>>>
> > > > >>>>>>> It seems that most of these metrics are more interesting to batch
> > > > >>>>>>> jobs. Does it make sense to calculate them for pure streaming jobs
> > > > >>>>>>> too?
> > > > >>>>>>
> > > > >>>>>> All the proposed metrics will be calculated no matter what the job
> > > > >>>>>> type is.
> > > > >>>>>>
> > > > >>>>>>> Why is duration "less interesting", as mentioned in the FLIP?
> > > > >>>>>>
> > > > >>>>>> As a first step, we mainly focus on the most interesting statuses
> > > > >>>>>> during the job lifecycle. The duration of final states like FINISHED
> > > > >>>>>> and CANCELED is meaningless, while abnormal conditions like CANCELING
> > > > >>>>>> will not be included at the moment.
> > > >>>>>>
> > > > >>>>>>> Could you share your thoughts on "accumulated-busy-time"? It should
> > > > >>>>>>> describe the time while the task is working as expected, i.e. the
> > > > >>>>>>> happy path. When do we need it for analytics or diagnosis?
> > > > >>>>>>
> > > > >>>>>> A task could be busy or idle while it is working. Users may adjust
> > > > >>>>>> the parallelism or the partition key according to the ratio between
> > > > >>>>>> them.
> > > >>>>>>
> > > >>>>>> Best,
> > > >>>>>> Yangze Guo
> > > >>>>>>
> > > > >>>>>> On Fri, Jun 17, 2022 at 5:08 AM Jing Ge <j...@ververica.com> wrote:
> > > >>>>>>> Hi Junhan
> > > >>>>>>>
> > > > >>>>>>> This is must-have information for batch processing. Thanks for
> > > > >>>>>>> bringing it up.
> > > > >>>>>>>
> > > > >>>>>>> I have some comments:
> > > > >>>>>>>
> > > > >>>>>>> 1. It seems that most of these metrics are more interesting to batch
> > > > >>>>>>> jobs. Does it make sense to calculate them for pure streaming jobs
> > > > >>>>>>> too?
> > > > >>>>>>> 2. Why is duration "less interesting", as mentioned in the FLIP?
> > > > >>>>>>> 3. Could you share your thoughts on "accumulated-busy-time"? It
> > > > >>>>>>> should describe the time while the task is working as expected, i.e.
> > > > >>>>>>> the happy path. When do we need it for analytics or diagnosis?
> > > > >>>>>>>
> > > > >>>>>>> BTW, you might want to optimize the format of the FLIP. Some text
> > > > >>>>>>> runs past the right border of the wiki page.
> > > >>>>>>>
> > > >>>>>>> Best regards,
> > > >>>>>>> Jing
> > > >>>>>>>
> > > > >>>>>>> On Thu, Jun 16, 2022 at 4:40 PM Aitozi <gjying1...@gmail.com> wrote:
> > > >>>>>>>
> > > > >>>>>>>> Thanks Junhan for driving this. It is a great improvement for batch
> > > > >>>>>>>> jobs. I'm looking forward to this feature in our internal use case.
> > > > >>>>>>>> +1 for it.
> > > > >>>>>>>> One more question:
> > > > >>>>>>>>
> > > > >>>>>>>> Will each attempt of the TaskManager or JobManager pods (if a failure
> > > > >>>>>>>> occurs) be shown in the UI?
> > > >>>>>>>>
> > > >>>>>>>> Best,
> > > >>>>>>>> Aitozi.
> > > >>>>>>>>
> > > > >>>>>>>> On Thu, Jun 16, 2022 at 7:10 PM Yang Wang <danrtsey...@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Thanks Xintong for the explanation.
> > > >>>>>>>>>
> > > > >>>>>>>>> It makes sense to leave the discussion about the job result store to
> > > > >>>>>>>>> a dedicated thread.
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Best,
> > > >>>>>>>>> Yang
> > > >>>>>>>>>
> > > > >>>>>>>>> On Thu, Jun 16, 2022 at 1:40 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > >>>>>>>>>
> > > > >>>>>>>>>> My impression of JobResultStore is that it is more about fault
> > > > >>>>>>>>>> tolerance and high availability. Using it for providing information
> > > > >>>>>>>>>> to users sounds worth exploring. We probably need more time to think
> > > > >>>>>>>>>> it through.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Given that it doesn't conflict with what we have proposed in this
> > > > >>>>>>>>>> FLIP, I'd suggest considering it in a separate thread and excluding
> > > > >>>>>>>>>> it from the scope of this one.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>>
> > > >>>>>>>>>> Xintong
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > > >>>>>>>>>> On Thu, Jun 16, 2022 at 11:43 AM Yang Wang <danrtsey...@gmail.com> wrote:
> > > > >>>>>>>>>>> This is a very useful feature for both finished streaming and batch
> > > > >>>>>>>>>>> jobs.
> > > > >>>>>>>>>>> Beyond the WebUI & REST API improvements, I am curious whether we
> > > > >>>>>>>>>>> could also integrate some critical information (e.g. the latest
> > > > >>>>>>>>>>> checkpoint) into the job result store [1].
> > > > >>>>>>>>>>> I feel this is also somewhat related to "Completed Jobs
> > > > >>>>>>>>>>> Information Enhancement".
> > > > >>>>>>>>>>> And I think the history server is not necessary for all scenarios,
> > > > >>>>>>>>>>> especially when users only want to check the job execution result.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> [1]
> > > > >>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+Introduce+the+JobResultStore
> > > > >>>>>>>>>>>
> > > >>>>>>>>>>> Best,
> > > >>>>>>>>>>> Yang
> > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Wed, Jun 15, 2022 at 3:37 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> Thanks Junhan,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> +1 for the proposed improvements.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Xintong
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Wed, Jun 15, 2022 at 3:16 PM Yangze Guo <karma...@gmail.com> wrote:
> > > >>>>>>>>>>>>> Thanks for driving this, Junhan.
> > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> I think it's a valuable usability improvement for both streaming
> > > > >>>>>>>>>>>>> and batch users. Looking forward to the community feedback.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>> Yangze Guo
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Wed, Jun 15, 2022 at 3:10 PM junhan yang <yangjunhan1...@gmail.com> wrote:
> > > >>>>>>>>>>>>>> Hi all,
> > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I would like to open a discussion on FLIP-241: Completed Jobs
> > > > >>>>>>>>>>>>>> Information Enhancement.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> As far as we can tell, streaming and batch users have different
> > > > >>>>>>>>>>>>>> interests in probing a job. As Flink grows into a unified
> > > > >>>>>>>>>>>>>> streaming & batch processor and is adopted by more and more batch
> > > > >>>>>>>>>>>>>> users, the experience of inspecting completed jobs has become more
> > > > >>>>>>>>>>>>>> and more important. After doing some market research, we spotted
> > > > >>>>>>>>>>>>>> several potential improvements. Since these involve WebUI & REST
> > > > >>>>>>>>>>>>>> API changes, they should be openly discussed and voted on as a
> > > > >>>>>>>>>>>>>> FLIP.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> You can find more details in the FLIP-241 document [1]. Looking
> > > > >>>>>>>>>>>>>> forward to your feedback.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/dRD1D
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Best regards,
> > > >>>>>>>>>>>>>> Junhan
> > > >>
> > > >>
> > >
> > >
> >
