Thanks for the updates. LGTM.

Best,
Xintong

On Mon, Jun 27, 2022 at 2:48 PM Yangze Guo <karma...@gmail.com> wrote:

I've updated the FLIP. All of the newly introduced REST APIs will now apply to both the JobManager and the HistoryServer.

@Chesnay Schepler @Xintong Song Please take another look at your convenience.

Best,
Yangze Guo

On Fri, Jun 24, 2022 at 5:02 PM junhan yang <yangjunhan1...@gmail.com> wrote:

Distinguishing the APIs through the naming of URLs can be a way to prevent confusion. I think we should reconsider our API design based on the earlier insight and come up with a thorough explanation or perhaps a better plan.

Best regards,
Junhan

On Fri, Jun 24, 2022 at 16:27, Xintong Song <tonysong...@gmail.com> wrote:

I see. So you are suggesting that the jobmanager support both /foo/bar and /jobs/:jobid/foo/bar, while the history server only supports the latter.

I was initially thinking that having two APIs in the jobmanager serving the exact same purpose is a bit tricky. Now I think it's a good point that these two APIs, despite returning the same results now, can return different things in the future.

Junhan & Yangze, WDYT?

Best,

Xintong

On Fri, Jun 24, 2022 at 3:10 PM Chesnay Schepler <ches...@apache.org> wrote:

This is pretty simple to explain.

"I want to know the environment the job ran in." -> /jobs/:jobid/environment
"I want to know the environment the JM ran in." -> /jobmanager/environment

It's less about the JobID being a parameter, and more of a way for users to better model the resource they are interested in.

In the future we could consider having the job environment endpoint return not just the JM environment, but also those from the CLI/TMs.

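To make the distinction between the two endpoints above concrete, here is a minimal Python sketch of how a client might pick the path for the resource it is interested in. The two paths are the ones named in this thread; the helper names (`environment_path`, `full_url`), the base URLs, and the job ID are assumptions made for this example, and the sketch only builds URLs rather than calling a real cluster.

```python
from typing import Optional


def environment_path(job_id: Optional[str] = None) -> str:
    """Return the REST path for an environment lookup.

    With a job ID the path addresses the job as the resource;
    without one it addresses the JobManager itself.
    """
    if job_id is None:
        return "/jobmanager/environment"
    return f"/jobs/{job_id}/environment"


def full_url(base: str, path: str) -> str:
    """Join a base URL and a path without doubling the slash."""
    return base.rstrip("/") + path


# Assumed addresses and job ID, for illustration only.
JOBMANAGER = "http://localhost:8081"
HISTORY_SERVER = "http://localhost:8082"
JOB_ID = "0123456789abcdef"  # hypothetical

# "I want to know the environment the JM ran in."
print(full_url(JOBMANAGER, environment_path()))

# "I want to know the environment the job ran in."
print(full_url(HISTORY_SERVER, environment_path(JOB_ID)))
```

In this sketch the job-scoped path is the same string whichever server is addressed, which is one way to read the point about modelling the resource rather than treating the job ID as a parameter.
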
On 24/06/2022 06:37, Xintong Song wrote:

> Whether the job ID is actually used in the end isn't visible after all.

I'm not sure about this. E.g., for an empty session cluster, users have to understand that they don't need to provide an actual jobid when requesting jobmanager information via REST.

I believe both ways work. I think this is a trade-off between a) explaining to history server REST API users how the URLs differ from those of the jobmanager and b) explaining to jobmanager REST API users why we need an unused jobid in some cases. I'm leaning toward the current approach, because I'd expect the set of history server REST API users to be smaller than (or even a subset of) that of the jobmanager.

The plan is to document which URLs differ from the jobmanager's, and how, in the history server page [1].

Compatibility tests should indeed be considered. Thanks for pointing it out. Currently the compatibility of the history server REST API is guaranteed by the compatibility of the jobmanager REST API. I think the only thing we need is to make sure that /foo/bar of the jobmanager is identical to /jobs/:jobid/foo/bar of the history server. We can introduce an interface, as a subtype of JsonArchivist, that archives the JSON with a path that includes the jobid. Then we can test against all relevant handlers as implementations of this interface.

WDYT?

Best,

Xintong

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/advanced/historyserver/#available-requests

On Thu, Jun 23, 2022 at 5:07 PM Chesnay Schepler <ches...@apache.org> wrote:

The addition of the /jobs/:jobid/jobmanager/config / environment exclusively to the HS is a bit of a strange workaround.
How do you intend to document those (and test compatibility)?

Why not just add a general /jobs/:jobid/environment endpoint that works just like jobmanager/environment?
To me that seems like a cleaner solution.
It is somewhat mentioned as an alternative in the FLIP, but I don't understand what is supposed to be confusing about it.
Whether the job ID is actually used in the end isn't visible after all.

/jobmanager/config could be integrated into /jobs/:jobid/config.

The same approach could maybe be used for logs; not really sure yet (not a fan of displaying logs in the HS in the first place).

On 23/06/2022 06:55, junhan yang wrote:

Hi all,

Thank you all for your feedback. As far as I can see, the discussion on this FLIP has converged.

I will start a new vote thread now.

Best regards,
Junhan

On Fri, Jun 17, 2022 at 14:05, Yangze Guo <karma...@gmail.com> wrote:

Thanks for the input, Jiangang.

I think it's a valid request to distinguish completed jobs with the same name.
- If they are different jobs, I think users need to give each of them a different, meaningful name.
- If they are exactly the same job, IIUC, what you need is to figure out the order. The ApplicationId in Yarn might help. But in this case, you can just sort them by start time.

Best,
Yangze Guo

On Fri, Jun 17, 2022 at 12:13 PM Jiangang Liu <liujiangangp...@gmail.com> wrote:

Thanks for the FLIP. It is helpful to track detailed information for completed jobs.

I want to ask another question. In our environment, it is sometimes hard to distinguish jobs, since the same job name may appear multiple times among the completed jobs, either because a job ran multiple times or because different jobs have the same name. I wonder whether we can enhance the completed jobs display with more information, such as the applicationId and application name in Yarn. Identifying a job may work differently in k8s.

Best
Jiangang Liu

On Fri, Jun 17, 2022 at 11:40, Yangze Guo <karma...@gmail.com> wrote:

Thanks for the feedback, Aitozi and Jing.

> Will each attempt of the TaskManager or JobManager pods (if a failure occurs) be shown in the UI?

The info of the prior execution attempts will be archived; you could refer to `ArchivedExecutionVertex$priorExecutions`.

> It seems that most of these metrics are more interesting to batch jobs.
> Does it make sense to calculate them for pure streaming jobs too?

All the proposed metrics will be calculated no matter what the job type is.

> Why does the FLIP say that "duration is less interesting"?

As a first step, we mainly focus on the most interesting statuses during the job lifecycle. The duration of final states like FINISHED and CANCELED is meaningless, while abnormal conditions like CANCELING will not be included at the moment.

> Could you share your thoughts on "accumulated-busy-time"? It should describe the time while the task is working as expected, i.e. the happy path. When do we need it for analytics or diagnosis?

A task could be busy or idle while it is working. Users may adjust the parallelism or the partition key according to the ratio between the two.

Best,
Yangze Guo

On Fri, Jun 17, 2022 at 5:08 AM Jing Ge <j...@ververica.com> wrote:

Hi Junhan

This is must-have information for batch processing. Thanks for bringing it up.

I have some comments:

1. It seems that most of these metrics are more interesting to batch jobs. Does it make sense to calculate them for pure streaming jobs too?
2. Why does the FLIP say that "duration is less interesting"?
3. Could you share your thoughts on "accumulated-busy-time"? It should describe the time while the task is working as expected, i.e.
the happy path. When do we need it for analytics or diagnosis?

BTW, you might want to optimize the format of the FLIP. Some text is running past the right border of the wiki page.

Best regards,
Jing

On Thu, Jun 16, 2022 at 4:40 PM Aitozi <gjying1...@gmail.com> wrote:

Thanks Junhan for driving this. It is a great improvement for batch jobs. I'm looking forward to this feature in our internal use case. +1 for it.

One more question:

Will each attempt of the TaskManager or JobManager pods (if a failure occurs) be shown in the UI?

Best,
Aitozi.

On Thu, Jun 16, 2022 at 19:10, Yang Wang <danrtsey...@gmail.com> wrote:

Thanks Xintong for the explanation.

It makes sense to leave the discussion about the job result store to a dedicated thread.

Best,
Yang

On Thu, Jun 16, 2022 at 13:40, Xintong Song <tonysong...@gmail.com> wrote:

My impression of JobResultStore is that it is more about fault tolerance and high availability. Using it to provide information to users sounds worth exploring. We probably need more time to think it through.

Given that it doesn't conflict with what we have proposed in this FLIP, I'd suggest treating it as a separate thread and excluding it from the scope of this one.

Best,

Xintong

On Thu, Jun 16, 2022 at 11:43 AM Yang Wang <danrtsey...@gmail.com> wrote:

This is a very useful feature for both finished streaming and batch jobs.

Besides the WebUI & REST API improvements, I am curious whether we could also integrate some critical information (e.g. the latest checkpoint) into the job result store [1].
I feel this is also somewhat related to "Completed Jobs Information Enhancement".
And I think the history server is not necessary for all scenarios, especially when users only want to check the job execution result.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+Introduce+the+JobResultStore

Best,
Yang

On Wed, Jun 15, 2022 at 15:37, Xintong Song <tonysong...@gmail.com> wrote:

Thanks Junhan,

+1 for the proposed improvements.

Best,

Xintong

On Wed, Jun 15, 2022 at 3:16 PM Yangze Guo <karma...@gmail.com> wrote:

Thanks for driving this, Junhan.

I think it's a valuable usability improvement for both streaming and batch users. Looking forward to the community feedback.

Best,
Yangze Guo

On Wed, Jun 15, 2022 at 3:10 PM junhan yang <yangjunhan1...@gmail.com> wrote:

Hi all,

I would like to open a discussion on FLIP-241: Completed Jobs Information Enhancement.

As far as we can tell, streaming and batch users have different interests in probing a job. As Flink grows into a unified streaming & batch processor and is adopted by more and more batch users, the user experience of inspecting completed jobs has become more and more important. After doing some market research, we spotted several potential improvements.
The main reason for raising this as a FLIP is that it involves WebUI & REST API changes, which should be openly discussed and voted on as FLIPs.

You can find more details in the FLIP-241 document [1]. Looking forward to your feedback.

[1] https://cwiki.apache.org/confluence/x/dRD1D

Best regards,
Junhan