I suggest keeping the current behavior as the default and adding a flag that enables the behavior you're suggesting: linking to the logs path in YARN instead of directly to stderr and stdout.
On Fri, Feb 8, 2019 at 3:33 PM Jungtaek Lim <kabh...@gmail.com> wrote:

> Ryan,
>
> Actually, I'm not clear about your suggestion. I see three possible
> options here:
>
> 1. If we want to let users completely rewrite log URLs, that's
> SPARK-26792 <https://issues.apache.org/jira/browse/SPARK-26792>. For SHS
> we already addressed it.
> 2. We could let users turn a flag on or off to get either a single URL or
> the default two stdout/stderr URLs.
> 3. We could let users enumerate the file names they want to link, and create
> a log link for each file.
>
> Which one do you suggest?
>
> On Sat, Feb 9, 2019 at 8:24 AM Ryan Blue <rb...@netflix.com> wrote:
>
>> Jungtaek,
>>
>> Thanks for the extra context. Those quotes are the confirmation I
>> was looking for before exposing the link you suggest instead of going
>> directly to stderr and stdout.
>>
>> What do you think about my suggestion to control this with a config
>> option? I would prefer that, since we use the supported pattern. But I would
>> support moving forward on this either way.
>>
>> rb
>>
>> On Fri, Feb 8, 2019 at 3:03 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> I think that's a reasonable argument: it provides links to
>>> potentially several logs of interest. It reduces the UI clutter a
>>> little at the cost of one more hop to get to the logs.
>>> I don't feel strongly about it, but I think that's a reasonable thing to do.
>>>
>>> On Fri, Feb 8, 2019 at 4:57 PM Jungtaek Lim <kabh...@gmail.com> wrote:
>>> >
>>> > Let me quote some voices here, since they don't seem to be participating
>>> > in this thread. This still doesn't show that the majority use this pattern,
>>> > so I'm also OK with making it optional (I might just work on SPARK-26792 to
>>> > address it) and leaving the default as it is if others aren't interested in this.
>>> >
>>> > https://github.com/apache/spark/pull/23260#issuecomment-456827963
>>> >
>>> > Sorry I haven't had time to look through all the code, so this might be
>>> > a separate jira, but one thing I thought of here is that it would be really
>>> > nice not to single out stderr/stdout. Users can specify any
>>> > log4j.properties, and some tools like Oozie by default end up using Hadoop
>>> > log4j rather than Spark log4j, so the files aren't necessarily the same. Also,
>>> > users can add other log files, so it would be nice to have links to
>>> > those from the UI. It seems simpler if we just had a link to the directory
>>> > and it read the files within it. Other things in Hadoop do it this way,
>>> > but I'm not sure if that works well for other resource managers; any
>>> > thoughts on that? As long as this doesn't prevent the above, I can file a
>>> > separate jira for it.
>>> >
>>> > https://github.com/apache/spark/pull/23260#issuecomment-456904716
>>> >
>>> > Hi Tom, +1: singling out stdout and stderr is definitely an annoyance. We
>>> > typically configure Spark jobs to write the GC log and dump the heap on OOM
>>> > using <LOG_DIR>, and/or we use the rolling file appender to deal with
>>> > large logs during debugging. So linking the YARN container log overview
>>> > page would make much more sense for us. We work around it with a custom
>>> > submit process that writes all important URLs to the submit-side log.
>>> >
>>> > On Sat, Feb 9, 2019 at 5:42 AM Ryan Blue <rb...@netflix.com> wrote:
>>> >>
>>> >> Here's what I see from a running job on our cluster. Both of these
>>> >> are links to the same stderr and stdout pages that Spark links to today.
>>> >>
>>> >> stderr : Total file length is 18557 bytes.
>>> >> stdout : Total file length is 0 bytes.
>>> >>
>>> >> While it is nice to see whether stderr or stdout has content, I don't
>>> >> think that this is worth the extra click or changes to Spark.
>>> >>
>>> >> However, we have configured our logs to go to stderr and stdout, so
>>> >> these links work for us. I think some YARN applications send logs to a
>>> >> separate log endpoint, which would be useful to list here. Does anyone
>>> >> have logs going to locations other than stderr and stdout?
>>> >>
>>> >> If there are logs going to other files, then I think making this an
>>> >> option is reasonable. Otherwise, I think we should leave the links as they are.
>>> >>
>>> >> rb
>>> >>
>>> >> On Thu, Feb 7, 2019 at 12:31 PM Jungtaek Lim <kabh...@gmail.com> wrote:
>>> >>>
>>> >>> The new URL shows all of the local logs, including stdout and stderr, as
>>> >>> a list.
>>> >>>
>>> >>> The change would help when end users modify their log4j
>>> >>> configuration to write additional log files, as well as GC logs. Currently
>>> >>> Spark only shows two static files (stdout, stderr) as individual links, so
>>> >>> it is easy to see their content (one click), but users have to strip the
>>> >>> file part from the URL manually to reach the list page. Instead, we could
>>> >>> change the default URL to show all of the local logs and let users choose
>>> >>> which file to read (though it would take two clicks to reach the actual file).
>>> >>>
>>> >>> -Jungtaek Lim (HeartSaVioR)
>>> >>>
>>> >>> On Fri, Feb 8, 2019 at 1:33 AM Ryan Blue <rb...@netflix.com> wrote:
>>> >>>>
>>> >>>> Jungtaek,
>>> >>>>
>>> >>>> What is shown at the new URL and how would this improve usability?
>>> >>>>
>>> >>>> On Thu, Feb 7, 2019 at 12:45 AM Jungtaek Lim <kabh...@gmail.com> wrote:
>>> >>>>>
>>> >>>>> Hi devs,
>>> >>>>>
>>> >>>>> Based on the suggestion Tom Graves gave me in SPARK-26792, I'd
>>> >>>>> like to hear opinions on changing the default executor log URLs for YARN,
>>> >>>>> specifically removing the "stdout" and "stderr" links and providing a
>>> >>>>> single link that shows all log files.
>>> >>>>> For example, instead of referring to the two links below:
>>> >>>>>
>>> >>>>> http://<NM_HOST>:<NM_PORT>/node/containerlogs/<CONTAINER_ID>/<USER>/<stdout|stderr>?start=-4096
>>> >>>>>
>>> >>>>> we would refer to just one link:
>>> >>>>>
>>> >>>>> http://<NM_HOST>:<NM_PORT>/node/containerlogs/<CONTAINER_ID>/<USER>
>>> >>>>>
>>> >>>>> I've checked that the new URL works with the redirection from the NM to
>>> >>>>> jobhistory, so it won't break what we currently support. Getting to an
>>> >>>>> actual log file would require two clicks instead of one, though.
>>> >>>>>
>>> >>>>> Since this changes the UX, I'd like to hear opinions on
>>> >>>>> it before submitting a patch. If we'd rather keep things as they are, I
>>> >>>>> would just make it possible to apply a custom log URL to the Spark UI as well.
>>> >>>>>
>>> >>>>> Thanks in advance!
>>> >>>>>
>>> >>>>> FYI, below is the rationale from the earlier discussion:
>>> >>>>>
>>> >>>>> While I worked on SPARK-23155, I got some input about
>>> >>>>> linking the "log directory" instead of separate log URLs for "stdout" and
>>> >>>>> "stderr", because in practice end users write more files than just stdout
>>> >>>>> and stderr (like GC logs).
>>> >>>>>
>>> >>>>> SPARK-23155 provides a way to modify the log URL, but it only
>>> >>>>> applies to SHS; the Spark UI for running apps still only shows
>>> >>>>> "stdout" and "stderr". SPARK-26792 is for applying this to the Spark UI as
>>> >>>>> well, but I got a suggestion to just change the default log URL instead.
>>> >>>>>
>>> >>>>> Thanks again,
>>> >>>>> Jungtaek Lim (HeartSaVioR)
>>> >>>>
>>> >>>> --
>>> >>>> Ryan Blue
>>> >>>> Software Engineer
>>> >>>> Netflix
>>> >>
>>> >> --
>>> >> Ryan Blue
>>> >> Software Engineer
>>> >> Netflix
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>
--
Ryan Blue
Software Engineer
Netflix
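A minimal Python sketch of the link layouts compared in this thread, assuming only the NodeManager URL patterns quoted above: the current per-file links (with the `?start=-4096` tail), the single container log directory link, and option 3's user-enumerated file names. The function name `executor_log_urls` and its `link_log_dir` and `files` parameters are hypothetical illustrations, not Spark configuration keys or APIs.

```python
def executor_log_urls(nm_host, nm_port, container_id, user,
                      link_log_dir=False, files=("stdout", "stderr")):
    """Return {label: url} log links for one YARN container (illustrative sketch).

    link_log_dir and files are hypothetical knobs, not real Spark configs:
      - link_log_dir=False keeps today's behavior: one link per file.
      - link_log_dir=True returns a single link to the container log listing page.
      - files lets a user enumerate which file names get their own link.
    """
    # Common prefix taken from the URL patterns quoted in the thread.
    base = f"http://{nm_host}:{nm_port}/node/containerlogs/{container_id}/{user}"
    if link_log_dir:
        # Single link; the NodeManager page lists every file in the log directory.
        return {"logs": base}
    # One link per file, each showing only the last 4096 bytes of that file.
    return {name: f"{base}/{name}?start=-4096" for name in files}


if __name__ == "__main__":
    # Example values only; 8042 is the usual NodeManager web UI port.
    cid = "container_1549000000000_0001_01_000002"
    for label, url in executor_log_urls("nm-host", 8042, cid, "alice").items():
        print(label, url)
    print(executor_log_urls("nm-host", 8042, cid, "alice", link_log_dir=True))
```

With the flag off, the output matches today's two links; turning it on trades one extra click for visibility into GC logs, rolling appender output, and any other files in the container log directory.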