Note that it looks like we are planning to add support for application-specific frameworks to YARN sooner rather than later. There is an initial design up here: https://issues.apache.org/jira/browse/YARN-1530. Note that it has not been reviewed yet, so changes are likely, but it gives an idea of the general direction. If anyone has comments on how that might work with Spark, I encourage you to post to the JIRA.
As Sandy mentioned, it would be very nice if the solution could be compatible with that.

Tom

On Wednesday, January 8, 2014 12:44 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

Hey,

YARN-321 is targeted for Hadoop 2.4. The minimum feature set doesn't include application-specific data, so that probably won't be part of 2.4 unless other things delay the release for a while. There are no APIs for it yet, and pluggable UIs have been discussed but not agreed upon. I think requirements from Spark could be useful in helping shape what gets done there.

-Sandy

On Tue, Jan 7, 2014 at 4:13 PM, Patrick Wendell <pwend...@gmail.com> wrote:

Hey Sandy,

Do you know what the status is for YARN-321 and what version of YARN it's targeted for? Also, is there any kind of documentation or API for this? Does it control the presentation of the data itself (e.g., does it actually have its own UI)?

@Tom - having an optional history server sounds like a good idea.

One question is what format to use for storing the data and how the persisted format relates to XML/HTML generation in the live UI. One idea would be to add JSON as an intermediate format inside of the current WebUI; then any JSON page could be persisted and rendered by the history server using the same code. Once a SparkContext exits, it could dump a series of named paths, each with a JSON file. The history server could then load those paths and pass them through the second rendering stage (JSON => XML) to create each page.

It would be good if SPARK-969 had a good design doc before anyone starts working on it.

- Patrick

On Tue, Jan 7, 2014 at 3:18 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

As a side note, it would be nice to make sure that whatever is done here will work with the YARN Application History Server (YARN-321), a generic history server that functions similarly to MapReduce's JobHistoryServer.
It will eventually have the ability to store application-specific data.

-Sandy

On Tue, Jan 7, 2014 at 2:51 PM, Tom Graves <tgraves...@yahoo.com> wrote:

I don't think you want to save the HTML/XML files. I would rather see the info saved into a history file in something like a JSON format that could then be re-read, with the web UI displaying the info, hopefully without much change to the UI parts. For instance, the history server could read the file and populate the appropriate Spark data structures that the web UI already uses.

I would suggest making the history server an optional server that could be run on any node. That way, if the load on a particular node becomes too much, it could be moved, but you could also run it on the same node as the Master. All it really needs to know is where to get the history files from, and it needs access to that location.

Hadoop actually has a history server for MapReduce which works very similarly to what I mention above. One thing to keep in mind here is security. You want to make sure that the history files can only be read by users who have the appropriate permissions. The history server itself could run as a superuser who has permission to serve up the files based on the ACLs.

On Tuesday, January 7, 2014 8:06 AM, "Xia, Junluan" <junluan....@intel.com> wrote:

Hi all,

The Spark job web UI is not available once a job is over, but it would be convenient for developers to debug if the job web UI were persisted. I have come up with a draft for this issue:

1. We could simply save the web pages in HTML/XML format (stages/executors/storages/environment) to a certain location when the job finishes.

2. But it is not easy for users to review the job info with #1, so we could build an extra job history service for developers.

3. But where will we build this history service? On the Driver node or the Master node?

Any suggestions about this improvement?

regards,
Andrew
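The JSON-intermediate idea discussed in the thread (dump each UI page's data model as a named JSON file when the SparkContext exits, then have the history server reload those files and run them through the same second rendering stage) could be sketched roughly as follows. This is a minimal, hypothetical sketch in Python; none of these function names are real Spark APIs, and the HTML rendering stands in for whatever shared rendering code the live UI would use.

```python
import json
import os

# Sketch of the two-stage approach: persist page models as JSON on
# shutdown, then re-render them later in a history server. All names
# here are illustrative, not actual Spark internals.

def dump_ui_pages(event_dir, pages):
    """On SparkContext exit: persist each UI page's model as a JSON file."""
    os.makedirs(event_dir, exist_ok=True)
    for name, model in pages.items():
        with open(os.path.join(event_dir, name + ".json"), "w") as f:
            json.dump(model, f)

def render_page(model):
    """Second rendering stage (JSON => markup), shared with the live UI."""
    rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(k, v)
        for k, v in model.items()
    )
    return "<table>{}</table>".format(rows)

def load_and_render(event_dir):
    """What the history server would do: reload each JSON file and
    re-render it with the same code path as the live UI."""
    rendered = {}
    for fname in os.listdir(event_dir):
        if fname.endswith(".json"):
            with open(os.path.join(event_dir, fname)) as f:
                model = json.load(f)
            rendered[fname[: -len(".json")]] = render_page(model)
    return rendered
```

The appeal of this split, as Patrick notes, is that the persisted format and the live UI share one rendering path, so the history server needs no page-generation code of its own.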
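Tom's security point (a history server running as a superuser, but only serving a job's files to users permitted by that job's ACLs) could look something like the sketch below. This is a hypothetical illustration only; the field names and comma-separated ACL format are assumptions, not Spark's or Hadoop's actual ACL representation.

```python
# Hypothetical ACL check for a history server: the server process can
# read every history file, but a request is honored only if the
# requesting user is the job's owner or appears in its view ACLs.

def can_view(requesting_user, view_acls, job_owner):
    """Allow the owner, explicitly listed users, or a '*' wildcard ACL."""
    if requesting_user == job_owner:
        return True
    allowed = {u.strip() for u in view_acls.split(",") if u.strip()}
    return "*" in allowed or requesting_user in allowed

def serve_history(requesting_user, job):
    """Return (status, body) for a request against one job's history."""
    if not can_view(requesting_user, job["view_acls"], job["owner"]):
        return (403, "User {} is not authorized.".format(requesting_user))
    return (200, job["history"])
```

Running the check in the server rather than relying on filesystem permissions is what lets the server itself hold broad read access while still enforcing per-job visibility.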