Note that it looks like we are planning on adding support for 
application-specific frameworks to YARN sooner rather than later. There is an 
initial design up here: https://issues.apache.org/jira/browse/YARN-1530. Note 
this has not been reviewed yet, so changes are likely, but it gives an idea of 
the general direction.  If anyone has comments on how that might work with 
Spark, I encourage you to post them to the JIRA.

As Sandy mentioned, it would be very nice if the solution could be compatible 
with that.  

Tom



On Wednesday, January 8, 2014 12:44 AM, Sandy Ryza <sandy.r...@cloudera.com> 
wrote:
 
Hey,

YARN-321 is targeted for Hadoop 2.4.  The minimum feature set doesn't
include application-specific data, so that probably won't be part of 2.4
unless other things delay the release for a while.  There are no APIs for
it yet and pluggable UIs have been discussed but not agreed upon.  I think
requirements from Spark could be useful in helping shape what gets done
there.

-Sandy



On Tue, Jan 7, 2014 at 4:13 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> Hey Sandy,
>
> Do you know what the status is for YARN-321 and what version of YARN
> it's targeted for? Also, is there any kind of documentation or API for
> this? Does it control the presentation of the data itself (e.g. it
> actually has its own UI)?
>
> @Tom - having an optional history server sounds like a good idea.
>
> One question is what format to use for storing the data and how the
> persisted format relates to XML/HTML generation in the live UI. One
> idea would be to add JSON as an intermediate format inside of the
> current WebUI, and then any JSON page could be persisted and rendered
> by the history server using the same code. Once a SparkContext exits,
> it could dump a JSON file under each of a series of named paths. Then the
> history server could load those paths and pass them through the second
> rendering stage (JSON => XML) to create each page.
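>
> As a very rough sketch of that two-stage split (all of the names below,
> JsonPage, dumpAll, etc., are hypothetical, not existing Spark APIs):
>
>   import java.io.{File, PrintWriter}
>   import org.json4s._
>   import org.json4s.jackson.JsonMethods._
>
>   // Stage 1 collects the page's data as JSON; stage 2 turns JSON into
>   // markup. The live UI and the history server would share stage 2.
>   trait JsonPage {
>     def name: String
>     def toJson: JValue
>     def renderHtml(json: JValue): String
>   }
>
>   // On SparkContext exit, persist each page's JSON under a named path so
>   // the history server can later re-render it with the same code.
>   def dumpAll(pages: Seq[JsonPage], dir: File): Unit = {
>     dir.mkdirs()
>     pages.foreach { p =>
>       val out = new PrintWriter(new File(dir, s"${p.name}.json"))
>       try out.write(compact(render(p.toJson))) finally out.close()
>     }
>   }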
>
> It would be good if SPARK-969 had a solid design doc before anyone
> starts working on it.
>
> - Patrick
>
> On Tue, Jan 7, 2014 at 3:18 PM, Sandy Ryza <sandy.r...@cloudera.com>
> wrote:
> > As a sidenote, it would be nice to make sure that whatever is done here
> > will work with the YARN Application History Server (YARN-321), a generic
> > history server that functions similarly to MapReduce's JobHistoryServer.
> > It will eventually have the ability to store application-specific data.
> >
> > -Sandy
> >
> >
> > On Tue, Jan 7, 2014 at 2:51 PM, Tom Graves <tgraves...@yahoo.com> wrote:
> >
> >> I don't think you want to save the html/xml files. I would rather see
> >> the info saved into a history file in something like a JSON format that
> >> could then be re-read, with the web UI displaying the info, hopefully
> >> without much change to the UI parts.  For instance, perhaps the history
> >> server could read the file and populate the appropriate Spark data
> >> structures that the web UI already uses.
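> >>
> >> A rough sketch of that replay approach (the file layout and all names
> >> here, e.g. StageSnapshot and loadHistory, are assumptions, not existing
> >> Spark code):
> >>
> >>   import scala.io.Source
> >>   import org.json4s._
> >>   import org.json4s.jackson.JsonMethods.parse
> >>
> >>   // One record per stage, mirroring what the live web UI already keeps
> >>   // in memory; the history server would rebuild these from the file.
> >>   case class StageSnapshot(stageId: Int, name: String, numTasks: Int)
> >>
> >>   def loadHistory(path: String): Seq[StageSnapshot] = {
> >>     implicit val formats: Formats = DefaultFormats
> >>     val json = parse(Source.fromFile(path).mkString)
> >>     (json \ "stages").extract[Seq[StageSnapshot]]
> >>   }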
> >>
> >> I would suggest making the history server an optional server that could
> >> be run on any node. That way, if the load on a particular node becomes
> >> too much it could be moved, but you could also run it on the same node
> >> as the Master.  All it really needs to know is where to get the history
> >> files from, and it needs access to that location.
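> >>
> >> For example, it could be a small standalone daemon whose only required
> >> configuration is the history location and a port, so it can be started
> >> on whichever node is convenient (a hypothetical sketch, not an existing
> >> Spark entry point):
> >>
> >>   object HistoryServer {
> >>     // args: <history dir, e.g. an HDFS path> <UI port>
> >>     def main(args: Array[String]): Unit = {
> >>       val historyDir = args(0)   // where completed apps wrote their files
> >>       val port = args(1).toInt   // port to bind the history UI on
> >>       // ...bind an HTTP server here and serve pages rebuilt from historyDir
> >>       println(s"Serving history from $historyDir on port $port")
> >>     }
> >>   }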
> >>
> >> Hadoop actually has a history server for MapReduce which works very
> >> similarly to what I mention above.  One thing to keep in mind here is
> >> security.  You want to make sure that the history files can only be
> >> read by users who have the appropriate permissions.  The history server
> >> itself could run as a superuser who has permission to serve up the
> >> files based on the ACLs.
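> >>
> >> A minimal sketch of that check (the per-application owner and view ACLs,
> >> and how they are persisted, are assumptions):
> >>
> >>   // The daemon runs as a privileged user; each authenticated request is
> >>   // checked against the ACLs the application recorded with its history.
> >>   def canView(remoteUser: String, owner: String,
> >>               viewAcls: Set[String]): Boolean =
> >>     remoteUser == owner || viewAcls.contains(remoteUser) ||
> >>       viewAcls.contains("*")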
> >>
> >>
> >>
> >> On Tuesday, January 7, 2014 8:06 AM, "Xia, Junluan"
> >> <junluan....@intel.com> wrote:
> >>
> >> Hi all
> >>          The Spark job web UI is not available once the job is over, but
> >> it is convenient for developers to debug against a persisted job web UI.
> >> I have just come up with a draft for this issue.
> >>
> >> 1.       We could simply save the web pages in html/xml format
> >> (stages/executors/storage/environment) to a certain location when the
> >> job finishes
> >>
> >> 2.       But it is not easy for users to review the job info with #1, so
> >> we could build an extra job history service for developers
> >>
> >> 3.       But where would we run this history service? On the Driver node
> >> or the Master node?
> >>
> >> Any suggestions about this improvement?
> >>
> >> regards,
> >> Andrew
> >>
>
