Hello all,

For those not following, I'm working on SPARK-18085, where my goal is
to decouple the storage of UI data from the actual UI implementation.
This is mostly targeted at the history server, so that it's possible
to quickly load a "database" with UI information instead of the
existing way of re-parsing event logs, but I think it also helps with
the live UI, since it doesn't require storing UI information in memory
and thus relieves some memory pressure on the driver. (I may still add
an in-memory database in that project, but that's digressing from the
topic at hand.)

One of my (unwritten) goals in that project was to get rid of
JobProgressListener. Now that I'm at a point where I can do that from
the UI's p.o.v., I ran into SparkStatusTracker. So I'd like to get
people's views on two topics.

(i) deprecate SparkStatusTracker, provide a new API based on the
public REST types.

SparkStatusTracker provides yet another way of getting job, stage and
executor information (aside from the UI and the REST API). It has its
own types that model those, which are based on the existing UI types
but are not the same. It could be replaced by making REST calls to the
UI endpoint, but that's sub-optimal since it doesn't make a lot of
sense to do that when you already have an instance of SparkContext to
play with.
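
For illustration, here is a minimal sketch of what the REST route looks
like from client code. It assumes the driver UI is reachable at its
default port (4040); the application ID "app-1" and the helper names
are made up for the example, and the sample payload is hand-written and
abridged rather than captured from a real run.

```python
import json
from urllib.request import urlopen


def jobs_url(ui_base, app_id):
    """Build the URL of the jobs endpoint in the monitoring REST API."""
    return f"{ui_base.rstrip('/')}/api/v1/applications/{app_id}/jobs"


def running_job_ids(jobs):
    """Return the IDs of jobs whose status is RUNNING, in ascending order."""
    return sorted(job["jobId"] for job in jobs if job.get("status") == "RUNNING")


def fetch_jobs(ui_base, app_id, timeout=5.0):
    """Fetch and decode the job list; needs a reachable driver UI."""
    with urlopen(jobs_url(ui_base, app_id), timeout=timeout) as resp:
        return json.load(resp)


# Hand-written, abridged sample of the response shape (an assumption for
# the example, not real output).
sample = [
    {"jobId": 0, "status": "SUCCEEDED", "stageIds": [0]},
    {"jobId": 2, "status": "RUNNING", "stageIds": [3, 4]},
    {"jobId": 1, "status": "RUNNING", "stageIds": [1, 2]},
]
print(running_job_ids(sample))
```

Against a live application this would be called as
fetch_jobs("http://localhost:4040", "<app-id>"). The HTTP round trip
and JSON decoding are exactly the overhead that is hard to justify when
the caller already holds a SparkContext.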

Since that's a public, stable API, it can't be removed right away. But
I'd like to propose that we deprecate it, and provide a new API that
is based on the REST types (which, with my work, are also used in the
UI). The existing "SparkStatusTracker" would still exist until we can
remove it, of course.

What do people think about this approach? Another option is to not add
the new API, but keep SparkStatusTracker around using the new UI
database to back it.
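
To make the second option concrete, here is a rough, language-neutral
sketch (written in Python for brevity) of a tracker-style facade that
reads everything from a store. All names here (JobData, StatusStore,
LegacyStatusTracker) are hypothetical and only illustrate the shape of
the idea; they are not the actual types in SPARK-18085.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class JobData:
    """Hypothetical stand-in for a REST-style job record."""
    job_id: int
    status: str
    stage_ids: List[int]


class StatusStore:
    """Hypothetical stand-in for the UI database; a plain dict here."""

    def __init__(self) -> None:
        self._jobs: Dict[int, JobData] = {}

    def put_job(self, job: JobData) -> None:
        self._jobs[job.job_id] = job

    def job(self, job_id: int) -> Optional[JobData]:
        return self._jobs.get(job_id)

    def jobs(self) -> List[JobData]:
        return sorted(self._jobs.values(), key=lambda j: j.job_id)


class LegacyStatusTracker:
    """Keeps a tracker-style surface but holds no state of its own."""

    def __init__(self, store: StatusStore) -> None:
        self._store = store

    def get_active_job_ids(self) -> List[int]:
        return [j.job_id for j in self._store.jobs() if j.status == "RUNNING"]

    def get_job_stage_ids(self, job_id: int) -> Optional[List[int]]:
        job = self._store.job(job_id)
        return None if job is None else list(job.stage_ids)


store = StatusStore()
store.put_job(JobData(0, "SUCCEEDED", [0]))
store.put_job(JobData(1, "RUNNING", [1, 2]))

tracker = LegacyStatusTracker(store)
print(tracker.get_active_job_ids())
print(tracker.get_job_stage_ids(1))
```

In this shape the facade is a thin adapter, so the store is the only
place that holds job state. The first option would instead expose the
store's record types to callers directly.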

(ii) Remove JobProgressListener

I didn't notice it before, but JobProgressListener is public-ish
(@DeveloperApi). I'm not sure why that is, and it's a weird thing
because it exposes non-public types (from UIData.scala) in its API.
With the work I'm doing, and the above suggestion about
SparkStatusTracker, JobProgressListener becomes unused in Spark
itself, and keeping it would just mean the driver keeps using unneeded
memory.

Are there concerns about removing that class? Its functionality is
available in both SparkStatusTracker and the REST API, so it's mostly
redundant.


So, thoughts?


Note to self: (i) above means I'd have to scale back some of my goals
for SPARK-18085. More specifically, the code that creates the UI
database will always need to run (just like JobProgressListener always
exists now), so that SparkStatusTracker still works. That also means
moving some of the code I was hoping to keep in a separate module into
core/.

-- 
Marcelo
