Hi All!

With the growing number of Flink streaming applications, the current History
Server (HS) implementation is starting to lose its value. Users running
streaming applications mostly care about what is running on the cluster right
now, and a centralised view of history alone is not very useful.

We have been experimenting with reworking the current HS into a Global
Flink Dashboard that would show all running and completed/failed jobs on
all the running Flink clusters the users have.

In essence we would get a view similar to the current HS but it would also
show the running jobs with a link redirecting to the actual cluster
specific dashboard.

This is how it looks now:

[screenshot: prototype global dashboard listing jobs across clusters]
In this version we took a very simple approach of introducing a cluster
discovery abstraction that collects all the running Flink clusters (by
listing YARN applications, for instance).
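Such a discovery abstraction could be sketched roughly as below. All names here (ClusterDiscovery, StaticClusterDiscovery) are hypothetical, not existing Flink classes; a real implementation would query the YARN ResourceManager for running Flink applications instead of holding a static map:

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical abstraction: maps a cluster id (e.g. a YARN application id)
// to the base URL of that cluster's REST endpoint.
interface ClusterDiscovery {
    Map<String, String> listClusters();
}

// Trivial static implementation for illustration only; a YARN-based one
// would list running Flink applications from the ResourceManager instead.
class StaticClusterDiscovery implements ClusterDiscovery {
    private final Map<String, String> clusters;

    StaticClusterDiscovery(Map<String, String> clusters) {
        this.clusters = clusters;
    }

    @Override
    public Map<String, String> listClusters() {
        return Collections.unmodifiableMap(clusters);
    }
}
```

Keeping discovery behind a single interface means other resource managers (Kubernetes, standalone) could be supported later without touching the dashboard code.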

The main pages aggregating jobs from different clusters then simply make
calls to all clusters and aggregate the responses. Job-specific endpoints
are routed to the correct target cluster. This way the required changes are
localised to the current HS implementation, and the cluster REST endpoints
don't need to be changed.
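The fan-out/aggregate and routing logic could look something like this sketch (again, all class and method names are made up for illustration; a real version would issue HTTP calls to each cluster's REST API rather than operate on in-memory lists):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical aggregator over per-cluster job listings.
class JobAggregator {
    // clusterId -> job ids; a stand-in for the per-cluster REST responses.
    private final Map<String, List<String>> jobsByCluster;

    JobAggregator(Map<String, List<String>> jobsByCluster) {
        this.jobsByCluster = jobsByCluster;
    }

    // Aggregating endpoint: union of all jobs across all clusters,
    // tagged with the owning cluster id.
    List<String> listAllJobs() {
        List<String> all = new ArrayList<>();
        jobsByCluster.forEach((cluster, jobs) ->
                jobs.forEach(job -> all.add(cluster + "/" + job)));
        return all;
    }

    // Routing: find which cluster owns a given job, so job-specific
    // requests can be forwarded there without the caller supplying
    // a clusterId.
    String findCluster(String jobId) {
        return jobsByCluster.entrySet().stream()
                .filter(e -> e.getValue().contains(jobId))
                .map(Map.Entry::getKey)
                .findFirst()
                .orElse(null);
    }
}
```

The same jobId-to-cluster lookup is what would let a CLI address a job directly without knowing its clusterId.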

In addition to a fully working global dashboard, this also gives us a fully
functioning REST endpoint for accessing all jobs in all clusters without
having to provide the clusterId (a YARN application id, for instance), which
we can use to improve the CLI experience in multi-cluster environments
(e.g. lots of per-job clusters).

Please let us know what you think!

Gyula
