[
https://issues.apache.org/jira/browse/FLINK-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892693#comment-15892693
]
ASF GitHub Bot commented on FLINK-1579:
---------------------------------------
GitHub user zentol opened a pull request:
https://github.com/apache/flink/pull/3460
[FLINK-1579] Implement History Server
This PR adds a slightly unpolished HistoryServer implementation. It is
missing tests and some documentation, but is working.
This PR builds on top of #3377.
The basic idea is as follows:
The ```MemoryArchivist```, upon receiving an ```ExecutionGraph```, writes a
set of json files into a directory structure resembling the REST API using the
features introduced in FLINK-5870, FLINK-5852 and FLINK-5941. The target
location is configurable using ```job-manager.archive.dir```. Each job resides
in its own directory, using the job ID as the directory name. As such, each
archive is consistent on its own and multiple jobmanagers may use the same
archive dir.
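A minimal sketch of the per-job layout described above. The class name, the
method names and the ```.json``` suffix are assumptions made for illustration;
they are not taken from the PR.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical helper, not part of the PR: illustrates one directory per job,
// with files laid out like the REST paths they answer.
final class JobArchiveLayoutSketch {

    /** Maps a REST-style path such as "/jobs/<id>/config" to a file below the job's directory. */
    static Path resolve(Path archiveDir, String jobId, String restPath) {
        // Strip the leading slash so the REST path becomes a relative file path.
        String relative = restPath.startsWith("/") ? restPath.substring(1) : restPath;
        // Assumption: each response is stored with a ".json" suffix.
        return archiveDir.resolve(jobId).resolve(relative + ".json");
    }

    /** Writes every (REST path -> JSON body) pair of one job into the archive directory. */
    static void writeArchive(Path archiveDir, String jobId, Map<String, String> jsonByRestPath)
            throws IOException {
        for (Map.Entry<String, String> entry : jsonByRestPath.entrySet()) {
            Path target = resolve(archiveDir, jobId, entry.getKey());
            Files.createDirectories(target.getParent());
            Files.write(target, entry.getValue().getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Because every file of a job lives below the directory named after its job ID,
two jobmanagers writing different jobs never touch the same paths.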
The ```HistoryServer``` polls certain directories, configured via
```historyserver.archive.dirs```, at regular intervals, configured via
```historyserver.refresh-interval```, for new job archives. If a new archive is
found, it is downloaded and integrated into a cache of job archives in the local
file system, configurable using ```historyserver.web.dir```. These files are
served to a slightly modified WebFrontend using the
```HistoryServerStaticFileServerHandler```.
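The polling step could be sketched roughly as follows. This is an illustration
under assumptions, not the code in the PR: the class name, the ```lister``` and
```downloader``` callbacks and the millisecond unit of the interval are all
hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch, not code from the PR.
final class ArchivePollerSketch {

    // Job IDs whose archives were already handed to the downloader.
    private final Set<String> knownJobIds = new HashSet<>();

    /** Returns the job IDs that were not seen in an earlier poll and remembers them. */
    synchronized List<String> selectNew(Collection<String> listedJobIds) {
        List<String> fresh = new ArrayList<>();
        for (String jobId : listedJobIds) {
            if (knownJobIds.add(jobId)) {
                fresh.add(jobId);
            }
        }
        return fresh;
    }

    /**
     * Polls with a fixed delay. "lister" stands in for listing the archive
     * directories, "downloader" for copying one archive into the local cache.
     */
    ScheduledExecutorService start(
            Supplier<Collection<String>> lister,
            Consumer<String> downloader,
            long refreshIntervalMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(() -> {
            try {
                selectNew(lister.get()).forEach(downloader);
            } catch (RuntimeException e) {
                // Swallow the failure so later polls still run; an uncaught
                // exception would cancel all subsequent executions.
            }
        }, 0, refreshIntervalMillis, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

Note that this sketch marks a job as known before its download succeeds; a
real implementation would have to decide how to retry failed downloads.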
In the end the HistoryServer is little more than an aggregator and archive
viewer.
None of the directory configuration options have defaults; as it stands the
entire feature is opt-in.
Should a file that the WebFrontend requests be missing, a separate fetch
routine kicks in which attempts to fetch the missing file. This is primarily
aimed at eventually consistent file systems.
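A rough sketch of such a fallback, assuming an in-memory map in place of the
local file cache and a hypothetical ```remoteFetch``` callback in place of the
actual fetch routine; none of these names come from the PR.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch, not code from the PR.
final class FallbackFetchSketch {

    // Stand-in for the local cache directory: request path -> file content.
    private final Map<String, String> localCache = new ConcurrentHashMap<>();

    // Stand-in for the routine that looks up a single file in the archive dirs.
    private final Function<String, Optional<String>> remoteFetch;

    FallbackFetchSketch(Function<String, Optional<String>> remoteFetch) {
        this.remoteFetch = remoteFetch;
    }

    /** Serves from the local cache; on a miss, tries the remote archive once and caches a hit. */
    Optional<String> serve(String requestPath) {
        String cached = localCache.get(requestPath);
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<String> fetched = remoteFetch.apply(requestPath);
        fetched.ifPresent(content -> localCache.put(requestPath, content));
        return fetched;
    }
}
```

A miss is not cached here, so a file that becomes visible later on an
eventually consistent file system is picked up by a subsequent request.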
The HistoryServer is started using the new historyserver.sh script, which
works similarly to the jobmanager and taskmanager scripts:
```./bin/historyserver.sh [start|stop]```
Two larger refactorings were made to existing code to increase the amount of
shared code:
* the netty setup in the WebRuntimeMonitor was moved into a separate
NettySetup class which the HistoryServer can use as well
* an AbstractStaticFileServerHandler was added, which both the
StaticFileServerHandler and the HistoryServerStaticFileServerHandler extend
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zentol/flink 1579_history_server_pr
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3460.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3460
----
commit 61a07456f151ac8f5418ac66629751e1a83ada3a
Author: zentol <[email protected]>
Date: 2017-01-24T09:13:24Z
[FLINK-1579] Implement History Server - Frontend
commit e6316e544fea160f7d050dd1b087301a83345d31
Author: zentol <[email protected]>
Date: 2017-02-21T11:36:17Z
[FLINK-5645] Store accumulators/metrics for canceled/failed tasks
commit 84fd2746b09ce41c2d9bd5be7f6e8a8cc1a3291d
Author: zentol <[email protected]>
Date: 2017-03-02T12:31:56Z
Refactor netty setup into separate class
commit 81d7e6b92fe69326d6edf6b63f3f9c95f5ebd0ef
Author: zentol <[email protected]>
Date: 2017-02-22T14:47:07Z
[FLINK-1579] Implement History Server - Backend
commit 8d1e8c59690ea97be4bbaf1a011c8ec4a68f5892
Author: zentol <[email protected]>
Date: 2017-03-02T11:09:36Z
Rebuild frontend
----
> Create a Flink History Server
> -----------------------------
>
> Key: FLINK-1579
> URL: https://issues.apache.org/jira/browse/FLINK-1579
> Project: Flink
> Issue Type: New Feature
> Components: Distributed Coordination
> Affects Versions: 0.9
> Reporter: Robert Metzger
> Assignee: Chesnay Schepler
>
> Right now it's not possible to analyze the job results for jobs that ran on
> YARN, because we'll lose the information once the JobManager has stopped.
> Therefore, I propose to implement a "Flink History Server" which serves the
> results from these jobs.
> I haven't started thinking about the implementation, but I suspect it
> involves some JSON files stored in HDFS :)