[jira] [Commented] (FLINK-1579) Create a Flink History Server

ASF GitHub Bot (JIRA) Thu, 02 Mar 2017 10:00:06 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892693#comment-15892693
 ]


ASF GitHub Bot commented on FLINK-1579:
---------------------------------------

GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/3460

    [FLINK-1579] Implement History Server

    This PR adds a slightly unpolished HistoryServer implementation. It is 
missing tests and some documentation, but is working.
    
    This PR builds on top of #3377.
    
    The basic idea is as follows:
    
    The ```MemoryArchivist```, upon receiving an ```ExecutionGraph```, writes a 
set of JSON files into a directory structure resembling the REST API, using the 
features introduced in FLINK-5870, FLINK-5852 and FLINK-5941. The target 
location is configurable using ```job-manager.archive.dir```. Each job resides 
in its own directory, using the job ID as the directory name. As such, each 
archive is consistent on its own and multiple jobmanagers may use the same 
archive dir.
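
    A minimal sketch of the layout described above, in Python rather than 
Flink's Java, and not the PR's actual code. The function name, the `.json` 
suffix, and the example paths are assumptions made for illustration only.

```python
import json
from pathlib import Path


def write_job_archive(archive_dir, job_id, responses):
    """Write one JSON file per REST-style path under <archive_dir>/<job_id>/.

    `responses` maps a REST-like path (hypothetical, e.g. "jobs/<id>/config")
    to the JSON-serializable payload that path would return.
    """
    job_root = Path(archive_dir) / job_id  # one directory per job ID
    written = []
    for rest_path, payload in responses.items():
        # Mirror the REST path as nested directories below the job directory.
        target = job_root / (rest_path.strip("/") + ".json")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(payload))
        written.append(target)
    return written
```

    Because every file for a job lives below its own job-ID directory, two 
writers pointed at the same archive directory do not touch each other's files 
as long as their job IDs differ.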
    
    The ```HistoryServer``` polls certain directories, configured via 
```historyserver.archive.dirs```, at regular intervals, configured via 
```historyserver.refresh-interval```, for new job archives. If a new archive is 
found, it is downloaded and integrated into a cache of job archives in the local 
file system, configurable using ```historyserver.web.dir```. These files are 
served to a slightly modified WebFrontend using the 
```HistoryServerStaticFileServerHandler```.
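
    The polling step can be sketched roughly as follows. This is an 
illustrative Python sketch under stated assumptions, not the HistoryServer 
implementation: `poll_once` and `known_jobs` are hypothetical names, and plain 
local directories stand in for whatever file system the archive lives on.

```python
import shutil
from pathlib import Path


def poll_once(archive_dirs, cache_dir, known_jobs):
    """Copy job archives not seen before into the local cache.

    Returns the job IDs that were newly integrated in this pass.
    """
    new_jobs = []
    for archive_dir in map(Path, archive_dirs):
        if not archive_dir.is_dir():
            continue  # the archive directory may not exist yet
        for job_dir in sorted(archive_dir.iterdir()):
            if not job_dir.is_dir() or job_dir.name in known_jobs:
                continue  # not a job archive, or already cached
            # "Download" the archive into the cache, keyed by job ID.
            shutil.copytree(job_dir, Path(cache_dir) / job_dir.name,
                            dirs_exist_ok=True)
            known_jobs.add(job_dir.name)
            new_jobs.append(job_dir.name)
    return new_jobs
```

    A caller would invoke `poll_once` in a loop and sleep for the configured 
refresh interval between passes.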
    
    In the end the HistoryServer is little more than an aggregator and archive 
viewer.
    
    None of the directory configuration options have defaults; as it stands the 
entire feature is opt-in.
    
    Should a file that the WebFrontend requests be missing, a separate fetch 
routine kicks in and attempts to fetch the missing file. This is primarily 
aimed at eventually-consistent file-systems.
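
    One way to picture this fallback, as a hedged Python sketch rather than 
the actual handler code: `resolve_file` is a hypothetical helper, and it 
assumes the requested path is relative and has the same layout in the cache 
and in the archive directories.

```python
import shutil
from pathlib import Path


def resolve_file(cache_dir, archive_dirs, relative_path):
    """Return the cached file, fetching it from an archive dir on a cache miss.

    Returns None if no archive directory has the file (yet).
    """
    cached = Path(cache_dir) / relative_path
    if cached.is_file():
        return cached  # normal case: the file is already in the local cache
    for archive_dir in archive_dirs:
        source = Path(archive_dir) / relative_path
        if source.is_file():
            # The file has become visible since the last poll; fetch it now.
            cached.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(source, cached)
            return cached
    return None
```

    Returning `None` leaves it to the caller to answer with a "not found" 
response; a later request can succeed once the file becomes visible.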
    
    The HistoryServer is started using the new historyserver.sh script, which 
works similarly to the jobmanager and taskmanager scripts: 
```./bin/historyserver.sh [start|stop]```
    
    Two larger refactorings were made to existing code to increase the amount of 
shared code:
    * the netty setup in the WebRuntimeMonitor was moved into a separate 
NettySetup class, which the HistoryServer can use as well
    * an AbstractStaticFileServerHandler was added, which both the 
StaticFileServerHandler and the HistoryServerStaticFileServerHandler extend

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink 1579_history_server_pr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3460.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3460
    
----
commit 61a07456f151ac8f5418ac66629751e1a83ada3a
Author: zentol <[email protected]>
Date:   2017-01-24T09:13:24Z

    [FLINK-1579] Implement History Server - Frontend

commit e6316e544fea160f7d050dd1b087301a83345d31
Author: zentol <[email protected]>
Date:   2017-02-21T11:36:17Z

    [FLINK-5645] Store accumulators/metrics for canceled/failed tasks

commit 84fd2746b09ce41c2d9bd5be7f6e8a8cc1a3291d
Author: zentol <[email protected]>
Date:   2017-03-02T12:31:56Z

    Refactor netty setup into separate class

commit 81d7e6b92fe69326d6edf6b63f3f9c95f5ebd0ef
Author: zentol <[email protected]>
Date:   2017-02-22T14:47:07Z

    [FLINK-1579] Implement History Server - Backend

commit 8d1e8c59690ea97be4bbaf1a011c8ec4a68f5892
Author: zentol <[email protected]>
Date:   2017-03-02T11:09:36Z

    Rebuild frontend

----


> Create a Flink History Server
> -----------------------------
>
>                 Key: FLINK-1579
>                 URL: https://issues.apache.org/jira/browse/FLINK-1579
>             Project: Flink
>          Issue Type: New Feature
>          Components: Distributed Coordination
>    Affects Versions: 0.9
>            Reporter: Robert Metzger
>            Assignee: Chesnay Schepler
>
> Right now it's not possible to analyze the job results for jobs that ran on 
> YARN, because we'll lose the information once the JobManager has stopped.
> Therefore, I propose to implement a "Flink History Server" which serves the 
> results from these jobs.
> I haven't started thinking about the implementation, but I suspect it 
> involves some JSON files stored in HDFS :)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
