[jira] [Comment Edited] (FLINK-5668) Reduce dependency on HDFS at job startup time

Bill Liu (JIRA) Mon, 27 Feb 2017 09:56:51 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886220#comment-15886220
 ]


Bill Liu edited comment on FLINK-5668 at 2/27/17 5:55 PM:
----------------------------------------------------------

[~rmetzger] 
[~wheat9] and I are working on a Flink job deployer for YARN that works with 
`HttpFs` and `S3`.
The YARN container can resolve the `http` and `s3` file schemes. 

We use `HttpFs` instead of `HDFS` to bootstrap the JobManager.
Here is the code to set up the AM container (JobManager):
```
    // The Flink distribution jar is served over HTTP instead of HDFS.
    Path resourcePath = new Path("http://localhost:19989/flink-dist.jar");
    // Resolve the filesystem from the path's own scheme, then fetch size and timestamp.
    FileStatus fileStatus = resourcePath.getFileSystem(yarnConfiguration)
            .getFileStatus(resourcePath);
    LOG.info("resource {}", ConverterUtils.getYarnUrlFromPath(resourcePath));
    // Describe the jar as a YARN LocalResource so the NodeManager can localize it.
    LocalResource packageResource =
            LocalResource.newInstance(
                    ConverterUtils.getYarnUrlFromPath(resourcePath),
                    LocalResourceType.FILE, LocalResourceVisibility.APPLICATION,
                    fileStatus.getLen(), fileStatus.getModificationTime());
    LOG.info("add localresource {}", packageResource);
    localResources.put("flink.jar", packageResource);
    amContainer.setLocalResources(localResources);
```
`yarn.deploy.fs` is not a good idea, because these bootstrap jars/files may be 
located on different filesystems.
It's better to parse each jar's Path to determine its underlying filesystem.
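
To illustrate the per-path idea, here is a minimal sketch that uses only `java.net.URI` (not Hadoop's `Path.getFileSystem`) to derive a filesystem key from each resource instead of reading one global setting. The class name, method names, and the `s3`/`hdfs` example locations are hypothetical and only for illustration; they are not part of the deployer code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups bootstrap resources by the filesystem
// implied by each resource's own URI (scheme + authority).
class ResourceFileSystems {

    // Returns "scheme://authority", e.g. "http://localhost:19989".
    // URIs without an authority (e.g. "file:///tmp/a.jar") yield "scheme://".
    static String fileSystemKey(String resource) {
        URI uri = URI.create(resource);
        if (uri.getScheme() == null) {
            throw new IllegalArgumentException("resource has no scheme: " + resource);
        }
        String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
        return uri.getScheme() + "://" + authority;
    }

    // Groups resources by filesystem key, preserving first-seen order.
    static Map<String, List<String>> groupByFileSystem(List<String> resources) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String resource : resources) {
            grouped.computeIfAbsent(fileSystemKey(resource), k -> new ArrayList<>())
                    .add(resource);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Example locations are made up; only the first appears in the comment above.
        List<String> resources = Arrays.asList(
                "http://localhost:19989/flink-dist.jar",
                "s3://example-bucket/flink/flink-conf.yaml",
                "hdfs://namenode:8020/user/flink/job.jar");
        groupByFileSystem(resources)
                .forEach((fs, files) -> System.out.println(fs + " -> " + files));
    }
}
```

In the real deployer the equivalent step would be the `resourcePath.getFileSystem(yarnConfiguration)` call already shown above, applied to every bootstrap file rather than to a single configured filesystem.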





> Reduce dependency on HDFS at job startup time
> ---------------------------------------------
>
>                 Key: FLINK-5668
>                 URL: https://issues.apache.org/jira/browse/FLINK-5668
>             Project: Flink
>          Issue Type: Improvement
>          Components: YARN
>            Reporter: Bill Liu
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When creating a Flink cluster on YARN, the JobManager depends on HDFS to share 
> taskmanager-conf.yaml with the TaskManagers.
> It would be better to serve taskmanager-conf.yaml from the JobManager web server 
> instead of HDFS, which would reduce the HDFS dependency at job startup.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
