[ https://issues.apache.org/jira/browse/FLINK-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116888#comment-14116888 ]

ASF GitHub Bot commented on FLINK-1025:
---------------------------------------

GitHub user warneke opened a pull request:

    https://github.com/apache/incubator-flink/pull/107

    [FLINK-1025] Integration of new BLOB service

    Please merge the new BLOB service. The service follows the design
    principles as discussed in [FLINK-1025]. The code contains unit tests for all
    relevant operations and has been successfully tested in a distributed
    environment.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/warneke/incubator-flink blob2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-flink/pull/107.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #107
    
----
commit 3669005c4d02949e56544016859d48c54a024f82
Author: Daniel Warneke <[email protected]>
Date:   2014-07-27T13:22:31Z

    Beginning of alternative BLOB service implementation

commit 2f6c2370249f9a1b11cef74043b070444e3dc9fb
Author: Daniel Warneke <[email protected]>
Date:   2014-07-27T13:43:55Z

    Merge branch 'master' into blob2

commit 678db0805132246645ae4d2b810c662991503d72
Author: Daniel Warneke <[email protected]>
Date:   2014-07-27T18:30:25Z

    Worked on implementation of alternative BLOB service

commit b0bfb1c6ce5ac0bb7ec3bfa96e19d7144f5663e0
Author: Daniel Warneke <[email protected]>
Date:   2014-08-10T19:05:41Z

    Merge branch 'master' into blob2

commit 1f1019001f3b9da3e58e344e02d82c02f1366a7d
Author: Daniel Warneke <[email protected]>
Date:   2014-08-10T19:34:47Z

    Started to integrate new BLOB service with library cache manager

commit 0751ce73e7f7bd696b91302a5ee7f0916aa3908f
Author: Daniel Warneke <[email protected]>
Date:   2014-08-10T20:00:12Z

    Started to integrate new blob service with job client

commit 2c8d0e96b89e692937309ab5f39d8644ec8e2f53
Author: Daniel Warneke <[email protected]>
Date:   2014-08-10T20:25:46Z

    Integrated blob service with job graph and job manager

commit 6dd0b5196ff609187141b4c6509a31ff70d99b5f
Author: Daniel Warneke <[email protected]>
Date:   2014-08-12T20:08:12Z

    Integrated lookup of BLOB server port

commit 146d2b66cd519d953cc7c94f28da615452dbac52
Author: Daniel Warneke <[email protected]>
Date:   2014-08-12T20:30:03Z

    Merged TaskDeploymentDescriptor from blob branch

commit 05f51cf8ec8fb3edec4f05613a39afa683d21e25
Author: Daniel Warneke <[email protected]>
Date:   2014-08-12T20:53:58Z

    Changed ExecutionGraph to work with BLOB keys

commit d447cdf8671393cad920db60459c37bb410e3761
Author: Daniel Warneke <[email protected]>
Date:   2014-08-12T21:05:37Z

    Fixed unit tests

commit 31e299e1fe6c1ee31253a399522f35eea3a43740
Author: Daniel Warneke <[email protected]>
Date:   2014-08-12T21:36:49Z

    Fixed unit test

commit e208bbe205e6faa589c0c6d9d3fea51c7b0f7fba
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T07:51:17Z

    Merge branch 'master' into blob2
    
    Conflicts:
        flink-runtime/src/main/java/org/apache/flink/runtime/client/JobClient.java
        flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/JobManager.java

commit 5ec9be95da6c8de07fb551a43a02b11decd35241
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T08:42:02Z

    Fixed type serialization problem in ServiceDiscoveryProtocol

commit 15d53d2a89b59484b5efa902dd04427128f17628
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T08:46:05Z

    Improved upload code to avoid unnecessary TCP connection handshake

commit 46a74f37acaf0afcde4f9a233270d824329d364c
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T08:55:44Z

    Started to implement proper shut down of BLOB server

commit 2fcb560fafeb995356fb7aa65a1892ede96cffe9
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T10:00:08Z

    Introduced BLOB cache to locally cache content-addressable BLOBs

commit d5d26607aaaeb3956b8308d9c3a8eaf4c304dc5e
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T11:14:44Z

    Implemented lookup of BLOB server address for task managers

commit 8bb48f494f8127444891a86a4764bcc78e17d94c
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T12:12:31Z

    Added first implementation of BLOB cache

commit 492fe6f7ba8b059587e00139e86e5c3721e590ed
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T12:15:13Z

    Added Apache headers to BLOB service files

commit c70b80578fe30b77cac0ffae85634a0a88f2ba9c
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T12:24:57Z

    Added javadoc to BLOB cache

commit ed149e1685b2597aa707f694ccbe11b93ac393d3
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T12:56:00Z

    Added unit test for BLOB cache

commit b5572fd81d0b8aaa763d134cf004b9149e36618e
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T13:36:18Z

    Fixed bug in BLOB client

commit 7054ed59defd71675a214fdf7fa13525a7e07c71
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T13:57:47Z

    Completed unit test for the BLOB client

commit 2020ca127e7d8dd81a8b381958921fc7f6e3b640
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T14:03:37Z

    Added unit tests for the BlobKey class

commit a3dc55d1492481de271abb2dd2f0fe000d920411
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T16:01:24Z

    Implemented deletion and clean-up of BLOBs

commit 41d003f27a74df9f6d9636a2b7de014ce0374d8b
Author: Daniel Warneke <[email protected]>
Date:   2014-08-24T16:26:17Z

    Added javadoc for BLOB input stream

commit eb7600b8ffc44d77b9787fa539e35032f28b861d
Author: Daniel Warneke <[email protected]>
Date:   2014-08-30T11:50:03Z

    Merge branch 'master' into blob2

commit 6156d45b7a42a549f98fa16fe144bb5ed9488363
Author: Daniel Warneke <[email protected]>
Date:   2014-08-31T19:14:21Z

    Added javadoc for BLOB server and client

commit e665912ba1bb9560d8a6c25e4e8964af0f29d0db
Author: Daniel Warneke <[email protected]>
Date:   2014-08-31T20:16:32Z

    Added javadoc for the class BlobConnection

----


> Improve BLOB Service
> --------------------
>
>                 Key: FLINK-1025
>                 URL: https://issues.apache.org/jira/browse/FLINK-1025
>             Project: Flink
>          Issue Type: Improvement
>          Components: JobManager
>    Affects Versions: 0.6-incubating
>            Reporter: Stephan Ewen
>            Assignee: Daniel Warneke
>             Fix For: 0.7-incubating
>
>
> I like the idea of making it transparent where the blob service runs, so the 
> code on the server/client side is agnostic to that.
> The current merged code is in 
> https://github.com/StephanEwen/incubator-flink/commits/blobservice
> Local tests pass, I am trying distributed tests now.
> There are a few suggestions for improvements:
>  - Since all the resources are bound to a job or session, it makes sense 
> to make all puts/gets relative to a jobId (becoming session id) and to have a 
> cleanup hook that deletes all resources associated with that job.
>  - The BLOB service is hardwired to compute a message digest for the 
> contents, and to use that as the key. While it may make sense for jar files 
> (cached libraries), for many cases in the future, that will be unnecessary 
> and impose only overhead. I would vote to make this optional and allow just 
> UUIDs for keys. An example is for the taskmanager to put a part of an 
> intermediate result into the blob store, for the client to pick it up.
>  - At most points, we have started moving away from configured ports, because 
> of configuration overhead and collisions in setups, where multiple instances 
> end up on one machine. The latter actually happens frequently with YARN. I 
> would suggest having the JM open a port dynamically for the BlobService 
> (similar to TaskManager#getAvailablePort() ). RPC calls to figure out this 
> configuration need to happen only once between client/JM and TM/JM. We can 
> stomach that overhead ;-)
>  - The write method does not write the length a single time, but "per 
> buffer". Why is it done that way? The array-based methods know the length up 
> front, and when the contents come from an input stream, I think we know the 
> length as well (for files: filesize, for network: sent up front).
>  - I am personally in favor of moving away from static singleton registries. 
> They tend to cause trouble during testing and in pseudo-cluster modes (multiple 
> workers within one JVM). How hard would it be to have a BlobService at the 
> TaskManager / JobManager that we can pass by reference to the points where it is 
> needed?
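
Several of the suggestions quoted above (digest keys versus UUID keys, a
dynamically opened port, and writing the length once up front) can be
illustrated with a minimal Java sketch. This is not the code from the pull
request: the class name BlobSketch and all of its method names are
hypothetical, and SHA-1 is only an assumed choice of message digest.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical sketch; none of these names come from the Flink code base.
final class BlobSketch {

    // Content-addressable key: hex-encoded digest of the contents.
    // SHA-1 is an assumption; the issue only says "message digest".
    static String contentKey(byte[] contents) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] digest = md.digest(contents);
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Cheaper alternative: a random key that never reads the contents.
    static String randomKey() {
        return UUID.randomUUID().toString();
    }

    // Passing port 0 asks the operating system for any free port,
    // so no port has to be configured in advance.
    static ServerSocket openOnFreePort() throws IOException {
        return new ServerSocket(0);
    }

    // Writes the length a single time, followed by the payload,
    // instead of prefixing every buffer with its own length.
    static byte[] frame(byte[] contents) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(contents.length);
            out.write(contents);
        }
        return bytes.toByteArray();
    }
}
```

A server opened this way would report its actual port via
ServerSocket#getLocalPort(), which is the value the client and the task
managers would need to look up once from the JobManager.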



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
