[
https://issues.apache.org/jira/browse/FLINK-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116888#comment-14116888
]
ASF GitHub Bot commented on FLINK-1025:
---------------------------------------
GitHub user warneke opened a pull request:
https://github.com/apache/incubator-flink/pull/107
[FLINK-1025] Integration of new BLOB service
Please merge the new BLOB service. The service follows the design
principles as discussed in [FLINK-1025]. The code contains unit tests for all
relevant operations and has been successfully tested in a distributed
environment.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/warneke/incubator-flink blob2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-flink/pull/107.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #107
----
commit 3669005c4d02949e56544016859d48c54a024f82
Author: Daniel Warneke <[email protected]>
Date: 2014-07-27T13:22:31Z
Beginning of alternative BLOB service implementation
commit 2f6c2370249f9a1b11cef74043b070444e3dc9fb
Author: Daniel Warneke <[email protected]>
Date: 2014-07-27T13:43:55Z
Merge branch 'master' into blob2
commit 678db0805132246645ae4d2b810c662991503d72
Author: Daniel Warneke <[email protected]>
Date: 2014-07-27T18:30:25Z
Worked on implementation of alternative BLOB service
commit b0bfb1c6ce5ac0bb7ec3bfa96e19d7144f5663e0
Author: Daniel Warneke <[email protected]>
Date: 2014-08-10T19:05:41Z
Merge branch 'master' into blob2
commit 1f1019001f3b9da3e58e344e02d82c02f1366a7d
Author: Daniel Warneke <[email protected]>
Date: 2014-08-10T19:34:47Z
Started to integrate new BLOB service with library cache manager
commit 0751ce73e7f7bd696b91302a5ee7f0916aa3908f
Author: Daniel Warneke <[email protected]>
Date: 2014-08-10T20:00:12Z
Started to integrate new blob service with job client
commit 2c8d0e96b89e692937309ab5f39d8644ec8e2f53
Author: Daniel Warneke <[email protected]>
Date: 2014-08-10T20:25:46Z
Integrated blob service with job graph and job manager
commit 6dd0b5196ff609187141b4c6509a31ff70d99b5f
Author: Daniel Warneke <[email protected]>
Date: 2014-08-12T20:08:12Z
Integrated lookup of BLOB server port
commit 146d2b66cd519d953cc7c94f28da615452dbac52
Author: Daniel Warneke <[email protected]>
Date: 2014-08-12T20:30:03Z
Merged TaskDeploymentDescriptor from blob branch
commit 05f51cf8ec8fb3edec4f05613a39afa683d21e25
Author: Daniel Warneke <[email protected]>
Date: 2014-08-12T20:53:58Z
Changed ExecutionGraph to work with BLOB keys
commit d447cdf8671393cad920db60459c37bb410e3761
Author: Daniel Warneke <[email protected]>
Date: 2014-08-12T21:05:37Z
Fixed unit tests
commit 31e299e1fe6c1ee31253a399522f35eea3a43740
Author: Daniel Warneke <[email protected]>
Date: 2014-08-12T21:36:49Z
Fixed unit test
commit e208bbe205e6faa589c0c6d9d3fea51c7b0f7fba
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T07:51:17Z
Merge branch 'master' into blob2
Conflicts:
flink-runtime/src/main/java/org/apache/flink/runtime/client/JobClient.java
flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/JobManager.java
commit 5ec9be95da6c8de07fb551a43a02b11decd35241
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T08:42:02Z
Fixed type serialization problem in ServiceDiscoveryProtocol
commit 15d53d2a89b59484b5efa902dd04427128f17628
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T08:46:05Z
Improved upload code to avoid unnecessary TCP connection handshake
commit 46a74f37acaf0afcde4f9a233270d824329d364c
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T08:55:44Z
Started to implement proper shut down of BLOB server
commit 2fcb560fafeb995356fb7aa65a1892ede96cffe9
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T10:00:08Z
Introduced BLOB cache to locally cache content-addressable BLOBs
commit d5d26607aaaeb3956b8308d9c3a8eaf4c304dc5e
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T11:14:44Z
Implemented lookup of BLOB server address for task managers
commit 8bb48f494f8127444891a86a4764bcc78e17d94c
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T12:12:31Z
Added first implementation of BLOB cache
commit 492fe6f7ba8b059587e00139e86e5c3721e590ed
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T12:15:13Z
Added Apache headers to BLOB service files
commit c70b80578fe30b77cac0ffae85634a0a88f2ba9c
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T12:24:57Z
Added javadoc to BLOB cache
commit ed149e1685b2597aa707f694ccbe11b93ac393d3
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T12:56:00Z
Added unit test for BLOB cache
commit b5572fd81d0b8aaa763d134cf004b9149e36618e
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T13:36:18Z
Fixed bug in BLOB client
commit 7054ed59defd71675a214fdf7fa13525a7e07c71
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T13:57:47Z
Completed unit test for the BLOB client
commit 2020ca127e7d8dd81a8b381958921fc7f6e3b640
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T14:03:37Z
Added unit tests for the BlobKey class
commit a3dc55d1492481de271abb2dd2f0fe000d920411
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T16:01:24Z
Implemented deletion and clean-up of BLOBs
commit 41d003f27a74df9f6d9636a2b7de014ce0374d8b
Author: Daniel Warneke <[email protected]>
Date: 2014-08-24T16:26:17Z
Added javadoc for BLOB input stream
commit eb7600b8ffc44d77b9787fa539e35032f28b861d
Author: Daniel Warneke <[email protected]>
Date: 2014-08-30T11:50:03Z
Merge branch 'master' into blob2
commit 6156d45b7a42a549f98fa16fe144bb5ed9488363
Author: Daniel Warneke <[email protected]>
Date: 2014-08-31T19:14:21Z
Added javadoc for BLOB server and client
commit e665912ba1bb9560d8a6c25e4e8964af0f29d0db
Author: Daniel Warneke <[email protected]>
Date: 2014-08-31T20:16:32Z
Added javadoc for the class BlobConnection
----
> Improve BLOB Service
> --------------------
>
> Key: FLINK-1025
> URL: https://issues.apache.org/jira/browse/FLINK-1025
> Project: Flink
> Issue Type: Improvement
> Components: JobManager
> Affects Versions: 0.6-incubating
> Reporter: Stephan Ewen
> Assignee: Daniel Warneke
> Fix For: 0.7-incubating
>
>
> I like the idea of making it transparent where the blob service runs, so the
> code on the server/client side is agnostic to that.
> The current merged code is in
> https://github.com/StephanEwen/incubator-flink/commits/blobservice
> Local tests pass, I am trying distributed tests now.
> There are a few suggestions for improvements:
> - Since all the resources are bound to a job or session, it makes sense
> to make all puts/gets relative to a jobId (becoming session id) and to have a
> cleanup hook that deletes all resources associated with that job.
> - The BLOB service is hardwired to compute a message digest for the
> contents and to use that as the key. While that may make sense for jar files
> (cached libraries), in many future cases it will be unnecessary and impose
> only overhead. I would vote to make this optional and allow plain UUIDs as
> keys. An example is the taskmanager putting a part of an intermediate result
> into the blob store for the client to pick up.
> - At most points, we have started moving away from configured ports because
> of configuration overhead and collisions in setups where multiple instances
> end up on one machine. The latter actually happens frequently with YARN. I
> would suggest having the JM open a port dynamically for the BlobService
> (similar to TaskManager#getAvailablePort()). RPC calls to figure out this
> configuration need to happen only once between client/JM and TM/JM. We can
> stomach that overhead ;-)
> - The write method does not write the length a single time, but "per
> buffer". Why is it done that way? The array-based methods know the length up
> front, and when the content comes from an input stream, I think we know the
> length as well (for files: file size, for network: sent up front).
> - I am personally in favor of moving away from static singleton registries.
> They tend to cause trouble during testing and in pseudo cluster modes
> (multiple workers within one JVM). How hard is it to have a BlobService at
> the TaskManager / JobManager that we can pass as a reference to the points
> where it is needed?
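
The first three suggestions and the last one in the quoted list can be
illustrated with a small sketch. Everything below is hypothetical and is not
the code from the pull request: the class name JobScopedBlobStore, its method
names, the String job IDs, the in-memory map, and the choice of SHA-1 as the
message digest are all assumptions made for illustration. It shows puts/gets
relative to a job ID, a cleanup hook per job, optional content-addressable
keys next to plain UUID keys, a server socket bound to an OS-chosen port, and
an ordinary instance instead of a static singleton.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory sketch; not the BLOB service from the pull request. */
final class JobScopedBlobStore {

    // jobId -> (key -> content). An instance field, not a static registry,
    // so several stores can coexist inside one JVM (tests, pseudo cluster).
    private final Map<String, Map<String, byte[]>> blobsByJob = new ConcurrentHashMap<>();

    /** Content-addressable put: the key is the hex digest of the content. */
    String putContentAddressable(String jobId, byte[] content) {
        String key = toHex(digest(content));
        store(jobId, key, content);
        return key;
    }

    /** Put under a random UUID key; no digest is computed. */
    String putWithRandomKey(String jobId, byte[] content) {
        String key = UUID.randomUUID().toString();
        store(jobId, key, content);
        return key;
    }

    /** Returns a copy of the BLOB, or null if the job or key is unknown. */
    byte[] get(String jobId, String key) {
        Map<String, byte[]> blobs = blobsByJob.get(jobId);
        byte[] content = (blobs == null) ? null : blobs.get(key);
        return (content == null) ? null : content.clone();
    }

    /** Cleanup hook: drops every BLOB of the job and returns how many were removed. */
    int cleanupJob(String jobId) {
        Map<String, byte[]> removed = blobsByJob.remove(jobId);
        return (removed == null) ? 0 : removed.size();
    }

    /** Binds to port 0 so the OS picks a free port; read it via getLocalPort(). */
    static ServerSocket openOnFreePort() throws IOException {
        return new ServerSocket(0);
    }

    private void store(String jobId, String key, byte[] content) {
        blobsByJob.computeIfAbsent(jobId, id -> new ConcurrentHashMap<>())
                  .put(key, content.clone());
    }

    private static byte[] digest(byte[] content) {
        try {
            // SHA-1 is an assumption; the issue only says "message digest".
            return MessageDigest.getInstance("SHA-1").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

In such a design the JobManager and each TaskManager would create their own
store instance and hand it to the components that need it, and the port
reported by getLocalPort() would be looked up once via RPC rather than read
from the configuration.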