GitHub user ChrisChinchilla opened a pull request:
https://github.com/apache/flink/pull/5023
[hotfix][docs] Review of concepts docs for grammar, formatting etc
Spending some time doing a brief review of a few docs sections, just a
startâ¦
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ChrisChinchilla/flink concepts-review
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5023.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5023
----
commit 77946f1d8772f2c2c095e93d5a76c8df42fbe790
Author: Chris Ward <[email protected]>
Date: 2017-10-02T13:20:50Z
Polishing grammar and tone for consistency and clarity
commit e5eafba7053eb41f0bcd4e177ed1fdae7c2c3e66
Author: Chris Ward <[email protected]>
Date: 2017-10-05T18:26:21Z
Clarify support of older APIs
commit 027914715744e673e41893067beec58b52a177ce
Author: Chris Ward <[email protected]>
Date: 2017-10-12T13:44:28Z
Begin fixes for voicing, passive voice, weasel words etcâ¦
commit 05cb3f04d332886aafc2b5675358aed316f76826
Author: Nico Kruber <[email protected]>
Date: 2017-07-11T09:19:20Z
[FLINK-7068][blob] change BlobService sub-classes for permanent and
transient BLOBs
[FLINK-7068][blob] start introducing a new BLOB storage abstraction
This is incomplete and may not compile and/or run tests successfully yet.
[FLINK-7068][blob] remove BlobView from TransientBlobCache
The transient BLOB cache is not supposed to work with the HA store since it
only
serves non-HA files.
[FLINK-7068][blob] remove unnecessary use of BlobClient
[FLINK-7068][blob] implement TransientBlobCache#put methods
[FLINK-7068][blob] remove further unnecessary use of BlobClient and adapt
to HA get/put methods
[FLINK-7068][blob] fix BlobServer#getFileInternal not being guarded by locks
[FLINK-7068][blob] add incoming file cleanup at BlobServer in cases of
errors
[FLINK-7068] fix missing BlobServer#putHA() jobId propagation
[FLINK-7068][blob] remove BlobClient use from BlobServer{Get|Put}Test
[FLINK-7068][blob] make helper methods work with any BlobService
[FLINK-7068][blob] start adding a BlobCacheGetTest
[FLINK-7068][blob] verify get contents in separate threads
This allows (at a slight chance) that we may see an intermediate file.
[FLINK-7068][blob] better locking granularity during file retrieval
This allows multiple parallel downloads from the HA store to the
BlobServer's
local store although only one of these downloaded staging files will
actually
be used. In practice, this happens only during recovery and not in parallel
anyways.
[FLINK-7068][blob] share more code among BlobServer and BlobServerConnection
This also applies the better locking granularity of the previous commit to
BlobServerConnection.
[FLINK-7068][blob] properly cleanup temporary staging files in all cases
[FLINK-7068][blob] make PermanentBlobCache and TransientBlobCache
thread-safe
[FLINK-7068][tests] improve various tests
[FLINK-7068][blob] change the signature of the delete calls to return
success
We will not throw exceptions in case of failures anymore and return whether
the
operation was successful instead. Failure details will still be accessible
in
the written logs.
[FLINK-7068][tests] extend and adapt BlobServerDeleteTest
[FLINK-7068][tests] adapt further BlobCache tests
[FLINK-7068][tests] adapt BlobClientTest
[FLINK-7068][blob] cleanup BlobClient methods
BlobClient is not supposed to be used by anyone else than the
BlobServer/BlobCache classes. Most accessors were already package-private,
now
remove the ones that just blow up the code.
[FLINK-7068] add a TODO to fix the currently failing tests
[FLINK-7068][tests] add a BlobCacheRecoveryTest
This currently fails due to TransientBlobCache#put also storing files in HA
store which it should not!
[FLINK-7068][tests] improve failure message
[FLINK-7068][blob] add permanent/transient BLOB modes to BlobClient
This allows a better control of which should end up in HA store and which
should
not. Also, during GET methods, we do not check the HA store unnecessarily.
[FLINK-7068][tests] extend the Blob{Server|Cache}GetTest
This adds some failing GET operations and verifies that the files are
cleaned
up accordingly.
[FLINK-7068][blob] remove "final" flag from BlobCache class
This re-enables mocking in various unit tests.
[FLINK-7068][tests] fix test relying on order of folder contents
[FLINK-7068][blob] some BlobServer cleanup
[FLINK-7068][hotfix] fix checkstyle errors
[FLINK-7068][tests] fix tests now requiring a more complete BlobCache mock
A suitable BlobCache mock should at least return a mock for a permanent and
a
transient BLOB store, so mock(BlobCache.class) is not sufficient anymore.
[FLINK-7068] final wrap up
* remove a left-over TODO
* remove useless tests for the concurrency of the GET operations (we cannot
test
that the file write is guarded by a lock directly - rely on the concurrent
checks in the individual threads instead)
* fix some log messages
[FLINK-7068][blob] remove Thread#start() call from BlobServer constructor
This is bad design and limits extensibility, e.g. in tests like the
BlobCacheRetriesTest where this caused a race condition with the sub-class.
Instead, the user must now call BlobServer#start() explicitely.
[FLINK-7068][tests] remove unused imports
[FLINK-7068][tests] fix a typo
[FLINK-7068][tests] add some tests that verify behaviour with corrupted
files
Also add corruption checks for HA-store downloads which was not implemented
yet.
[FLINK-7068][blob] ensure consistency in PermanentBlobCache even in cases
of invalid use
During cleanup, no write lock was taken but the storage directory of an
(unused!) job was deleted. Normally, there should be no process left
accessing
its data and no new process can jump in since the registration is locked. In
case of invalid use cases, i.e. using a job's data outside a register() and
release() block, this could lead to strange effects.
By guarding the cleanup with the write lock as well, we circumvent that.
[FLINK-7068][hotfix] remove an unused import
commit c7d4562db8af446e79967215cda597be37261eed
Author: Nico Kruber <[email protected]>
Date: 2017-07-25T11:00:33Z
[FLINK-7261][blob] extend BlobStore#get/put with boolean return values
This way, using code can distinguish non-HA cases, i.e. VoidBlobStore, from
HA cases, i.e. FileSystemBlobStore, in a general way and have better error
reporting.
commit d7ad3aed56ca72139760f04af541b061b3107cca
Author: Nico Kruber <[email protected]>
Date: 2017-08-07T16:12:28Z
[FLINK-7412][network] optimise NettyMessage.TaskEventRequest#readFrom() to
read from netty buffers directly
This closes #4518.
commit f3ef92fca6f6b572a29e5b2b37caa2d669a06f20
Author: Nico Kruber <[email protected]>
Date: 2017-08-18T10:28:23Z
[FLINK-7057][tests][hotfix] fix test instability of
JobManagerCleanupITCase#testBlobServerCleanupCancelledJob
This test expected two messages to arrice (job cancellation and job state
change
notification) but did not take different receive orders into account. The
fix:
- removes state change listening for this test case so that only one message
arrives, and
- adds message comparison by object, not just class (to improve debugging)
commit b2d2d584d9f1849bf81fbe09ee81ca44efacc421
Author: Nico Kruber <[email protected]>
Date: 2017-08-21T08:36:56Z
[FLINK-7483][blob] prevent cleanup of re-registered jobs
When a job is registered, it may have been released before and we thus need
to
reset the cleanup timeout again.
commit a1a1cf59c0c677b593934b17c239e6832440c56d
Author: Fabian Hueske <[email protected]>
Date: 2017-09-10T22:05:06Z
[FLINK-7446] [table] Change DefinedRowtimeAttribute to work on existing
field.
This closes #4710.
commit 404779108319d1b4ead02bb4af1fb979ba1824dd
Author: Nico Kruber <[email protected]>
Date: 2017-09-20T10:05:25Z
[FLINK-7068][blob] Introduce permanent and transient BLOB keys
[FLINK-7068][blob] address PR review comments, part 1
[FLINK-7068][blob] create a common base class for the BLOB caches
[FLINK-7068][blob] update some comments
[FLINK-7068][blob] integrate the BLOB type into the BlobKey
[FLINK-7068][blob] rename a few methods for better consistency
[FLINK-7068][blob] fix Blob*DeleteTest not working as documented in one test
[FLINK-7068][blob] add checks for jobId being null in PermanentBlobCache
[FLINK-7068][blob] implement get-and-delete logic for transient BLOBs
Transient BLOB files are deleted on the BlobServer upon first access from a
cache. Therefore, we do not need the DELETE operations anymore, aside from
deleting the file from the local cache (for now).
[FLINK-7068][blob] address PR comments, part 2
[FLINK-7068][blob] separate permanent and transient BLOB keys
* create PermanentBlobKey and TransientBlobKey (inheriting from BlobKey) and
forbid using transient BLOBs with permanent caches and vice versa
* make BlobKey package-private, similarly for the BlobType which is now
reflected by the two BlobKey sub-classes
-> this gives a cleaner interface for the user
This closes #4358.
commit 1546277f7e2351f83efeb6e57c04e5b5c0323c6a
Author: Till Rohrmann <[email protected]>
Date: 2017-09-25T13:29:59Z
[FLINK-7668] Add ExecutionGraphCache for ExecutionGraph based REST handlers
The ExecutionGraphCache replaces the ExecutionGraphHolder. Unlike the
latter, the former
does not expect the AccessExecutionGraph to be the true ExecutionGraph.
Instead it assumes
it to be the ArchivedExecutionGraph. Therefore, it invalidates the cache
entries after
a given time to live period. This will trigger requesting the
AccessExecutionGraph again
and, thus, updating the ExecutionGraph information for the ExecutionGraph
based REST
handlers.
In order to avoid memory leaks, the WebRuntimeMonitor starts now a periodic
cleanup task
which triggers ExecutionGraphCache.cleanup. This methods releases all cache
entries which
have exceeded their time to live. Currently it is set to 20 *
refreshInterval of the
web gui.
This closes #4728.
commit 63f003be24ddb5190a1f2d4b5b710549c285e848
Author: Till Rohrmann <[email protected]>
Date: 2017-09-26T16:39:15Z
[FLINK-7695] [flip6] Add JobConfigHandler for new RestServerEndpoint
This closes #4737.
commit bd55021374380f811b5f338fa6f30588635f84f4
Author: Tzu-Li (Gordon) Tai <[email protected]>
Date: 2017-09-27T18:05:21Z
[FLINK-7721] [DataStream] Only emit new min watermark iff it was aggregated
from watermark-aligned inputs
Prior to this commit, In the calculation of the new min watermark in
StatusWatermarkValve#findAndOutputNewMinWatermarkAcrossAlignedChannels(),
there is no verification that the calculated new min watermark
really is aggregated from some aligned channel.
In the corner case where all input channels are currently not aligned
but actually some are active, we would then incorrectly determine that
the final aggregation is Long.MAX_VALUE and emit that.
This commit fixes this by only emitting the aggregated watermark iff it was
really calculated from some aligned input channel (as well as the
already existing constraint that it needs to be larger than the last
emitted watermark). This change should also safely cover the case that a
Long.MAX_VALUE was genuinely aggregated from the input channels.
commit 3b457321f39803b955ffc3d01c9d862ccb7a4769
Author: Tzu-Li (Gordon) Tai <[email protected]>
Date: 2017-09-28T12:11:22Z
[FLINK-7728] [DataStream] Simplify BufferedValveOutputHandler used in
StatusWatermarkValveTest
The previous implementation was overly complicated. Having separate
buffers for the StreamStatus and Watermarks is not required for our
tests. Also, that design doesn't allow checking the order StreamStatus /
Watermarks are emitted from a single input to the valve.
This commit reworks it by buffering both StreamStatus and Watermarks in
a shared queue.
commit 9538eae0312c74a1e84fa4088fc09154b9290c5e
Author: Tzu-Li (Gordon) Tai <[email protected]>
Date: 2017-09-28T14:49:22Z
[FLINK-7728] [DataStream] Flush max watermark across all inputs once all
become idle
Prior to this commit, once all inputs of the StatusWatermarkValve
becomes idle, we only emit the StreamStatus.IDLE marker, and check
nothing else. This makes the watermark advancement behaviour
inconsistent in the case that all inputs become idle depending on the
order that they become idle.
This commit fixes this by "flushing" the max watermark across all
channels once all inputs become idle. At a high-level, what this means
for downstream operators is that all inputs have become idle and will
temporariliy cease to advance their watermarks, so they can safely
advance their event time to whatever the largest watermark is.
commit f14dfd04c748e26e8f0c59146a8dd209871404fc
Author: Till Rohrmann <[email protected]>
Date: 2017-09-28T16:35:50Z
[FLINK-7708] [flip6] Add CheckpointConfigHandler for new REST endpoint
This commit implements the CheckpointConfigHandler which now returns a
CheckpointConfigInfo object if checkpointing is enabled. In case that
checkpointing is disabled for a job, it will return a 404 response.
This closes #4744.
commit 85f5168f58e9a46e1009ce79f4285245199612cb
Author: Tzu-Li (Gordon) Tai <[email protected]>
Date: 2017-09-29T09:16:22Z
[FLINK-7728] [DataStream] Make StatusWatermarkValve unit tests more
fine-grained
Previously, the unit tests in StatusWatermarkValveTest were too
cluttered and testing too many behaviours in a single test. This makes
it hard to have a good overview of what test cases are covered.
This commit is a rework of the previous tests, making them more
fine-grained so that the scope of each test is small enough. All
previously tested behaviours are still covered.
commit 32d90137e3984ce067761e0d18707cc65992131d
Author: Till Rohrmann <[email protected]>
Date: 2017-09-29T13:09:06Z
[FLINK-7710] [flip6] Add CheckpointStatisticsHandler for the new REST
endpoint
This commit also makes the CheckpointStatsHistory object serializable by
removing the
CheckpointStatsHistoryIterable and replacing it with a static ArrayList.
This closes #4750.
commit 9f773c3f60e31569023087454a71e660c3d669cd
Author: Aljoscha Krettek <[email protected]>
Date: 2017-10-02T10:28:10Z
[FLINK-7721] Verify StatusWatermarkValve only emits WM iff it has aligned
inputs
commit ebc01c7d7e693666bad4a6d65b126f22dc264812
Author: Stephan Ewen <[email protected]>
Date: 2017-10-02T12:15:14Z
[hotfix] [core] Prevent potential null pointer in MemorySize.equals(...)
commit 2aa618a2bab4260fc3d46b7a52f68450d0f34626
Author: Stephan Ewen <[email protected]>
Date: 2017-10-02T12:34:27Z
[FLINK-7643] [core] Misc. cleanups in FileSystem
- Simplify access to local file system
- Use a fair lock for all FileSystem.get() operations
- Robust falback to local fs for default scheme (avoids URI parsing error
on Windows)
- Deprecate 'getDefaultBlockSize()'
- Deprecate create(...) with block sizes and replication factor, which is
not applicable to many FS
commit d186c169591c36a6fca8dbb5d40ff2d5815ed810
Author: Stephan Ewen <[email protected]>
Date: 2017-10-02T14:25:18Z
[FLINK-7643] [core] Rework FileSystem loading to use factories
This makes sure that configurations are loaded once and file system
instances are
properly reused by scheme and authority.
This also factors out a lot of the special treatment of Hadoop file systems
and simply
makes the Hadoop File System factory the default fallback factory.
commit 0f3fcb8967a95d31f383f8b4d895869b3d2b8b30
Author: Stephan Ewen <[email protected]>
Date: 2017-10-02T14:30:07Z
[FLINK-7643] [core] Drop eager checks for file system support.
Some places validate if the file URIs are resolvable on the client. This
leads to
problems when file systems are not accessible from the client, when the
full libraries for
the file systems are not present on the client (for example often the case
in cloud setups),
or when the configuration on the client is different from the
nodes/containers that will
execute the application.
commit 1479b72ba3f11a985f977ee568d72e25a97ab450
Author: Till Rohrmann <[email protected]>
Date: 2017-10-02T20:29:12Z
[FLINK-7754] [rpc] Complete termination future after actor has been stopped
This commit waits not only until the Actor has called postStop but also
until the actor
has been completely shut down by the ActorSystem before completing the
termination
future.
This closes #4770.
commit 8fbade07443dd2ca3d408ed61485d3dcaf86415e
Author: zentol <[email protected]>
Date: 2017-10-02T20:58:21Z
[hotfix] [dispatcher] Remove leftover javadoc from DispatcherGateway
commit 56aac28c614462a119edba13461b8de11ea9fbc9
Author: Stephan Ewen <[email protected]>
Date: 2017-10-02T21:06:07Z
[hotfix] Fix testing log level in flink-runtime
commit f9b726df9bb86b41d0c903a36ed2e318ae94ce62
Author: Wright, Eron <[email protected]>
Date: 2017-10-04T03:36:09Z
[FLINK-7752] [flip-6] RedirectHandler should execute on the IO thread
This closes #4766.
commit 5d631dd79810e571c9baff2f3079084380e196bd
Author: Fabian Hueske <[email protected]>
Date: 2017-10-04T11:01:22Z
[hotfix] [hbase] Set root log level to OFF for flink-hbase tests.
Log level is changed due to a buggy Calcite check that causes a NPE.
The check is only performed if log level DEBUG is enabled.
This closes #4771
commit 1b17941b31831ecfcad9b1d5bc1505415ac66306
Author: Stephan Ewen <[email protected]>
Date: 2017-10-05T09:26:13Z
[FLINK-7767] [file system sinks] Avoid loading Hadoop conf dynamically at
runtime
commit c6cdd4172107985a4160869d6af6ddc0ade44fbf
Author: Stephan Ewen <[email protected]>
Date: 2017-10-05T09:27:48Z
[FLINK-7766] [file system sink] Drop obsolete reflective hflush calls
This was done reflectively before for Hadoop 1 compatibility.
Since Hadoop 1 is no longer supported, this is obsolete now.
----
---