Dear community,

Last year Till did a great job summarizing recent developments in the
Flink community in a "Weekly community update" thread. I found this very
helpful and would like to revive this tradition with a focus on topics &
threads which are particularly relevant to the wider community of Flink
users.

As we haven't had such an update for some time (since December 2018), I
find it impossible to cover everything that's currently going on in this
email. I'll try to include most ongoing discussions and FLIPs over the
course of the next weeks to catch up. Afterwards, I will go back to
focusing only on news since the last update.

You are welcome to share any additional news and updates with the community
in this thread.

Flink Development
=================

* [releases] The community is currently working on a Flink 1.8.1 release
[1]. The first release candidate should be ready soon (one critical bug to
fix as of writing, FLINK-12863).
* [releases] Kurt and Gordon stepped up as release managers for Flink 1.9
and started a thread [2] to sync on the status of various development
threads targeted for Flink 1.9. Check it out to see if the feature you are
waiting for is likely to make it or not.
* [savepoints] Gordon, Kostas and Congxian have recently started a
discussion [3] on unifying the savepoint format across StateBackends, which
will enable users to switch between StateBackends when recovering from a
Savepoint. The discussion on introducing Stop-With-Checkpoint [4],
initiated by Yu Li, is closely related and worth a read to understand the
long-term vision.
* [savepoints] Seth and Gordon have started a discussion [5] to add a State
Processing API ("Savepoint Connector"), which will allow reading &
modifying existing Savepoints as well as creating new Savepoints from
scratch with the DataSet API. The feature is targeted for Flink 1.9.0 as a
new *library*.
* [python-support] Back in April we had a discussion on the mailing list
about adding Python Support to the Table API [6]. This support will likely
be available in Flink 1.9 (initially without UDFs, with UDF support to
follow later). Therefore, Stephan has started a discussion [7] to deprecate the
current Python API in Flink 1.9. This has gotten a lot of positive feedback
and the only open question as of writing is whether to only deprecate it or
to remove it directly.

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-8-1-td29154.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Features-for-Apache-Flink-1-9-0-td28701.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-41-Unify-Keyed-State-Snapshot-Binary-Format-for-Savepoints-td29197.html
[4] https://issues.apache.org/jira/browse/FLINK-12619
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-FLIP-43-Savepoint-Connector-td29232.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html#a28096
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Deprecate-previous-Python-APIs-td29483.html#a29522

Notable Bugs
============

In this section I am going to list some recently discovered bugs, which
might be relevant to a larger audience. I'll try to explain them to the
best of my knowledge, but no guarantees.

* [FLINK-12296] [1.6.4] [1.7.2] [1.8.0] State can be silently lost when
recovering a job with two stateful Operators within the same Operator
Chain. This can only happen when using reinterpretAsKeyedStream, and
the bug only affects the RocksDBStateBackend with incremental checkpointing.
Fixed in 1.7.3, 1.8.1 and 1.9.0. [8]
* [FLINK-12688] [1.6.4] [1.7.2] [1.8.0] A race condition while initializing
the TypeSerializer within a StateDescriptor could lead to rare
NullPointerExceptions when a StateDescriptor is shared between threads. Fixed
in 1.7.3, 1.8.1 and 1.9.0. [9]
* [FLINK-12653] [1.6.4] [1.7.2] [1.8.0] After rescaling a job, recovery
might fail if some state was only registered in a subset of all Sub-Tasks.
This only affects the FileSystemStatebackend. Unresolved. [10]
* [FLINK-11820] [1.7.2] [1.8.0] The SimpleStringSchema of the
FlinkKafkaConsumer fails on "null" records. Unresolved, but PR available.
[11]
* [FLINK-11662] [1.6.4] [1.7.2] [1.8.0] Due to the way the checkpoint
directory is cleaned up by the CheckpointCoordinator, tasks might fail
during materialization of a checkpoint if another task has previously
declined the same checkpoint. The resolution is part of a larger
rework of how checkpoint failures are managed and seems to be targeted for
Flink 1.9.0. [12]
* [FLINK-10317] [1.6.4] [1.7.2] [1.8.0] Admittedly not a new bug, but still
unresolved and discussed: limiting the Java Metaspace size for Flink
processes by default. It is not clear right now whether limiting the
Metaspace size is a good idea. If you run into an OutOfMemoryError in
Metaspace, this ticket is a good starting point for finding related
tickets on classloader leaks. [13]
* [FLINK-11107] [1.7.2] [1.8.0] When using the MemoryStateBackend with HA,
Flink creates many random checkpoint directories under the
high-availability directory. These are useless, since checkpoints are not
externalized, and might eventually render the cluster unusable. Fixed in
1.8.1 and 1.9.0. [14]

[8] https://issues.apache.org/jira/browse/FLINK-12296
[9] https://issues.apache.org/jira/browse/FLINK-12688
[10] https://issues.apache.org/jira/browse/FLINK-12653
[11] https://issues.apache.org/jira/browse/FLINK-11820
[12] https://issues.apache.org/jira/browse/FLINK-11662
[13] https://issues.apache.org/jira/browse/FLINK-10317
[14] https://issues.apache.org/jira/browse/FLINK-11107

Events, Blog Posts, Misc
========================

* Nico has recently published the first part [15] of a series of blog posts
on Flink's network stack.
* There are a couple of meetups coming up in the next weeks:
    * 2019/06/24: Cloud Native Meetup in *Aarhus* with a Flink talk by
Lasse Nedergard (TrackUnit) [16]
    * 2019/06/25: *Bay Area* Apache Flink Meetup with talks by Zendesk,
Parag Kesar and Ben Liu (Pinterest) and Ken Krugler (Scale Unlimited) [17]
    * 2019/07/01: *Paris* Apache Beam Meetup with a Flink talk by myself
(Ververica) [18]
    * 2019/07/05: Apache Flink Meetup *Munich* with talks by Steffen
Hausmann (AWS) and Michel David (Ryte) [19]

[15] https://flink.apache.org/2019/06/05/flink-network-stack.html
[16] https://www.meetup.com/Cloud-Native-Aarhus/events/261346897/
[17] https://www.meetup.com/Bay-Area-Apache-Flink-Meetup/events/262216929
[18] https://www.meetup.com/Paris-Apache-Beam-Meetup/events/261775884/
[19] https://www.meetup.com/Apache-Flink-Meetup-Munich/events/261282757/

Any feedback or suggestions for this update thread are very much
appreciated.

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525


--

Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
