Hey Till, You've got a good point here. Removing some of the methods would cause breaking the stability guarantees. I do understand if we decide not to remove them for that reason, let me explain though why I am thinking it might make sense to remove them already. First of all I am a bit afraid it might take a long time before we arrive at the 2.0 version. We have not ever discussed that in the community. At the same time a lot of the methods already don't work or are buggy, and we do not fix them any more.
Methods which removing would not break the Public guarantees:
* ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
* RuntimeContext#getAllAccumulators (deprecated in 0.10)
* ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
*
StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
(not the equivalent in the ExecutionConfig)
* StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
(deprecated in 1.5)
Methods which removing would break the Public guarantees:
which have no effect:
* ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
* ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
which are buggy or discouraged and thus we do not support fixing them:
* DataStream#split (deprecated in 1.8)
* DataStream#fold and all related classes and methods such as
FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated
in 1.3/1.4)
The methods like:
*
StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),
* methods in (Connected)DataStream that specify keys as either indices
or field names
* ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries
should be working just fine and I feel the least eager to remove those.
I'd suggest I will open PRs for removing the methods that will not cause
breakage of the Public guarantees as the general feedback was rather
positive. For the rest I do understand the resentment to do so and will
not do it in the 1.x branch. Still I think it is valuable to have the
discussion.
Best,
Dawid
On 18/08/2020 09:26, Till Rohrmann wrote:
> Having looked at the proposed set of methods to remove I've noticed
> that some of them are actually annotated with @Public. According to
> our stability guarantees, only major releases (1.0, 2.0, etc.) can
> break APIs with this annotation. Hence, I believe that we cannot
> simply remove them unless the community decides to change the
> stability guarantees we give or by making the next release a major
> release (Flink 2.0).
>
> Cheers,
> Till
>
> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <[email protected]
> <mailto:[email protected]>> wrote:
>
> +1 for removing the methods that are deprecated for a while & have
> alternative methods.
>
> One specific thing is that if we remove the DataStream#split, do
> we consider enabling side-output in more operators in the future ?
> Currently it should be only available in ProcessFunctions, but not
> available to other commonly used UDF like Source or AsyncFunction[1].
>
> One temporary solution occurs to me is to add a ProcessFunction
> after the operators want to use side-output. But I think the
> solution is not very direct to come up with and if it really works
> we might add it to the document of side-output.
>
> [1] https://issues.apache.org/jira/browse/FLINK-7954
>
> Best,
> Yun
>
> ------------------Original Mail ------------------
> *Sender:*Kostas Kloudas <[email protected]
> <mailto:[email protected]>>
> *Send Date:*Tue Aug 18 03:52:44 2020
> *Recipients:*Dawid Wysakowicz <[email protected]
> <mailto:[email protected]>>
> *CC:*dev <[email protected] <mailto:[email protected]>>,
> user <[email protected] <mailto:[email protected]>>
> *Subject:*Re: [DISCUSS] Removing deprecated methods from
> DataStream API
>
> +1 for removing them.
>
>
>
> From a quick look, most of them (not all) have been
> deprecated a long time ago.
>
>
>
> Cheers,
>
> Kostas
>
>
>
> On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>
> >
>
> > @David Yes, my idea was to remove any use of fold method
> and all related classes including WindowedStream#fold
>
> >
>
> > @Klou Good idea to also remove the deprecated
> enableCheckpointing() &
> StreamExecutionEnvironment#readFile and alike. I did
> another pass over some of the classes and thought we could
> also drop:
>
> >
>
> > ExecutionConfig#set/getCodeAnalysisMode
>
> > ExecutionConfig#disable/enableSysoutLogging
>
> > ExecutionConfig#set/isFailTaskOnCheckpointError
>
> > ExecutionConfig#isLatencyTrackingEnabled
>
> >
>
> > As for the `forceCheckpointing` I am not fully convinced
> to doing it. As far as I know iterations still do not
> participate in checkpointing correctly. Therefore it still
> might make sense to force it. In other words there is no
> real alternative to that method. Unless we only remove the
> methods from StreamExecutionEnvironment and redirect to
> the setter in CheckpointConfig. WDYT?
>
> >
>
> > An updated list of methods I suggest to remove:
>
> >
>
> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>
> > ExecutionConfig#disable/enableSysoutLogging (deprecated
> in 1.10)
>
> > ExecutionConfig#set/isFailTaskOnCheckpointError
> (deprecated in 1.9)
>
> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>
> > ExecutionConfig#(get)/setNumberOfExecutionRetries()
> (deprecated in 1.1?)
>
> >
>
> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
> (deprecated in 1.2)
>
> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>
> > DataStream#fold and all related classes and methods such
> as FoldFunction, FoldingState, FoldingStateDescriptor ...
> (deprecated in 1.3/1.4)
>
> >
> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
> (deprecated in 1.5)
>
> > DataStream#split (deprecated in 1.8)
>
> > Methods in (Connected)DataStream that specify keys as
> either indices or field names such as DataStream#keyBy,
> DataStream#partitionCustom, ConnectedStream#keyBy, ....
> (deprecated in 1.11)
>
> >
>
> > Bear in mind that majority of the options listed above
> in ExecutionConfig take no effect. They were left there
> purely to satisfy the binary compatibility. Personally I
> don't see any benefit of leaving a method and silently
> dropping the underlying feature. The only configuration
> that is respected is setting the number of execution retries.
>
> >
>
> > I also wanted to make it explicit that most of the
> changes above would result in a binary incompatibility and
> require additional exclusions in the japicmp. Those are:
>
> >
>
> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>
> > ExecutionConfig#disable/enableSysoutLogging (deprecated
> in 1.10)
>
> > ExecutionConfig#set/isFailTaskOnCheckpointError
> (deprecated in 1.9)
>
> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>
> > ExecutionConfig#(get)/setNumberOfExecutionRetries()
> (deprecated in 1.1?)
>
> > DataStream#fold and all related classes and methods such
> as FoldFunction, FoldingState, FoldingStateDescriptor ...
> (deprecated in 1.3/1.4)
>
> > DataStream#split (deprecated in 1.8)
>
> > Methods in (Connected)DataStream that specify keys as
> either indices or field names such as DataStream#keyBy,
> DataStream#partitionCustom, ConnectedStream#keyBy, ....
> (deprecated in 1.11)
>
> >
>
> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
> (deprecated in 1.2)
>
> >
>
> > Looking forward to more opinions on the issue.
>
> >
>
> > Best,
>
> >
>
> > Dawid
>
> >
>
> >
>
> > On 17/08/2020 12:49, Kostas Kloudas wrote:
>
> >
>
> > Thanks a lot for starting this Dawid,
>
> >
>
> > Big +1 for the proposed clean-up, and I would also add
> the deprecated
>
> > methods of the StreamExecutionEnvironment like:
>
> >
>
> > enableCheckpointing(long interval, CheckpointingMode
> mode, boolean force)
>
> > enableCheckpointing()
>
> > isForceCheckpointing()
>
> >
>
> > readFile(FileInputFormat inputFormat,String
>
> > filePath,FileProcessingMode watchType,long interval,
> FilePathFilter
>
> > filter)
>
> > readFileStream(...)
>
> >
>
> > socketTextStream(String hostname, int port, char
> delimiter, long maxRetry)
>
> > socketTextStream(String hostname, int port, char delimiter)
>
> >
>
> > There are more, like the
> (get)/setNumberOfExecutionRetries() that were
>
> > deprecated long ago, but I have not investigated to see
> if they are
>
> > actually easy to remove.
>
> >
>
> > Cheers,
>
> > Kostas
>
> >
>
> > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>
> > wrote:
>
> >
>
> > Hi devs and users,
>
> >
>
> > I wanted to ask you what do you think about removing
> some of the deprecated APIs around the DataStream API.
>
> >
>
> > The APIs I have in mind are:
>
> >
>
> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>
> > DataStream#fold and all related classes and methods such
> as FoldFunction, FoldingState, FoldingStateDescriptor ...
> (deprecated in 1.3/1.4)
>
> >
> StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
> (deprecated in 1.5)
>
> > DataStream#split (deprecated in 1.8)
>
> > Methods in (Connected)DataStream that specify keys as
> either indices or field names such as DataStream#keyBy,
> DataStream#partitionCustom, ConnectedStream#keyBy, ....
> (deprecated in 1.11)
>
> >
>
> > I think the first three should be straightforward. They
> are long deprecated. The getAccumulators method is not
> used very often in my opinion. The same applies to the
> DataStream#fold which additionally is not very performant.
> Lastly the setStateBackend has an alternative with a class
> from the AbstractStateBackend hierarchy, therefore it will
> be still code compatible. Moreover if we remove the
> #setStateBackend(AbstractStateBackend) we will get rid off
> warnings users have right now when setting a statebackend
> as the correct method cannot be used without an explicit
> casting.
>
> >
>
> > As for the DataStream#split I know there were some
> objections against removing the #split method in the past.
> I still believe the output tags can replace the split
> method already.
>
> >
>
> > The only problem in the last set of methods I propose to
> remove is that they were deprecated only in the last
> release and those method were only partially deprecated.
> Moreover some of the methods were not deprecated in
> ConnectedStreams. Nevertheless I'd still be inclined to
> remove the methods in this release.
>
> >
>
> > Let me know what do you think about it.
>
> >
>
> > Best,
>
> >
>
> > Dawid
>
signature.asc
Description: OpenPGP digital signature
