[ 
https://issues.apache.org/jira/browse/FLINK-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske updated FLINK-1259:
---------------------------------
    Description: 
The FilterFunction returns a boolean for each input record, which determines 
whether the record is kept or dropped. 
However, the function can also modify the input record, and that modification 
is visible downstream if the record is kept.

The optimizer assumes that a FilterFunction does not change the data, i.e., 
it assumes that a Filter preserves physical data properties (orders, 
partitionings, etc.), and Filters might also be pushed down in the future. 
These assumptions can result in semantically incorrect programs if the 
function actually changes its incoming records.
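
To make the problem concrete, here is a minimal sketch in plain Java. It does 
not use the Flink API: the Rec class, the RecordFilter interface and the 
applyFilter helper are hypothetical stand-ins for a mutable record, a 
FilterFunction and the filter operator. The input is sorted on key, the filter 
keeps every record but negates its key, and the output is no longer sorted 
even though a filter is assumed to preserve order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable record type; not a Flink class.
class Rec {
    int key;
    Rec(int key) { this.key = key; }
}

// Hypothetical stand-in for a filter function: returning true keeps the record.
interface RecordFilter {
    boolean filter(Rec value);
}

class MutatingFilterDemo {
    // Stand-in for the filter operator: it forwards the very object
    // that it handed to the function.
    static List<Rec> applyFilter(List<Rec> input, RecordFilter f) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : input) {
            if (f.filter(r)) {
                out.add(r);
            }
        }
        return out;
    }

    // True if the keys are in ascending order.
    static boolean isSortedByKey(List<Rec> records) {
        for (int i = 1; i < records.size(); i++) {
            if (records.get(i - 1).key > records.get(i).key) {
                return false;
            }
        }
        return true;
    }

    // Builds an input with keys 1, 2, 3, 4 (sorted ascending).
    static List<Rec> sortedInput() {
        List<Rec> in = new ArrayList<>();
        for (int k = 1; k <= 4; k++) {
            in.add(new Rec(k));
        }
        return in;
    }

    // A filter that keeps every record but negates its key as a side effect.
    static final RecordFilter NEGATING = r -> { r.key = -r.key; return true; };

    public static void main(String[] args) {
        List<Rec> in = sortedInput();
        System.out.println(isSortedByKey(in));   // prints "true"
        List<Rec> out = applyFilter(in, NEGATING);
        System.out.println(isSortedByKey(out));  // prints "false"
    }
}
```

An optimizer that reuses the existing sort order after such a filter, for 
example to skip a later sort, would work on data that is not actually sorted.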

Possible solutions are:
- document the requirements (and hope that users read them and behave nicely)
- hand the function a copy that it can modify but that is not passed on. This 
has major performance implications and might confuse users because their 
changes are silently discarded. However, this could also be integrated with 
the mutable/immutable runtime switch (FLINK-1005).
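
The second option can be sketched as follows, again in plain Java and not 
against the Flink API. The Item class, its copy() method, the ItemFilter 
interface and the applyFilterOnCopy helper are hypothetical names chosen for 
illustration. The function only ever sees a copy, so whatever it changes is 
discarded and the original record is forwarded untouched. The price is one 
extra copy per input record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable record with a copy method; not a Flink class.
class Item {
    int key;
    Item(int key) { this.key = key; }
    Item copy() { return new Item(key); }
}

// Hypothetical stand-in for a filter function: returning true keeps the record.
interface ItemFilter {
    boolean filter(Item value);
}

class CopyingFilterDemo {
    // The function is handed a copy; the unmodified original is forwarded.
    static List<Item> applyFilterOnCopy(List<Item> input, ItemFilter f) {
        List<Item> out = new ArrayList<>();
        for (Item original : input) {
            if (f.filter(original.copy())) {  // one extra copy per record
                out.add(original);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Item> in = new ArrayList<>();
        for (int k = 1; k <= 4; k++) {
            in.add(new Item(k));
        }
        // Keeps even keys and also negates the key of the record it was handed.
        ItemFilter f = r -> {
            boolean keep = r.key % 2 == 0;
            r.key = -r.key;
            return keep;
        };
        for (Item r : applyFilterOnCopy(in, f)) {
            System.out.println(r.key);  // prints "2", then "4"
        }
    }
}
```

The negation inside the function has no visible effect, which is exactly the 
behavior that might surprise users who expect their changes to be kept.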


  was:
The FilterFunction returns a boolean for an input record which determines 
whether the record is filtered or not. 
However, the function can also modify the input record which has effects if the 
record is not filtered.

The optimizer assumes that the data is not changed by a FilterFunction, i.e., 
it assumes that a Filter preserves physical data properties (orders, 
partitionings, etc.) and might also be pushed down in the future. These 
assumptions can result in semantically incorrect programs, if the function 
actually changes its incoming records.

Possible solutions are:
- document the requirements (and hope that users read it and behave nicely)
- hand a copy to the function which can be modified but is not passed on (might 
confuse users). However, this could also be integrated with the 
mutable/immutable runtime switch (FLINK-1005)



> FilterFunction can modify data
> ------------------------------
>
>                 Key: FLINK-1259
>                 URL: https://issues.apache.org/jira/browse/FLINK-1259
>             Project: Flink
>          Issue Type: Bug
>          Components: Java API, Optimizer, Scala API
>    Affects Versions: 0.7.0-incubating
>            Reporter: Fabian Hueske



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
