NicoK commented on a change in pull request #13:
URL: https://github.com/apache/flink-training/pull/13#discussion_r469179715



##########
File path: long-ride-alerts/DISCUSSION.md
##########
@@ -21,7 +21,23 @@ under the License.
 
 (Discussion of [Lab: `ProcessFunction` and Timers (Long Ride Alerts)](./))
 
+It would be interesting to test that the solution does not leak state.
 
+A good way to write unit tests for a `KeyedProcessFunction` that check for 
state retention, etc., is to
+use the test harnesses described in the
+[documentation on 
testing](https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html#unit-testing-stateful-or-timely-udfs--custom-operators).
 
+
+In fact, the reference solutions will leak state in the case where a START 
event is missing. They also
+leak in the case where the alert is generated, but then the END event does 
eventually arrive (after `onTimer()`
+has cleared the matching START event).
+
+This could be addressed either by using [state 
TTL](https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl),
+or by using another timer that eventually
+clears any remaining state. There is a tradeoff here, however: once that state 
has been removed,
+then if the matching events are't actually missing, but are instead very, very 
late, they will cause erroneous alerts.

Review comment:
       ```suggestion
   then if the matching events are not actually missing, but are instead very, 
very late, they will cause erroneous alerts.
   ```

##########
File path: long-ride-alerts/DISCUSSION.md
##########
@@ -21,7 +21,23 @@ under the License.
 
 (Discussion of [Lab: `ProcessFunction` and Timers (Long Ride Alerts)](./))
 
+It would be interesting to test that the solution does not leak state.
 
+A good way to write unit tests for a `KeyedProcessFunction` that check for 
state retention, etc., is to

Review comment:
       ```suggestion
   A good way to write unit tests for a `KeyedProcessFunction` to check for 
state retention, etc., is to
   ```

##########
File path: long-ride-alerts/README.md
##########
@@ -62,13 +62,11 @@ The resulting stream should be printed to standard out.
 <details>
 <summary><strong>Overall approach</strong></summary>
 
-This exercise revolves around using a `ProcessFunction` to manage some keyed 
state and event time timers, and doing so in a way that works even when the END 
event for a given `rideId` arrives before the START (which will happen). The 
challenge is figuring out what state to keep, and when to set and clear that 
state.
-</details>
-
-<details>
-<summary><strong>Timers and State</strong></summary>
-
-You will want to use event time timers that fire two hours after the incoming 
events, and in the `onTimer()` method, collect START events to the output only 
if a matching END event hasn't yet arrived. As for what state to keep, it is 
enough to remember the "last" event for each `rideId`, where "last" is based on 
event time and ride type (START vs END &mdash; yes, there are rides where the 
START and END have the same timestamp), rather than the order in which the 
events are processed. The `TaxiRide` class implements `Comparable`; feel free 
to take advantage of that, and be sure to eventually clear any state you create.

Review comment:
       Why did you remove the hints on what state to keep?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to