alxp1982 commented on code in PR #25258:
URL: https://github.com/apache/beam/pull/25258#discussion_r1107767973
##########
learning/tour-of-beam/learning-content/windowing/motivating-challenge/description.md:
##########
@@ -0,0 +1,25 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Common Transforms motivating challenge
+
+You are provided with a `PCollection` from the array of the taxi order price in a csv file. Your task is to launch a new window every 5 seconds that covers data in 10 seconds. And derive the maximum price from them.

Review Comment:
For this exercise, you can use a CSV file of NYC taxi trips that lists trips and their prices. The following is a small list of fields and example records from this dataset:

| cost | passenger_count | ... |
|------|-----------------|-----|
| 5.8  | 1               | ... |
| 4.6  | 2               | ... |
| 24   | 1               | ... |

Your task is to write a pipeline that returns the maximum price of taxi trips over the past 10 minutes, with the result updated every minute.

##########
learning/tour-of-beam/learning-content/windowing/motivating-challenge/description.md:
##########
@@ -0,0 +1,25 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Common Transforms motivating challenge

Review Comment:
Why 'Common Transforms' if this is a 'Windowing' module? Please replace 'challenge' with 'exercise'.

##########
learning/tour-of-beam/learning-content/windowing/sliding-time-window/description.md:
##########
@@ -0,0 +1,70 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Sliding time windows
+
+A sliding time window also represents time intervals in the data stream; however, sliding time windows can overlap. For example, each window might capture 60 seconds worth of data, but a new window starts every 30 seconds. The frequency with which sliding windows begin is called the period. Therefore, our example would have a window duration of 60 seconds and a period of 30 seconds.
+
+Because multiple windows overlap, most elements in a data set will belong to more than one window. This kind of windowing is useful for taking running averages of data; using sliding time windows, you can compute a running average of the past 60 seconds’ worth of data, updated every 30 seconds.

Review Comment:
Because multiple windows overlap, most elements in a data set will belong to more than one window. This kind of windowing is helpful for computing **running averages** of data; using sliding time windows, you can compute a running average of the past 60 seconds' worth of data, updated every 30 seconds.
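To make the overlap concrete, here is a minimal, SDK-free Go sketch (Go being the SDK used in this PR's hints) that mimics how elements are assigned to sliding windows and then averages each window. It assumes non-negative integer-second timestamps and windows aligned to multiples of the period; `Window`, `Event`, `assignSlidingWindows`, and `runningAverages` are hypothetical helpers, not Beam APIs. In an actual pipeline the assignment step would be handled by `beam.WindowInto` with `window.NewSlidingWindows`.

```go
package main

import (
	"fmt"
	"sort"
)

// Window is a half-open time interval [Start, End) in seconds (hypothetical type).
type Window struct {
	Start, End int64
}

// Event is a timestamped value in the stream (hypothetical type).
type Event struct {
	TS    int64 // event time in seconds, assumed >= 0
	Value float64
}

// assignSlidingWindows returns every window of length size, starting at a
// multiple of period, that contains ts. With size 60 and period 30, each
// timestamp falls into two overlapping windows.
func assignSlidingWindows(ts, size, period int64) []Window {
	var ws []Window
	// Begin with the latest window that starts at or before ts, then step
	// back one period at a time while the window still covers ts.
	for start := ts - ts%period; start > ts-size; start -= period {
		ws = append(ws, Window{Start: start, End: start + size})
	}
	return ws
}

// runningAverages assigns each event to all of its sliding windows and
// returns the mean value of each window.
func runningAverages(events []Event, size, period int64) map[Window]float64 {
	sums := map[Window]float64{}
	counts := map[Window]int{}
	for _, e := range events {
		for _, w := range assignSlidingWindows(e.TS, size, period) {
			sums[w] += e.Value
			counts[w]++
		}
	}
	avgs := make(map[Window]float64, len(sums))
	for w, s := range sums {
		avgs[w] = s / float64(counts[w])
	}
	return avgs
}

func main() {
	// 60-second windows with a 30-second period, as in the description.
	events := []Event{{TS: 10, Value: 2}, {TS: 40, Value: 4}, {TS: 70, Value: 6}}
	avgs := runningAverages(events, 60, 30)

	// Map iteration order is random, so print windows sorted by start time.
	windows := make([]Window, 0, len(avgs))
	for w := range avgs {
		windows = append(windows, w)
	}
	sort.Slice(windows, func(i, j int) bool { return windows[i].Start < windows[j].Start })
	for _, w := range windows {
		fmt.Printf("[%d, %d): avg %.1f\n", w.Start, w.End, avgs[w])
	}
}
```

With the 60-second duration and 30-second period from the text, every event lands in two windows; for example, `[0, 60)` averages the events at 10 s and 40 s, while `[30, 90)` averages those at 40 s and 70 s.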
##########
learning/tour-of-beam/learning-content/windowing/sliding-time-window/description.md:
##########
@@ -0,0 +1,70 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Sliding time windows
+
+A sliding time window also represents time intervals in the data stream; however, sliding time windows can overlap. For example, each window might capture 60 seconds worth of data, but a new window starts every 30 seconds. The frequency with which sliding windows begin is called the period. Therefore, our example would have a window duration of 60 seconds and a period of 30 seconds.

Review Comment:
A sliding time window also represents time intervals in the data stream; however, sliding time windows can overlap. For example, each window might capture 60 seconds' worth of data, but a new window starts every 30 seconds. The frequency with which sliding windows begin is called the period. Therefore, our example would have a window duration of 60 seconds and a period of 30 seconds.

##########
learning/tour-of-beam/learning-content/windowing/motivating-challenge/hint1.md:
##########
@@ -0,0 +1,38 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+To solve this challenge, you may build a pipeline that consists of the following steps:
+{{if (eq .Sdk "go")}}
+1. Add windowing `slidingWindowedItems := beam.WindowInto(s, window.NewSlidingWindows(5*time.Second, 10*time.Second), input)`

Review Comment:
Add sliding windowing

##########
learning/tour-of-beam/learning-content/windowing/motivating-challenge/description.md:
##########
@@ -0,0 +1,25 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Common Transforms motivating challenge
+
+You are provided with a `PCollection` from the array of the taxi order price in a csv file. Your task is to launch a new window every 5 seconds that covers data in 10 seconds. And derive the maximum price from them.
+
+Here is a small list of fields and an example record from this dataset:

Review Comment:
Please move the description of the CSV data so that it directly follows 'You are provided with a `PCollection` from the array of the taxi order price in a csv file'.

##########
learning/tour-of-beam/learning-content/windowing/session-window/description.md:
##########
@@ -0,0 +1,80 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Session windows
+
+A session window function defines windows that contain elements that are within a certain gap duration of another element. Session windowing applies on a per-key basis and is useful for data that is irregularly distributed with respect to time. For example, a data stream representing user mouse activity may have long periods of idle time interspersed with high concentrations of clicks. If data arrives after the minimum specified gap duration time, this initiates the start of a new window. It is useful when you want to group elements that are related to each other based on the time that passed between them, rather than based on a fixed interval of time.

Review Comment:
A session window function defines windows containing elements within a specific gap duration of another element. Session windowing applies on a per-key basis and is useful for data that is irregularly distributed with respect to time. For example, a data stream representing user mouse activity may have long periods of idle time interspersed with high concentrations of clicks.
If data arrives after the minimum specified gap duration has elapsed, it starts a new window. Session windows are also useful when you want to group related elements based on the time that passes between them rather than on a fixed interval of time.

--
This is an automated message from the Apache Git Service.
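The per-key session behavior described in the session-window comment can be sketched in plain Go. This is an illustration under stated assumptions, not the SDK's implementation: timestamps are integer seconds, each element is treated as covering `[ts, ts+gap)`, and overlapping intervals for the same key are merged; an element arriving exactly `gap` after the previous one starts a new session here, and the SDK's boundary handling may differ. `Click`, `Session`, and `sessionize` are hypothetical names; in a Go pipeline you would apply session windowing with `window.NewSessions` instead.

```go
package main

import (
	"fmt"
	"sort"
)

// Click is a hypothetical keyed event: one mouse click by one user.
type Click struct {
	User string
	TS   int64 // event time in seconds
}

// Session is a merged window [Start, End) for a single key.
type Session struct {
	Key        string
	Start, End int64
	Count      int
}

// sessionize groups clicks per user, gives each click the interval
// [TS, TS+gap), and merges overlapping intervals into sessions.
func sessionize(clicks []Click, gap int64) []Session {
	byUser := map[string][]int64{}
	for _, c := range clicks {
		byUser[c.User] = append(byUser[c.User], c.TS)
	}
	// Sort keys so the output order is deterministic.
	users := make([]string, 0, len(byUser))
	for u := range byUser {
		users = append(users, u)
	}
	sort.Strings(users)

	var out []Session
	for _, u := range users {
		ts := byUser[u]
		sort.Slice(ts, func(i, j int) bool { return ts[i] < ts[j] })
		cur := Session{Key: u, Start: ts[0], End: ts[0] + gap, Count: 1}
		for _, t := range ts[1:] {
			if t < cur.End {
				// Arrived within the gap: extend the current session.
				cur.End = t + gap
				cur.Count++
			} else {
				// Gap elapsed with no data: close this session, open a new one.
				out = append(out, cur)
				cur = Session{Key: u, Start: t, End: t + gap, Count: 1}
			}
		}
		out = append(out, cur)
	}
	return out
}

func main() {
	clicks := []Click{
		{User: "alice", TS: 0}, {User: "alice", TS: 5}, {User: "alice", TS: 40},
		{User: "bob", TS: 3}, {User: "bob", TS: 12},
	}
	for _, s := range sessionize(clicks, 10) {
		fmt.Printf("%s [%d, %d) clicks=%d\n", s.Key, s.Start, s.End, s.Count)
	}
}
```

With a 10-second gap, alice's clicks at 0 s and 5 s share one session, her click at 40 s opens a second one, and bob's clicks are sessionized separately because the grouping is per key.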
