kennknowles commented on issue #18775:
URL: https://github.com/apache/beam/issues/18775#issuecomment-2487130686

   Just adding some commentary here to clarify the scope.
   
   There are two parts:
    - Generic support for analytic functions (SQL also calls this "windowing", 
but it is unrelated to Beam windowing). I don't remember if we have that. In 
general it will be a stateful `DoFn` that is keyed by the `PARTITION BY` 
clause. Within a partition it is not parallelizable (the whole point of window 
functions is really that they are serial). We can probably reference other OSS 
projects for the general translation.
    - Specific support for `ROW_NUMBER`, which is probably very easy but 
probably not very useful unless we have the ability to sort.
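
The general shape described above can be sketched in plain Python, without any Beam APIs. This is only an illustration of the semantics, not Beam's implementation: `evaluate_over` and `row_number` are hypothetical helper names, and the sample rows are made up. Rows are grouped by the partition key, each partition is sorted, and a small piece of per-partition state is advanced serially, one row at a time.

```python
from collections import defaultdict


def evaluate_over(rows, partition_key, order_key, make_state):
    """Hypothetical helper: evaluate one analytic function over `rows`.

    `make_state` returns a fresh step function for each partition, playing
    the role that per-key state would play in a stateful DoFn.
    """
    # Group rows by the PARTITION BY key. Partitions are independent of each
    # other, so this is the only place where parallelism is available.
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)

    # Within one partition, rows are visited serially in ORDER BY order.
    for part in partitions.values():
        step = make_state()
        for row in sorted(part, key=order_key):
            yield row, step(row)


def row_number():
    """Per-partition state for ROW_NUMBER(): a counter starting at 1."""
    count = 0

    def step(_row):
        nonlocal count
        count += 1
        return count

    return step


# Made-up sample data, partitioned by country and ordered by lastname.
athletes = [
    {"lastname": "Diaz", "country": "ES"},
    {"lastname": "Abe", "country": "JP"},
    {"lastname": "Cruz", "country": "ES"},
]
numbered = {
    row["lastname"]: n
    for row, n in evaluate_over(
        athletes,
        partition_key=lambda r: r["country"],
        order_key=lambda r: r["lastname"],
        make_state=row_number,
    )
}
```

Other analytic functions (a running sum, for example) would only need a different `make_state`; the partition-then-sort-then-scan structure stays the same.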
   
   Take this example:
   
   ```sql
   SELECT
     ROW_NUMBER() OVER (ORDER BY lastname) + 1000 as participant_label,
     firstname,
     lastname,
     country
   FROM athletes
   WHERE sport = 'Marathon';
   ```
   
   To evaluate this, you need to sort the entire result by `lastname`, then run 
a single-threaded stateful `DoFn` to track the row number. Usually you want to 
have a `PARTITION BY` so that the partitions can be processed in parallel.
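
As a rough illustration of that evaluation order for the query above, here is a plain-Python sketch (no Beam APIs). The function name `participant_labels` and the sample rows are made up for the example. Because there is no `PARTITION BY`, everything funnels through one global sort followed by one serial pass.

```python
def participant_labels(athletes):
    """Hypothetical sketch of the query: filter, global sort, serial numbering."""
    # WHERE sport = 'Marathon'
    marathoners = [a for a in athletes if a["sport"] == "Marathon"]

    # ORDER BY lastname: a single global sort, since there is no PARTITION BY.
    # Rows with equal lastnames keep their input order here; SQL itself does
    # not promise any particular order among such ties.
    ordered = sorted(marathoners, key=lambda a: a["lastname"])

    # ROW_NUMBER() + 1000: one serial pass carrying a counter.
    return [
        {
            "participant_label": n + 1000,
            "firstname": a["firstname"],
            "lastname": a["lastname"],
            "country": a["country"],
        }
        for n, a in enumerate(ordered, start=1)
    ]


# Made-up sample rows.
sample = [
    {"firstname": "Ana", "lastname": "Silva", "country": "BR", "sport": "Marathon"},
    {"firstname": "Bo", "lastname": "Berg", "country": "SE", "sport": "Marathon"},
    {"firstname": "Cy", "lastname": "Adams", "country": "US", "sport": "Sprint"},
]
labels = participant_labels(sample)
```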


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

