This is an automated email from the ASF dual-hosted git repository.

paulk pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-website.git

commit 71915c65de2396caa9e0c5949a99a86ce19e36b6
Author: Paul King <[email protected]>
AuthorDate: Sat Dec 9 12:14:52 2023 +1000

    initial draft
---
 site/src/site/blog/groovy-gatherers.adoc | 269 +++++++++++++++++++++++++++++++
 1 file changed, 269 insertions(+)

diff --git a/site/src/site/blog/groovy-gatherers.adoc 
b/site/src/site/blog/groovy-gatherers.adoc
new file mode 100644
index 0000000..819cc0c
--- /dev/null
+++ b/site/src/site/blog/groovy-gatherers.adoc
@@ -0,0 +1,269 @@
+= Using Gatherers with Groovy
+Paul King
+:revdate: 2023-12-09T11:30:00+00:00
+:draft: true
+:keywords: gatherers, jdk22, chop, collate, ginq, jep461
+:description: This post looks at using Gatherers (JEP 461) with Groovy.
+
+An interesting feature being previewed in JDK22 is _Gatherers_
+(https://openjdk.java.net/jeps/461[JEP 461]).
+This blog looks at using that feature with Groovy.
+The examples in this blog were tested with Groovy 4.0.16 using JDK version 
22-ea+27-2262.
+As this JDK version is still in early access,
+you should read the disclaimers to understand that this JDK feature
+is subject to change before final release. If and when the feature becomes
+final, Groovy supports it without needing any additional support.
+
+== Understanding Gatherers
+
+Java developers are by now very familiar with streams.
+A stream is a potentially unbounded sequence of values supporting lazy 
computation.
+Processing streams is done via a stream pipeline which consists of three parts:
+a source of elements, zero or more intermediate operations (like `filter` and 
`map`),
+and a terminal operation.
+
+This framework is very powerful and efficient and offers some extensibility
+via a customisable terminal operation. The available intermediate operations
+is fixed in size, and while the built-in ones are very useful,
+some complex tasks cannot easily be expressed as stream pipelines.
+Enter _gatherers_. Gatherers provide the ability to customize intermediate 
operations.
+
+The stream API is updated to support a `gather` intermediate operation which 
takes a gatherer
+and returns a transformed stream. Let's dive into a few more details of 
gatherers.
+
+A gatherer has four functions:
+
+* The optional _initializer_ is just a `Supplier` which returns some (initial) 
state.
+
+* The _integrator_ is typically the most important part. It satisfies the 
following interface:
++
+[source,java]
+----
+interface Integrator<A, T, R> {
+    boolean integrate(A state, T element, Downstream<? super R> downstream);
+}
+----
++
+where `state` is some state -- we'll use a list as state in a few of the 
upcoming
+examples, but it could just as easily be an instance of a class or record, 
`element`
+is the next element in the current stream to be processed, and `downstream` is
+a hook for creating the elements that will be processed in the next stage of 
the stream pipeline.
+
+* The optional `finisher` has access to the state and downstream pipeline hook.
+It performs any last step actions which might be needed.
+
+* The optional _combiner_ is used to evaluate the gatherer in parallel when 
processing an input stream in parallel. The examples we'll look at in this blog 
post are inherently ordered in nature
+and thus cannot be parallelized, so we won't discuss this aspect further here.
+
+Over and above, the Gatherer API, there are a number of built-in gathers
+like `windowFixed` and `windowSliding`, among others.
+
+== Accessing parts of a collection
+
+Groovy provides very flexible indexing variants to
+select specific elements from a collection:
+
+[source,groovy]
+----
+assert (1..8)[0..2] == [1, 2, 3]                   // index by closed range
+assert (1..8)[3<..<6] == [5, 6]                    // index by open range
+assert (1..8)[0..2,3..4,5] == [1, 2, 3, 4, 5, 6]   // index by multiple ranges
+assert (1..8)[0..2,3..-1] == 1..8                  // ditto
+assert (1..8)[0,2,4,6] == [1,3,5,7]                // select odd numbers
+assert (1..8)[1,3,5,7] == [2,4,6,8]                // select even numbers
+----
+
+You can also pick out a window of elements using `take` and `drop`:
+
+[source,groovy]
+----
+assert (1..8).take(3) == [1, 2, 3]                 // same as [0..2]
+assert (1..8).drop(2).take(3) == [3, 4, 5]         // same as [2..4]
+----
+
+Stream users might do the same thing using `skip` and `limit`:
+
+[source,groovy]
+----
+assert (1..8).stream().limit(3).toList() == [1, 2, 3]
+assert (1..8).stream().skip(2).limit(3).toList() == [3, 4, 5]
+----
+
+But what about some of Groovy's more elaborate mechanisms for manipulating 
collections?
+I'm glad you asked. Let's look at `collate` and `chop`.
+
+== Collate
+
+Groovy's `collate` method splits a collection into fixed size chunks:
+
+[source,groovy]
+----
+assert (1..8).collate(3) == [[1, 2, 3], [4, 5, 6], [7, 8]]
+----
+
+The last chunk in this example is smaller than the chunk size.
+It contains the remaining elements left over after all full size chunks
+have been created. If we don't want the leftover chunk,
+we can ask for it to be excluded using an optional boolean parameter:
+
+[source,groovy]
+----
+assert (1..8).collate(3, false) == [[1, 2, 3], [4, 5, 6]]
+----
+
+Such functionality isn't really possible with streams unless you wanted to
+process the stream multiple times, or you shoved all the logic in the
+collector, but then you'd be giving up some of the key benefits of streams.
+Luckily, with gatherers, we can now obtain this functionality.
+
+The first cases is so common, there is a built-in gatherer 
(`Gatherers#windowFixed`) for it:
+
+[source,groovy]
+----
+assert (1..8).stream().gather(windowFixed(3)).toList() ==
+    [[1, 2, 3], [4, 5, 6], [7, 8]]
+----
+
+There is no exact equivalent to handle the less common case of discarding
+the leftover elements, but it's easy enough to write our own:
+
+[source,groovy]
+----
+<T> Gatherer<T, ?, List<T>> windowFixedTruncating(int windowSize) {
+    Gatherer.ofSequential(
+        () -> [],
+        Gatherer.Integrator.ofGreedy { window, element, downstream ->
+            window << element
+            if (window.size() < windowSize) return true
+            var result = List.copyOf(window)
+            window.clear()
+            downstream.push(result)
+        }
+    )
+}
+----
+
+We have an initializer which just returns an empty list as our initial state.
+The gatherer keeps adding elements to the state (our list or window). Once the
+list is filled to the window size, we'll output it to the downstream,
+and then clear the list ready for the next window.
+The code here is essentially a simplified version of `windowFixed`, we can
+just leave out the finalizer that `windowFixed` would require to potentially
+output the partially-filled window at the end.
+
+A few details. Our operation is sequential since it is inherently ordered,
+hence we used `ofSequential` to mark it so. We will also always process all
+elements, so we create a greedy gatherer using `ofGreedy`. While not strictly
+necessary, this allows for optimisation of the pipeline.
+
+We'd use it like this:
+
+[source,groovy]
+----
+assert (1..8).stream().gather(windowFixedTruncating(3)).toList() ==
+    [[1, 2, 3], [4, 5, 6]]
+----
+
+The default when using `collate` is to start the next chunk/window
+at the element directly after the previous one, but there are overloads
+which also take a step size. This is used to calculate the index at which
+the second (and subsequent) window(s) will start.
+There is an optional `keepRemaining` boolean
+to handle the leftover case as well.
+If we want to slide along by 1 and discard leftovers, we'd use:
+
+[source,groovy]
+----
+assert (1..5).collate(3, 1, false) == [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
+----
+
+This aligns with the built-in `windowSliding` gatherer:
+
+[source,groovy]
+----
+assert (1..5).stream().gather(windowSliding(3)).toList() ==
+    [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
+----
+
+If we want the step size to be other than 1, or we want control over
+the leftovers, there is no built-in gatherer option,
+but we can again write one ourselves. Let's consider some examples.
+We'll look at a gatherer implementation shortly, but first Groovy's
+collection variants:
+
+[source,groovy]
+----
+assert (1..5).collate(3, 1) == [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5], [5]]
+assert (1..8).collate(3, 2) == [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8]]
+assert (1..8).collate(3, 2, false) == [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
+assert (1..8).collate(3, 4, false) == [[1, 2, 3], [5, 6, 7]]
+assert (1..8).collate(3, 3) == [[1, 2, 3], [4, 5, 6], [7, 8]]  // same as 
collate(3)
+----
+
+Now let's write our gatherer:
+
+[source,groovy]
+----
+<T> Gatherer<T, ?, List<T>> windowSlidingByStep(int windowSize, int stepSize, 
boolean keepRemaining = true) {
+    int skip = 0
+    Gatherer.ofSequential(
+        () -> [],                                                        // 
initializer
+        Gatherer.Integrator.ofGreedy { window, element, downstream ->    // 
integrator
+            if (skip) {
+                skip--
+                return true
+            }
+            window << element
+            if (window.size() < windowSize) return true
+            var result = List.copyOf(window)
+            skip = stepSize > windowSize ? stepSize - windowSize : 0
+            [stepSize, windowSize].min().times { window.removeFirst() }
+            downstream.push(result)
+        },
+        (window, downstream) -> {                                        // 
finalizer
+            if (keepRemaining) {
+                while(window.size() > stepSize) {
+                    downstream.push(List.copyOf(window))
+                    stepSize.times{ window.removeFirst() }
+                }
+                downstream.push(List.copyOf(window))
+            }
+        }
+    )
+}
+----
+
+Some points. Our gatherer is still sequential for the same reasons as 
previously.
+We are still processing every element, so we again created a greedy gatherer.
+We have a little bit of optimization baked into the code. If our step size
+is bigger than the window size, we can do no further processing in our gatherer
+for the elements in between our windows. We could simplify the code and store 
those
+elements only to throw them away later, but it's not too much effort to make
+the algorithm as efficient as possible. We also need a finalizer here which
+handles the leftover chunk(s).
+
+And we'd use it like this:
+[source,groovy]
+----
+assert (1..5).stream().gather(windowSlidingByStep(3, 1)).toList() ==
+    [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5], [5]]
+assert (1..8).stream().gather(windowSlidingByStep(3, 2)).toList() ==
+    [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8]]
+assert (1..8).stream().gather(windowSlidingByStep(3, 2, false)).toList() ==
+    [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
+assert (1..8).stream().gather(windowSlidingByStep(3, 4, false)).toList() ==
+    [[1, 2, 3], [5, 6, 7]]
+assert (1..8).stream().gather(windowSlidingByStep(3, 3)).toList() ==
+    [[1, 2, 3], [4, 5, 6], [7, 8]]
+----
+
+== Chop
+
+== Testing for a subsequence
+
+== Conclusion
+
+We have had a quick glimpse at using gatherers with Groovy.
+We are still in the early days of gatherers being available,
+so expect much more to emerge as this feature becomes more mainstream.
+We look forward to it advancing past preview status.

Reply via email to