Well, it's not really within Jena's purview to redesign the Java APIs or to
offer a parallel processing framework. {grin}
Jena's own oaj.atlas.iterator.Iter<T> offers much of Guava's convenience and
more, with both static and instance patterns.
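To make "both static and instance patterns" concrete, here is a small JDK-only sketch. `MiniIter` is a hypothetical class invented for illustration and is not Jena's `Iter<T>` API. Each operation exists twice: once as a static helper over a plain `Iterator`, in the style of Guava's `Iterators`, and once as a chainable instance method that delegates to that helper.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical wrapper (not Jena's Iter): the same operations are offered as
// static helpers over a plain Iterator and as chainable instance methods.
final class MiniIter<T> {
    private final Iterator<T> base;

    private MiniIter(Iterator<T> base) { this.base = base; }

    // Entry point for the instance (fluent) style.
    static <T> MiniIter<T> of(Iterator<T> it) { return new MiniIter<>(it); }

    // Static style: lazily apply f to each element.
    static <T, R> Iterator<R> map(Iterator<T> it, Function<? super T, ? extends R> f) {
        return new Iterator<R>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public R next() { return f.apply(it.next()); }
        };
    }

    // Static style: drain the iterator into a list.
    static <T> List<T> toList(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    // Instance style: thin delegates to the static helpers, so calls chain.
    <R> MiniIter<R> map(Function<? super T, ? extends R> f) { return of(map(base, f)); }

    List<T> toList() { return toList(base); }
}
```

With this shape, `MiniIter.toList(MiniIter.map(it, String::length))` and `MiniIter.of(it).map(String::length).toList()` do the same work; the instance form just reads left to right.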
---
A. Soroka
The University of Virginia Library
> On Jul 27, 2016, at 2:23 PM, Paul Houle <[email protected]> wrote:
>
> I am referring to the kind of statics that Guava has on classes like:
>
> http://google.github.io/guava/releases/19.0/api/docs/com/google/common/collect/Iterators.html
>
> as well as a family of similar statics. My opinionated opinion is that:
> (i) JDK8 lambdas are great, (ii) Guava has some good parts, and (iii)
> JDK8 streams are a bad API.
>
> The deep complaint about JDK8 streams is that the approach to
> parallelism is completely wrong-headed. Fork-Join is one of the many
> parallel programming paradigms (e.g. actors, STM) made for people who
> believe that a 10-line program that takes 10 minutes to write with an
> Executor, gets a 7x speedup on an 8-core machine, and gives the same
> answer every time you run it is for applications programmers, and that
> *real* programmers have to work harder than that.
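For readers who want to see the kind of "10-line Executor program" meant here, this is a minimal sketch under stated assumptions. The workload (summing squares over a range), the class name `ExecutorSum`, and the one-chunk-per-thread split are all invented for illustration. It uses only `java.util.concurrent`. Because `invokeAll` returns futures in task order, partial results are combined in the same order on every run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ExecutorSum {
    // Sum of i*i for i in [0, n), split into one contiguous chunk per thread.
    static long sumOfSquares(int n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            int chunk = (n + threads - 1) / threads; // ceiling division
            for (int start = 0; start < n; start += chunk) {
                final int lo = start;
                final int hi = Math.min(n, start + chunk);
                tasks.add(() -> {
                    long s = 0;
                    for (int i = lo; i < hi; i++) s += (long) i * i;
                    return s;
                });
            }
            long total = 0;
            // invokeAll returns futures in task order, so partial results
            // are always combined in the same order.
            for (Future<Long> f : pool.invokeAll(tasks)) total += f.get();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```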
>
> To parallelize "those" sorts of workloads, you need to decide (i) the
> size of the microbatch to use and (ii) how many threads to use.
> Microbatching is essential because you can do an awful lot of things to
> a Statement in the time it takes to cross a thread boundary. It is easy
> to hand tune these parameters and you tend to get decent results even if
> you are far off from the optimal numbers. It doesn't seem so easy for
> Fork-Join and similar things to do the same automatically.
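A sketch of the two hand-tuned knobs described above, assuming a simple map-style workload: `batchSize` controls how many items cross the thread boundary per handoff, and `threads` sets the pool size. `MicroBatcher.mapInBatches` is a hypothetical helper, not an existing library call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class MicroBatcher {
    // Applies f to every item. Work crosses to the pool one batch at a time, so
    // the thread-handoff cost is paid once per batchSize items, not per item.
    // batchSize and threads are the two hand-tuned parameters.
    static <T, R> List<R> mapInBatches(List<T> items, Function<? super T, ? extends R> f,
                                       int batchSize, int threads) {
        if (batchSize < 1 || threads < 1) {
            throw new IllegalArgumentException("batchSize and threads must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (int start = 0; start < items.size(); start += batchSize) {
                List<T> batch = items.subList(start, Math.min(items.size(), start + batchSize));
                futures.add(pool.submit(() -> {
                    List<R> out = new ArrayList<>(batch.size());
                    for (T item : batch) out.add(f.apply(item));
                    return out;
                }));
            }
            List<R> results = new ArrayList<>(items.size());
            // Futures are read in submission order, so input order is preserved.
            for (Future<List<R>> fut : futures) results.addAll(fut.get());
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `mapInBatches(items, f, 1, threads)` degenerates to one handoff per item, which is the expensive case when the per-item work (say, touching one Statement) is cheap.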
>
> https://projectreactor.io/
>
> is getting somewhere close to what a minimum viable product looks like,
> although their idea of "streams" is different. I don't find it hard at
> all to get extra throughput in that framework by running stages such as
> "gunzip", "parse XML", "create little Jena Models from the XML",
> "insert little Jena Models into a TDB", and "index some fields from the
> little Jena Models in Lucene" in separate threads, while serializing all
> the TDB writes in one thread. (To be fair, the code is still harder to
> write than it should be, and the reason I've been doing so much
> i-dotting, t-crossing and trash-talking lately is to avoid hunkering
> down and writing a compiler to fix that problem.)
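A rough sketch of the stage-per-thread shape, with a bounded queue between stages and exactly one thread performing writes. This uses plain `java.util.concurrent` queues, not Reactor's API. The "parse" stage, the class name `TwoStagePipeline`, and the in-memory list standing in for a TDB store are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical two-stage pipeline: a "parse" thread feeds a single "writer" thread.
final class TwoStagePipeline {
    // End-of-stream marker, compared by identity.
    private static final Object POISON = new Object();

    static List<Integer> run(List<String> input) {
        // Bounded queue: a fast parser blocks instead of flooding memory.
        BlockingQueue<Object> parsed = new ArrayBlockingQueue<>(16);
        // Stand-in for a store that must only be written from one thread.
        List<Integer> store = new ArrayList<>();

        Thread parser = new Thread(() -> {
            try {
                try {
                    for (String s : input) parsed.put(Integer.parseInt(s.trim()));
                } finally {
                    parsed.put(POISON); // always signal the end, even if parsing fails
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread writer = new Thread(() -> {
            try {
                // Every write happens on this one thread.
                for (Object o = parsed.take(); o != POISON; o = parsed.take()) {
                    store.add((Integer) o);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        parser.start();
        writer.start();
        try {
            parser.join();
            writer.join(); // join() makes the writer's updates visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store;
    }
}
```

More stages (gunzip, indexing, and so on) would chain the same way, one thread and one queue per stage, with only the final writer touching the store.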
>
> For tough stuff, therefore, JDK 8 Streams are doomed, and what you are
> left with is an awkward framework that doesn't even entirely make up its
> mind whether a Stream is essentially an Iterator or an Iterable. It is
> just plain painful to write, and it plays badly with type inference in
> the compiler, so error messages don't make a lot of sense and
> autocomplete doesn't work well in your IDE. It is a little more capable
> than the "Guava statics", but the amount of suffering it makes you go
> through is not worth it.
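The Iterator-versus-Iterable ambiguity shows up concretely in the JDK: a `Stream` can be consumed only once, like an `Iterator`, and a second terminal operation throws `IllegalStateException`. Getting `Iterable`-like reuse means passing around something that builds fresh streams, such as a `Supplier<Stream<T>>`. A short JDK-only demonstration (the class name `StreamReuse` is hypothetical):

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StreamReuse {
    // A Stream behaves like an Iterator: a second terminal operation is rejected.
    static boolean secondUseFails() {
        Stream<String> s = Stream.of("a", "b", "c");
        s.count(); // first terminal operation consumes the stream
        try {
            s.count(); // throws IllegalStateException: already operated upon
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // Iterable-style reuse needs a factory that builds a fresh Stream each time.
    static List<String> upperTwice() {
        Supplier<Stream<String>> source = () -> Stream.of("a", "b");
        return Stream.concat(source.get(), source.get())
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}
```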
>
> --
> Paul Houle
> [email protected]
>
> On Wed, Jul 27, 2016, at 01:21 PM, A. Soroka wrote:
>> In this remark, are you referring to mapping through functions,
>> filtering, reducing, collecting and that sort of thing? We've had some
>> conversation in the past about beginning to support Streams as part of
>> the API, and Stream is a subtype of AutoCloseable. Maybe that's the
>> direction in which to go.
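If Jena did hand out Streams, the `AutoCloseable` side could be wired up roughly as follows. This is a sketch only: `CloseableIter` is a hypothetical stand-in for a resource-holding iterator such as Jena's `ClosableIterator`, and `StreamBridge` is not an existing Jena API. It relies on the JDK calls `Spliterators.spliteratorUnknownSize`, `StreamSupport.stream` and `Stream.onClose`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical stand-in for an iterator that holds a resource (e.g. a cursor).
interface CloseableIter<T> extends Iterator<T> {
    void close();
}

final class StreamBridge {
    // Wraps the iterator in a sequential Stream; closing the Stream closes the iterator.
    static <T> Stream<T> asStream(CloseableIter<T> it) {
        Spliterator<T> split = Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED);
        return StreamSupport.stream(split, false).onClose(it::close);
    }

    // In-memory CloseableIter that records whether close() was called.
    static final class RecordingIter<T> implements CloseableIter<T> {
        private final Iterator<T> base;
        boolean closed = false;

        RecordingIter(List<T> items) { this.base = items.iterator(); }

        @Override public boolean hasNext() { return base.hasNext(); }
        @Override public T next() { return base.next(); }
        @Override public void close() { closed = true; }
    }
}
```

Note that `onClose` handlers run only when `close()` is called, for example by try-with-resources; a terminal operation such as `collect` does not close the stream by itself.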
>>
>> ---
>> A. Soroka
>> The University of Virginia Library
>>
>>> On Jul 27, 2016, at 12:46 PM, Paul Houle <[email protected]> wrote:
>>>
>>> Pretty recently I went through a bunch of static methods I wrote that
>>> used Guavaisms on StmtIterators, rewrote them to respect the
>>> ClosableIterator contract, and found I didn't need any Guavaisms because
>>> NiceIterator did almost everything I needed.
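Respecting the close contract in a static helper mostly comes down to two things: closing the wrapper closes the source, and the source is closed once it runs dry. A minimal sketch, in which `Closing`, `ClosingOps.filter` and `CountingSource` are all hypothetical and are not Jena's `ClosableIterator` or `NiceIterator`:

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical minimal closable-iterator type (not Jena's ClosableIterator).
interface Closing<T> extends Iterator<T> {
    void close();
}

final class ClosingOps {
    // Filter helper that keeps the close() contract: closing the result closes
    // the source, the source is closed once exhausted, and it is closed at most once.
    static <T> Closing<T> filter(Closing<T> src, Predicate<? super T> keep) {
        return new Closing<T>() {
            private T pending;
            private boolean ready;
            private boolean sourceClosed;

            private void closeSource() {
                if (!sourceClosed) { sourceClosed = true; src.close(); }
            }

            @Override public boolean hasNext() {
                while (!ready && !sourceClosed && src.hasNext()) {
                    T candidate = src.next();
                    if (keep.test(candidate)) { pending = candidate; ready = true; }
                }
                if (!ready) closeSource(); // exhausted: release resources eagerly
                return ready;
            }

            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                T result = pending;
                pending = null;
                return result;
            }

            @Override public void close() {
                ready = false; // drop any buffered element
                pending = null;
                closeSource();
            }
        };
    }
}

// In-memory source that counts close() calls, for demonstration only.
final class CountingSource<T> implements Closing<T> {
    private final Iterator<T> base;
    int closeCount = 0;

    CountingSource(List<T> items) { this.base = items.iterator(); }

    @Override public boolean hasNext() { return base.hasNext(); }
    @Override public T next() { return base.next(); }
    @Override public void close() { closeCount++; }
}
```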
>>