Well, it's not really within Jena's purview to redesign the Java APIs or to 
offer a parallel processing framework. {grin}

Jena's own org.apache.jena.atlas.iterator.Iter<T> offers much of Guava's 
convenience and more, with both static and instance patterns.

---
A. Soroka
The University of Virginia Library

> On Jul 27, 2016, at 2:23 PM, Paul Houle <[email protected]> wrote:
> 
> I am referring to the kind of statics that Guava has on classes like:
> 
> http://google.github.io/guava/releases/19.0/api/docs/com/google/common/collect/Iterators.html
> 
> as well as a family of similar statics.  My opinionated opinion is that:
> (i) JDK8 lambdas are great,  (ii) Guava has some good parts,  and (iii)
> JDK8 streams are a bad API.
> 
> The deep complaint about JDK8 streams is that the approach to
> parallelism is completely wrong-headed.  Fork-Join is one of the many
> parallel programming paradigms (e.g. actors,  STM) for people who think
> that a 10-line program that takes 10 minutes to write with an Executor, 
> gets a 7x speedup on an 8-core machine,  and gives the same answer every
> time you run it is for applications programmers,  and that *real*
> programmers have to work harder than that.
> 
> To parallelize "those" sorts of workloads you need to decide:  (i) the
> size of the microbatch to use and (ii) how many threads to use. 
> Microbatching is essential because you can do an awful lot of things to
> a Statement in the time it takes to cross a thread boundary.  It is easy
> to hand tune these parameters and you tend to get decent results even if
> you are far off from the optimal numbers.  It doesn't seem so easy for
> Fork-Join and similar things to do the same automatically.
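
A minimal sketch of the hand-tuned approach described above, using only the JDK: the caller picks the microbatch size and the thread count, and each task handed to the Executor processes a whole batch, so the thread boundary is crossed once per batch rather than once per item. The names MicroBatch and mapInBatches and the numbers in main are hypothetical, chosen for illustration; nothing here is Jena API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: not part of Jena, Guava, or the JDK.
class MicroBatch {
    /**
     * Applies fn to every item, batchSize items per task, on a fixed pool
     * of `threads` threads. Results are returned in input order.
     */
    static <T, R> List<R> mapInBatches(List<T> items, int batchSize, int threads,
                                       Function<T, R> fn) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<R>>> pending = new ArrayList<>();
            for (int start = 0; start < items.size(); start += batchSize) {
                // One task per microbatch, not per item.
                List<T> batch = items.subList(start, Math.min(start + batchSize, items.size()));
                pending.add(pool.submit(() -> {
                    List<R> out = new ArrayList<>(batch.size());
                    for (T item : batch) {
                        out.add(fn.apply(item));
                    }
                    return out;
                }));
            }
            // Collect in submission order so the answer is the same on every run.
            List<R> results = new ArrayList<>(items.size());
            for (Future<List<R>> f : pending) {
                results.addAll(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            input.add(i);
        }
        // Batch size 4 and 2 threads are arbitrary starting points to tune by hand.
        System.out.println(mapInBatches(input, 4, 2, x -> x * x));
    }
}
```

The two tuning knobs are plain method parameters, which is the point of the paragraph above: they are easy to adjust by hand and measure.
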
> 
> https://projectreactor.io/
> 
> is getting somewhere close to what a minimal viable product looks like
> although their idea of "streams" is different.  I don't find it hard at
> all to get extra throughput in that framework by doing things like
> "gunzip",  "parse XML", "create little Jena Models from the XML",
> "insert little Jena Models into a TDB",  "index some fields from the
> little Jena models in Lucene" in separate threads,  serializing all the
> TDB writes in one thread.  (To be fair the code is still harder to write
> than it should be and the reason I've been doing so much i dotting,  t
> crossing and trash talking lately is to avoid hunkering down and writing
> a compiler to fix that problem)
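
A rough JDK-only sketch of the shape described above: the expensive stage runs on a worker pool, while every write is funnelled through a single-thread executor so the sink never sees concurrent access. It uses CompletableFuture rather than Project Reactor, and SingleWriterPipeline, run, transform and write are hypothetical names standing in for steps like "parse XML" and "insert into a TDB"; no Jena or Reactor API is used or implied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper: a stand-in for a staged pipeline with one writer thread.
class SingleWriterPipeline {
    /**
     * Runs transform on a pool of `workers` threads and passes each result
     * to write on one dedicated thread. Write order is not guaranteed to
     * match input order.
     */
    static <I, O> void run(List<I> inputs, int workers,
                           Function<I, O> transform, Consumer<O> write) {
        ExecutorService workerPool = Executors.newFixedThreadPool(workers);
        ExecutorService writer = Executors.newSingleThreadExecutor();
        try {
            List<CompletableFuture<Void>> done = new ArrayList<>();
            for (I input : inputs) {
                done.add(CompletableFuture
                        .supplyAsync(() -> transform.apply(input), workerPool) // parallel stage
                        .thenAcceptAsync(write, writer));                      // serialized stage
            }
            // Block until every item has been transformed and written.
            CompletableFuture.allOf(done.toArray(new CompletableFuture[0])).join();
        } finally {
            workerPool.shutdown();
            writer.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        docs.add("a");
        docs.add("bb");
        docs.add("ccc");
        // The sink is a plain ArrayList; only the writer thread touches it.
        List<Integer> sink = new ArrayList<>();
        run(docs, 2, String::length, sink::add);
        System.out.println(sink.size() + " items written");
    }
}
```
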
> 
> For tough stuff,  therefore,  JDK 8 Streams are doomed,  and then you
> are left with an awkward framework that doesn't even entirely make up
> its mind whether a Stream is essentially an Iterator or an Iterable, 
> that is just plain painful to write,  and that plays badly with type
> inference in the compiler,  so error messages don't make a lot of sense
> and autocomplete doesn't work well in your IDE.  It is a little more
> capable than the "guava statics" but the amount of suffering it makes
> you go through is not worth it.
> 
> -- 
>  Paul Houle
>  [email protected]
> 
> On Wed, Jul 27, 2016, at 01:21 PM, A. Soroka wrote:
>> In this remark are you referring to mapping through functions, filtering
>> and reducing and collecting and that sort of thing? Because we've had
>> some conversation in the past about beginning to support Streams as part
>> of the API, and Stream extends AutoCloseable (via BaseStream). Maybe
>> that's the direction in which to go.
>> 
>> ---
>> A. Soroka
>> The University of Virginia Library
>> 
>>> On Jul 27, 2016, at 12:46 PM, Paul Houle <[email protected]> wrote:
>>> 
>>> Pretty recently I went through a bunch of static methods I wrote that used 
>>> Guavaisms on StmtIterators and rewrote them to respect the ClosableIterator 
>>> and found I didn't need any Guavaisms because NiceIterator did almost 
>>> everything I needed.
>> 
