Realized I wasn't clear in how the method declaration would be problematic.

example:

> > - stream(Predicate<Resource>, Set<TraversalOption>... )

take the invocation

-  stream(predicate, CHILDREN_ONLY);

since the first predicate is a structural predicate in the manner that it 
determines what child resources to traverse on, it's now not used in the above 
invocation. I don't like methods where one parameter is ignored based on the 
value of the second parameter. I find it can lead to unpredictable outcomes 
because people aren't cognizant of it.

Saying that, there may be Traversal Options that aren't necessarily conflicting 
with the predicate and it may make sense to incorporate it.

My methodology tends to be release early and often and garner feedback so I'm 
not adverse to adding that, I just want to get some wider usage of the feature 
so that I'm making useful choices in it's evolution.


> > I think that the 'hasNext' method is not lazy, eagerly loading the
> > direct children of a resource when there are no more resources to load.
> > 
> > If the resource has a large number of children we can end up taking
> > lots of memory, so I think we need to be more lazy with it, since the
> > streams framework can take advantage of that (IIRC). Is it possible to
> > ajust the implementation this way?
> 

So one thing I discovered is problematic is probably related to the JCR and 
that is the multi threading in the stream. When I ran tests where I made the 
stream capable of being splitted I had worse results then single threading it. 
I didn't dive to much into it but my assumption at this point is that when you 
are requesting multiple resources using the same session on multiple threads, 
the underlying JCR doesn't like it.

I do have an idea to make it potentially more lazy which I'll do on my lunch 
break. Where I replace the list of resources with a list of iterators which 
references the depth of the tree I'm iterating through. I can't really 
guarantee lazy behavior because I would be relying on the underlying resource 
provider to provide a Lazy iterator. Which can't be guaranteed. 



- Jason

On Mon, Jun 4, 2018, at 1:21 PM, Jason E Bailey wrote:
> Thanks Robert!
> 
> - Jason
> 
> On Mon, Jun 4, 2018, at 11:45 AM, Robert Munteanu wrote:
> 
> > There are two questions I want to raise:
> > 
> > 1. API related
> > 
> > We have two methods:
> > 
> > - stream()
> > - stream(Predicate<Resource>)
> > 
> > which traverse the whole tree.
> 
> the base stream() would traverse the whole tree, the branch selector 
> parameter (Predicate<Resource>) is used to limit the traversal. The most 
> common use case I give for this if you have a site with 10,000 pages on 
> it. The number of nodes or resources in that structure could come close 
> to 1 million nodes. If, however, you can find if a page matches your 
> needs from the page node itself . You don't have to descend into the 
> jcr:content sub tree.  So instead of traversing a million nodes, you 
> just traverse 10,000
> 
> As Dan mentioned, you could also just have that branch selector itself 
> limit the depth of the search. 
> 
> 
> > There were questions of whether this is OK or if we maybe need to
> > stream the children only. Could the API additions allow that?
> > 
> > We can start with something like
> > 
> > - stream(Predicate<Resource>, Set<TraversalOption>... )
> > - stream(Predicate<Resource>)
> > - stream(Set<TraversalOption>...)
> 
> If the traversal option defines only the children of the existing 
> resource, then there is no need to define a branch selector.  It's 
> exclusionary. Which, makes the combinations kind of weird,  It would 
> make more sense to just name the method for getting the children as a 
> stream something unique.
> 
> 
> > 2. Implementation related
> > 
> > I think that the 'hasNext' method is not lazy, eagerly loading the
> > direct children of a resource when there are no more resources to load.
> > 
> > If the resource has a large number of children we can end up taking
> > lots of memory, so I think we need to be more lazy with it, since the
> > streams framework can take advantage of that (IIRC). Is it possible to
> > ajust the implementation this way?
> 
> Maybe. You would still need to have the hasNext() method correctly 
> identify whether there is a next or not, so you would need to have a 
> loop iterate through and checking.  I would have to somehow find my way 
> back to the correct location the next time the method is called. My 
> biggest concern there is with the unordered children. Do I need to 
> maintain state of the underling iteration that I'm checking?
> 
> I'd ask that we go ahead and get it in as is, and I could pursue being 
> more lazy on that call. I don't know how long that would take me and I'm 
> still waiting for the gods of the internet to provide me a job where I 
> can work on Sling as my job full time.
> 
>  
> > Thanks,
> > 
> > Robert

Reply via email to