[ 
https://issues.apache.org/jira/browse/TINKERPOP-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166227#comment-17166227
 ] 

Norio Akagi commented on TINKERPOP-2393:
----------------------------------------

 


> If I'm not mistaken the only Barrier with a configurable size is NoOpBarrier. 
> If so, it would basically mean that we'd only see a value less than "exhaust 
> the stream" for that Step, is that right?

For now yes. To share some background, I work for some GraphDB provider which 
supports Tinkerpop. The initial motivation for us was to optimize 
PropertiesStep, which just retrieves properties for each element one by one as 
long as we follow the step and api provided.
https://github.com/apache/tinkerpop/blob/cff4c161615f2b50bda27b6ba523c7f52b833532/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/PropertiesStep.java#L64
So we internally replaced this PropertiesStep with our custom 
BulkPropertiesStep and this new Step queues up incoming 
Traverser.Admin<Element> up to certain size and then call our own Storage API 
to retrieve properties for multiple elements at once.
This apparently worked but now we noticed that when BranchStep wraps our 
PropertiesStep, it just gives one solution at a time 
https://github.com/apache/tinkerpop/blob/534746b6019d0775e628375a3dfeba8a18df90fe/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/branch/BranchStep.java#L143

so batch functionality really doesn't work. For now, we are thinking that when 
we replace PropertiesStep with BulkPropertiesStep we can add NoOpBarrierStep 
too with a certain barrier size, so we want BranchStep to check 
NoOpBarrierStep's size and to tweak how many solutions it passes to downstream. 


As long term solution, however, I think BulkPropertiesStep itself should 
implement Barrier for simplicity sake. And further, in my opinion it is better 
if TinkerPop's PropertiesStep officially has and uses this kind of bulk 
functionality. Because we don't think this sort of "batch request to storage 
layer" is an uncommon feature request from GraphDB provider's point of view 
(though it depends on what kind of storage we use), and it doesn't break 
existing behavior and performance if we just switch to the current "per 
element" API call when barrierSize = 1 as default behavior.


With this change, we may want to 
 * add some API like properties(List<Vertex> vertcies, List<String> keys) 
properties(List<Edge> edges, List<String> keys) under some class other than 
Vertex or Edge so that GraphDB provider can extend it.
 * change PropertiesStep to extend a new BaseStep other than current FlatMap, 
the difference is the new step returns Iterator<E> from 
Iterator<Traverser.Admin<S>>.
 * Create some Base class for Barrier which has barrierSize as protected member 
and NoOpBarrierStep and new PropertiesStep should inherit it 
We look for PropertiesStep optimization at this point but this may be 
applicable in other steps (I need to investigate other steps).

This is just my current rough idea, may be beyond the scope of this JIRA ticket 
and need another discussion thread (at least I want to separate a pull request 
to just have BranchStep follow NoOpBarrierStep's barrier size if any) but if 
you have any thoughts on this direction, please let me know.

Thank you for your response !

> BranshStep should respect the barrier size of BarrierStep when calling 
> applyCurrentTraverser
> --------------------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-2393
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2393
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.4.7
>            Reporter: Norio Akagi
>            Priority: Minor
>
> Right now in BranchStep, if children contain a BarrierStep it keeps calling 
> applyCurrentTraverser until this.starts is exhausted.
> [https://github.com/apache/tinkerpop/blob/534746b6019d0775e628375a3dfeba8a18df90fe/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/branch/BranchStep.java#L139-L140]
> This means the child step may hold all upstream solutions in memory at once, 
> and potentially memory usage is bloated.
> I think here we can change so that BarrierStep can have a method like 
> getMaxBarrierSize, and BranchStep examines the size of barrier that child 
> steps can handle at once then break the while loop either when 
> `this.starts.hasNext` becomes false or we exceed the barrier size to save the 
> potential memory consumption.
>  
> When there are multiple barrier steps as children, we may pick either max or 
> min among them.
> I am happy to implement this, but please let me know if you have any comments 
> or concerns on this change. 
> Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to