[
https://issues.apache.org/jira/browse/TINKERPOP-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166227#comment-17166227
]
Norio Akagi commented on TINKERPOP-2393:
----------------------------------------
> If I'm not mistaken the only Barrier with a configurable size is NoOpBarrier.
> If so, it would basically mean that we'd only see a value less than "exhaust
> the stream" for that Step, is that right?
For now yes. To share some background, I work for some GraphDB provider which
supports Tinkerpop. The initial motivation for us was to optimize
PropertiesStep, which just retrieves properties for each element one by one as
long as we follow the step and api provided.
https://github.com/apache/tinkerpop/blob/cff4c161615f2b50bda27b6ba523c7f52b833532/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/PropertiesStep.java#L64
So we internally replaced this PropertiesStep with our custom
BulkPropertiesStep and this new Step queues up incoming
Traverser.Admin<Element> up to certain size and then call our own Storage API
to retrieve properties for multiple elements at once.
This apparently worked but now we noticed that when BranchStep wraps our
PropertiesStep, it just gives one solution at a time
https://github.com/apache/tinkerpop/blob/534746b6019d0775e628375a3dfeba8a18df90fe/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/branch/BranchStep.java#L143
so batch functionality really doesn't work. For now, we are thinking that when
we replace PropertiesStep with BulkPropertiesStep we can add NoOpBarrierStep
too with a certain barrier size, so we want BranchStep to check
NoOpBarrierStep's size and to tweak how many solutions it passes to downstream.
As long term solution, however, I think BulkPropertiesStep itself should
implement Barrier for simplicity sake. And further, in my opinion it is better
if TinkerPop's PropertiesStep officially has and uses this kind of bulk
functionality. Because we don't think this sort of "batch request to storage
layer" is an uncommon feature request from GraphDB provider's point of view
(though it depends on what kind of storage we use), and it doesn't break
existing behavior and performance if we just switch to the current "per
element" API call when barrierSize = 1 as default behavior.
With this change, we may want to
* add some API like properties(List<Vertex> vertcies, List<String> keys)
properties(List<Edge> edges, List<String> keys) under some class other than
Vertex or Edge so that GraphDB provider can extend it.
* change PropertiesStep to extend a new BaseStep other than current FlatMap,
the difference is the new step returns Iterator<E> from
Iterator<Traverser.Admin<S>>.
* Create some Base class for Barrier which has barrierSize as protected member
and NoOpBarrierStep and new PropertiesStep should inherit it
We look for PropertiesStep optimization at this point but this may be
applicable in other steps (I need to investigate other steps).
This is just my current rough idea, may be beyond the scope of this JIRA ticket
and need another discussion thread (at least I want to separate a pull request
to just have BranchStep follow NoOpBarrierStep's barrier size if any) but if
you have any thoughts on this direction, please let me know.
Thank you for your response !
> BranshStep should respect the barrier size of BarrierStep when calling
> applyCurrentTraverser
> --------------------------------------------------------------------------------------------
>
> Key: TINKERPOP-2393
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2393
> Project: TinkerPop
> Issue Type: Improvement
> Components: process
> Affects Versions: 3.4.7
> Reporter: Norio Akagi
> Priority: Minor
>
> Right now in BranchStep, if children contain a BarrierStep it keeps calling
> applyCurrentTraverser until this.starts is exhausted.
> [https://github.com/apache/tinkerpop/blob/534746b6019d0775e628375a3dfeba8a18df90fe/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/branch/BranchStep.java#L139-L140]
> This means the child step may hold all upstream solutions in memory at once,
> and potentially memory usage is bloated.
> I think here we can change so that BarrierStep can have a method like
> getMaxBarrierSize, and BranchStep examines the size of barrier that child
> steps can handle at once then break the while loop either when
> `this.starts.hasNext` becomes false or we exceed the barrier size to save the
> potential memory consumption.
>
> When there are multiple barrier steps as children, we may pick either max or
> min among them.
> I am happy to implement this, but please let me know if you have any comments
> or concerns on this change.
> Thanks
--
This message was sent by Atlassian Jira
(v8.3.4#803005)