[
https://issues.apache.org/jira/browse/TINKERPOP-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18068121#comment-18068121
]
Cole Greer commented on TINKERPOP-3231:
---------------------------------------
I've done a deeper dive into this issue and run some benchmarks and I believe
it should be closed.
The discussed cloning bypass in `getValue()` has already been implemented in
https://github.com/apache/tinkerpop/commit/b2816c5669970adbcd451493f057078ff58f17d5.
I've run a benchmark for `getValue()` with a predicate containing a list of 1M
doubles, where the latency drops from about 3ms in 3.8.0, to ~3E-6 ms in
3.8.1-SNAPSHOT, which is inline with 3.7 performance as well. `getValue()`
performance remains relatively unchanged between 3.8.0 and 3.8.1-SNAPSHOT in
cases where GValue's are mixed into the collection.
If we need to further optimize the GValue paths in `getValue()`, I believe a
caching approach would offer the greatest improvement such that we don't need
to recompute the merged list with each iteration. This cache would just need to
be cleared on any call to `setValue()` or `updateVariable()`. As this JIRA was
motivated by the no-GValue case performance, I believe this should be left
as-is for now, and the caching only considered if demand arises. I believe it's
best not to introduce extra complexity without clear demand.
I've run some experiments with the proposed updated to `setValue()` to get rid
of the initial GValue scan in the collection, however such a change made the
performance substantially worse in the "no-GValue" case while offering little
benefit in the mixed case. The worst-case performance of `setValue()` is
dominated by the forced one element at a time copying from the provided value
collection to the internal `literals` and `variables` collections. The initial
GValue scan is relatively inconsequential by comparison, and is worth
containing to retain the no-GValue fast bypass.
> Optimize P.setValue() and P.getValue()
> --------------------------------------
>
> Key: TINKERPOP-3231
> URL: https://issues.apache.org/jira/browse/TINKERPOP-3231
> Project: TinkerPop
> Issue Type: Improvement
> Components: process
> Affects Versions: 3.8.0
> Reporter: Cole Greer
> Priority: Blocker
>
> First raised by Pieter Martin in discord:
> https://discord.com/channels/838910279550238720/893235809690464277/1470053903742537981
> The updated getValue and setValue methods in 3.8.0 result in excessive
> processing of collections for within/without predicates. This can be
> especially harmful with large collections (millions of literals). Some of
> this processing is necessary due to the needed separation between literals
> and GValues, however some of it can be improved.
> In the case of setValue(), it currently makes 2 complete passes through a
> collection, the first to check if GValues are present, and the second to
> split the list into literals and GValues. The intention of this 2 pass
> approach was to bypass the type sorting in the "no GValue" case, and retain
> the original collection type. This is likely unnecessary overhead which is
> unlikely to provide justifiable value. The sorting is necessary but can be
> achieved without this 2 pass approach.
> For getValue(), it is necessary to re-unify the literals and variables
> collections to retain the public interface of P. In the case where
> `this.variables` is empty, we could likely bypass the cloning step and
> directly return `this.literals`. There may be a concern with the method
> returning a clone of the collection in some circumstances, and the original
> collection in others. This warrants additional consideration, but perhaps is
> solvable by wrapping the result in `Collections.unmodifiableCollection()`
--
This message was sent by Atlassian Jira
(v8.20.10#820010)