[
https://issues.apache.org/jira/browse/CASSANDRA-12153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368229#comment-15368229
]
Benjamin Lerer edited comment on CASSANDRA-12153 at 7/8/16 7:28 PM:
--------------------------------------------------------------------
I am the one to blame for the {{stream()}} method. My main concern, when I
created it, was just to simplify the code.
If we are really looking for speed, I think that we should have some field
variables for {{hasIN}}, {{hasEq}} ...
That would move the computation to preparation time rather than execution time
and perform it only once (if my memory is correct, {{hasIN()}} is called
multiple times).
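A minimal sketch of the field-variable idea, not the actual Cassandra code.
{{SimpleRestriction}}, {{CachedFlagRestrictions}} and {{addRestriction()}} are
hypothetical names used only for illustration. It assumes the set is immutable
and built at preparation time, so the flags can be derived once per addition.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a restriction; not Cassandra's real interface.
final class SimpleRestriction {
    final String column;
    final boolean in; // true for "col IN (...)"
    final boolean eq; // true for "col = ?"

    SimpleRestriction(String column, boolean in, boolean eq) {
        this.column = column;
        this.in = in;
        this.eq = eq;
    }
}

// Hypothetical immutable set: every addition returns a new instance whose
// flags are derived from the previous ones, so the getters do no scanning.
final class CachedFlagRestrictions {
    private final List<SimpleRestriction> restrictions;
    private final boolean hasIN;
    private final boolean hasEq;

    CachedFlagRestrictions() {
        this(Collections.<SimpleRestriction>emptyList(), false, false);
    }

    private CachedFlagRestrictions(List<SimpleRestriction> restrictions,
                                   boolean hasIN, boolean hasEq) {
        this.restrictions = restrictions;
        this.hasIN = hasIN;
        this.hasEq = hasEq;
    }

    // Called while the statement is prepared, not while it is executed.
    CachedFlagRestrictions addRestriction(SimpleRestriction r) {
        List<SimpleRestriction> copy = new ArrayList<>(restrictions);
        copy.add(r);
        return new CachedFlagRestrictions(Collections.unmodifiableList(copy),
                                          hasIN || r.in, hasEq || r.eq);
    }

    boolean hasIN() { return hasIN; } // constant time, no allocation
    boolean hasEq() { return hasEq; }
    int size() { return restrictions.size(); }
}
```

With this shape, calling {{hasIN()}} several times per execution costs only a
field read each time.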
bq. Then remove RestrictionSet stream() to discourage this from being
reintroduced?
There are 2 problems associated with the {{stream()}} method: the creation of
the {{LinkedHashSet}}, which is used to remove the {{MultiColumnRestriction}}
duplicates, and the lambda expressions.
The {{LinkedHashSet}} is unfortunately also created in {{iterator()}}, so
removing {{stream()}} will not solve that problem.
I think we could keep track of whether multicolumn restrictions are used and
avoid creating the {{LinkedHashSet}} if they are not.
I have no idea of the cost associated with the use of the lambda.
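A rough sketch of the multicolumn tracking idea, under the assumption that
restrictions are stored per column and that a multicolumn restriction is
therefore registered under several columns. {{Restr}}, {{ColumnRestrictions}},
{{put()}} and {{distinct()}} are hypothetical names, not the real
{{RestrictionSet}} API.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical restriction type; equality is object identity.
final class Restr {
    final String label;
    Restr(String label) { this.label = label; }
}

// Hypothetical per-column store of restrictions.
final class ColumnRestrictions {
    private final Map<String, Restr> byColumn = new TreeMap<>();
    // Set once any restriction spans more than one column. It is never
    // reset, which is conservative: at worst we deduplicate needlessly.
    private boolean hasMultiColumn;

    void put(Restr r, String... columns) {
        for (String c : columns) {
            byColumn.put(c, r);
        }
        hasMultiColumn |= columns.length > 1;
    }

    // Returns each restriction once, in column order.
    Collection<Restr> distinct() {
        if (!hasMultiColumn) {
            // No duplicates are possible, so skip the LinkedHashSet.
            return byColumn.values();
        }
        return new LinkedHashSet<>(byColumn.values());
    }
}
```

Only sets that actually contain a multicolumn restriction pay for the
deduplication. Both {{iterator()}} and {{stream()}} could share such a method.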
> RestrictionSet.hasIN() is slow
> ------------------------------
>
> Key: CASSANDRA-12153
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12153
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Reporter: Tyler Hobbs
> Assignee: Tyler Hobbs
> Priority: Minor
> Fix For: 3.x
>
>
> While profiling local in-memory reads for CASSANDRA-10993, I noticed that
> {{RestrictionSet.hasIN()}} was responsible for about 1% of the time. It
> looks like it's mostly slow because it creates a new LinkedHashSet (which is
> expensive to init) and uses streams. This can be replaced with a simple for
> loop.
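To illustrate the replacement suggested in the description, here is a sketch
comparing a stream-based check with a plain loop. {{Term}} and {{HasInCheck}}
are hypothetical names, and the real method works on Cassandra's own
restriction types rather than on a {{List}}.

```java
import java.util.List;

// Hypothetical restriction type exposing only what the check needs.
final class Term {
    private final boolean in;
    Term(boolean in) { this.in = in; }
    boolean isIN() { return in; }
}

final class HasInCheck {
    // Stream version: concise, but sets up a stream pipeline on each call.
    static boolean hasInWithStream(List<Term> terms) {
        return terms.stream().anyMatch(Term::isIN);
    }

    // Loop version: same result, stops at the first IN restriction.
    static boolean hasInWithLoop(List<Term> terms) {
        for (Term t : terms) {
            if (t.isIN()) {
                return true;
            }
        }
        return false;
    }
}
```

Both methods return the same answer for any input; any speed difference would
have to be confirmed by profiling.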