[ 
https://issues.apache.org/jira/browse/GEODE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919949#comment-16919949
 ] 

Jacob S. Barrett edited comment on GEODE-2793 at 8/30/19 11:04 PM:
-------------------------------------------------------------------

The biggest overhead we see when profiling queries is the repeated hash code 
calculation in {{PdxInstanceImpl.hashCode()}}. In the current query benchmark 
it accounts for ~60% of the CPU time and a significant number of transient 
object allocations. 

We really should investigate a way of storing the hash code in the PDX byte 
stream. Care must be taken to avoid upgrade and backward-compatibility issues. 
Also, if the PDX entry is updated in place, the stored hash code must be 
recalculated.
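
As a rough illustration of the idea, here is a minimal sketch that appends a hash code to a serialized blob. The layout (one flag byte, the payload, an optional 4-byte trailer) and the names {{HashedBlob}}, {{encode}}, {{encodeLegacy}}, {{hashOf}} and {{updateInPlace}} are all hypothetical and are not the real PDX wire format. It also hashes the whole payload for simplicity, whereas PdxInstanceImpl hashes only the identity fields.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical layout, NOT the real PDX format:
// [1 flag byte][payload bytes][optional 4-byte hash trailer]
final class HashedBlob {
  static final byte FLAG_HAS_HASH = 0x01;

  // New-style encoding: the hash is computed once at write time and stored.
  static byte[] encode(byte[] payload) {
    ByteBuffer buf = ByteBuffer.allocate(1 + payload.length + 4);
    buf.put(FLAG_HAS_HASH);
    buf.put(payload);
    buf.putInt(Arrays.hashCode(payload));
    return buf.array();
  }

  // Old-style encoding without a trailer, as an older writer would produce.
  static byte[] encodeLegacy(byte[] payload) {
    ByteBuffer buf = ByteBuffer.allocate(1 + payload.length);
    buf.put((byte) 0);
    buf.put(payload);
    return buf.array();
  }

  // Readers check the flag so that legacy blobs still work.
  static int hashOf(byte[] blob) {
    if ((blob[0] & FLAG_HAS_HASH) != 0) {
      return ByteBuffer.wrap(blob).getInt(blob.length - 4);
    }
    // Legacy blob: fall back to computing the hash from the payload.
    return Arrays.hashCode(Arrays.copyOfRange(blob, 1, blob.length));
  }

  // An in-place update must rewrite the stored hash, or it goes stale.
  static void updateInPlace(byte[] blob, int payloadOffset, byte value) {
    blob[1 + payloadOffset] = value;
    if ((blob[0] & FLAG_HAS_HASH) != 0) {
      int end = blob.length - 4;
      int hash = Arrays.hashCode(Arrays.copyOfRange(blob, 1, end));
      ByteBuffer.wrap(blob).putInt(end, hash);
    }
  }
}
```

The flag byte is only one possible way to keep old and new readers compatible. A real change would have to fit whatever versioning the PDX format already provides.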

Measure using PartitionedIndexQueryBenchmark.

Alternatively, a single PdxInstanceImpl object could be stored in the entry, 
either alongside the PDX stream or in place of it. The instance already caches 
the hash code after calculating it from the stream. We could then also 
optimize {{PdxInstanceImpl.equals(Object)}} by caching the equality field 
values locally, with weak references. 
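
To make the caching idea concrete, here is a minimal sketch of an object that computes its hash code lazily from its bytes, reuses it on later calls, and invalidates it on update. {{CachedHashInstance}} and its {{computations}} counter are hypothetical names used only for illustration. This is not how PdxInstanceImpl is implemented, and the sketch is not thread-safe.

```java
import java.util.Arrays;

// Hypothetical stand-in for an instance kept in the region entry.
final class CachedHashInstance {
  private byte[] bytes;
  private int cachedHash;
  private boolean hashComputed;
  int computations; // counts real hash calculations, for illustration only

  CachedHashInstance(byte[] bytes) {
    this.bytes = bytes.clone();
  }

  // Replacing the content invalidates the cached hash.
  void update(byte[] newBytes) {
    this.bytes = newBytes.clone();
    this.hashComputed = false;
  }

  @Override
  public int hashCode() {
    if (!hashComputed) {
      cachedHash = Arrays.hashCode(bytes);
      hashComputed = true;
      computations++;
    }
    return cachedHash;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof CachedHashInstance)) {
      return false;
    }
    CachedHashInstance other = (CachedHashInstance) o;
    // Cheap rejection on the cached hash before comparing the content.
    return hashCode() == other.hashCode() && Arrays.equals(bytes, other.bytes);
  }
}
```

Because the same object is reused, adding it to several hash-based collections costs one hash calculation instead of one per insertion.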




> Look into reducing the amount of PDX deserializations in OQL query 
> intermediate result sets for indexed OR queries containing PdxInstanceImpls
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-2793
>                 URL: https://issues.apache.org/jira/browse/GEODE-2793
>             Project: Geode
>          Issue Type: Bug
>          Components: querying
>            Reporter: Barry Oglesby
>            Assignee: Jacob S. Barrett
>            Priority: Major
>              Labels: perfomance
>
> Intermediate result sets for each of the indexed OR clauses are represented 
> by ResultsBags. Each index is sorted and iterated in 
> AbstractGroupOrRangeJunction auxFilterEvaluate. When an entry in the index is 
> added to a ResultsBag, hashCode is invoked. In the case of a PdxInstanceImpl, 
> this causes all of its identity fields to be deserialized so that hashCode 
> can be invoked on them.
> Then, when each ResultsBag is sorted during QueryUtils union and 
> sizeSortedUnion by invoking occurrences on each entry, equals is invoked on 
> each entry. In the case of a PdxInstanceImpl, this causes all of its identity 
> fields to be deserialized so that equals can be invoked on them.
> Here is an example query that shows the PDX deserializations:
> {noformat}
> select * from /region this where ((map['entry1']='value1' OR 
> map['entry2']='value2' OR map['entry3']='value3' OR map['entry4']='value4' OR 
> map['entry5']='value5' OR map['entry6']='value6' OR map['entry7']='value7' OR 
> map['entry8']='value8' OR map['entry9']='value9' OR 
> map['entry10']='value10')) ...
> {noformat}



