[
https://issues.apache.org/jira/browse/GEODE-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927124#comment-16927124
]
ASF subversion and git services commented on GEODE-6973:
--------------------------------------------------------
Commit 18ccf564e1bb34dba891c75681474f572a35fcbf in geode's branch
refs/heads/feature/GEODE-6973 from zhouxh
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=18ccf56 ]
GEODE-6973: Use cachelistener to synchronize typeToId with IdToType
Co-authored-by: Xiaojian Zhou <[email protected]>
Co-authored-by: Donal Evans <[email protected]>
> getExistingIdForType should not compare all entries in idToType region
> ----------------------------------------------------------------------
>
> Key: GEODE-6973
> URL: https://issues.apache.org/jira/browse/GEODE-6973
> Project: Geode
> Issue Type: Bug
> Reporter: xiaojian zhou
> Assignee: xiaojian zhou
> Priority: Major
> Labels: GeodeCommons
> Fix For: 1.10.0
>
> Time Spent: 7h 40m
> Remaining Estimate: 0h
>
> We found the PeerTypeRegistration's getExistingIdForType() will iterate
> through the idToType region's entries to find if the incoming newType is
> there.
> If idToType region contains 20K or 100K entries, this will impact the put
> throughput (customers did notice the performance downgrade when there're many
> pdxTypes).
> To make the things worse, the comparison is to compare the whole object,
> field to field. If the json object (which will be converted to pdxType)
> contains 30 fields, the comparison will have to compare up to 30 fields. If
> the idToType region contains 20K entries, A new pdxType will do 20K x 30
> string comparisons before register it.
> We found each server maintained a typeToId map, this map is used to check if
> the pdxType exists. If exists, it will return the type id without check the
> IdToType region. The total number of pdxType did not impact the put
> performance if the pdxTypd exists.
> The typeToId map is maintained with a d-lock, each time we added a new
> pdxType, it will update into the map while still holding the d-lok. So we
> believe that the map should be the same as the region in content. If we
> cannot find the pdxType in the map, it should not be in the region. We can
> skip the iteration of region (which is the root cause of the performance
> issue).
> Another issue in current code is: when each time a new type come, it will
> recreate the map. This is unnecessary and contributes to the slowness too.
> We should only create the map during initialize().
> Here are the tests we want to introduce:
> 1) a junit test to prove that reorder fields in a big JSON file will not
> cause significant hashcode conflicts (<1%)
> 2) a junit test to prove that add a index to a field in a big JSON file will
> hardly cause hashcode conflicts.
> This 2 tests are to prove that hashcode conflict is not the root cause of
> linear probing for PDXTypeId.
> 3) a junit test to prove that for the cases that hashcode conflict caused by
> reordered fields, there will be no hashcode conflicts if using
> SORT_JSON_FIELD_NAMES_PROPERTY=true.
> 4) a dunit test to prove that SORT_JSON_FIELD_NAMES_PROPERTY=true or false
> did not impact the performance to add a new pdxType.
> 5) a dunit test to create a new pdxType from 2 peer server at the same time.
> The test is to prove that the d-lock take effect, one server create the
> pdxType, and another server should find the pdxType exists.
> Do this test both from server directly and from clients.
> 6) Create 2 different objects which ends up with the same hashcode (we can
> get the 2 objects from test-1), try to put the 2 objects to create new
> pdxType. The 2nd one should also create a new type. It should not be treated
> as "found an existing pdxType".
--
This message was sent by Atlassian Jira
(v8.3.2#803003)