[ https://issues.apache.org/jira/browse/IGNITE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553094#comment-15553094 ]
Alexander Paschenko commented on IGNITE-4011:
---------------------------------------------

All right, the first version of the patch for this issue is being tested on TC, so it's time to describe the design that has ultimately been implemented and to show some configuration examples.

h2. Preface

Here are the main ideas:
- Leave the design as simple and clean as possible.
- Make all configuration changes optional. The only users who will need to change anything are those who wish to use the new DML features in binary mode, and only for keys without classes. Those who don't care about DML or don't use binary keys have nothing to worry about.
- Make it possible to get by with no additional coding on the user's side.

Of course, anyone who wants to use binary classless keys outside of the DML context will also benefit from this change.

h2. API changes

The only configuration/public API related class changed is {{CacheKeyConfiguration}}. It has three fields added:

{code:java}
/** Key hashing mode. */
private BinaryKeyHashingMode binHashingMode;

/** Fields to build binary objects' hash code upon. */
private List<String> binHashCodeFields;

/** Class name for hash code resolver to automatically compute hash codes for newly built binary objects. */
private String binHashCodeRslvrClsName;
{code}

h2. Hashing mode

The latter two params are meaningful only depending on the value of the first one, so let's review it first.

A new enum, {{BinaryKeyHashingMode}}, has been introduced to control binary classless key hashing behavior. It's declared as follows - I left javadocs intact so that possible options are clear:

{code:java}
/**
 * Mode of generating hash codes for keys created with {@link BinaryObjectBuilder}.
 */
public enum BinaryKeyHashingMode {
    /**
     * Default (also legacy pre 1.8) mode. Use this mode if you use no SQL DML commands - INSERT, UPDATE, DELETE, MERGE,
     * in other words, if you put data to cache NOT via SQL.
     * Effect from choosing this mode is identical to omitting mode settings from key configuration altogether.
     */
    DEFAULT,

    /**
     * Generate hash code based upon serialized representation of binary object fields - namely, byte array constructed
     * by {@link BinaryObjectBuilder}. Use this mode if you are NOT planning to retrieve data from cache via
     * ordinary cache methods like {@link IgniteCache#get(Object)}, {@link IgniteCache#getAll(Set)}, etc., or
     * if you don't have particular classes for keys on either client or server - it's a convenient way
     * to manipulate and retrieve binary data in cache only via full-scale SQL features,
     * with no configuration overhead beyond choosing this mode.
     */
    BYTES_HASH,

    /**
     * Generate hash code based upon the list of fields declared in {@link BinaryObjectBuilder}
     * (not in {@link BinaryObject} as hash code has to be computed <b>before</b> {@link BinaryObject} is fully built) -
     * this mode requires that you set {@link CacheKeyConfiguration#binHashCodeFields} for it to work.
     */
    FIELDS_HASH,

    /**
     * Generate hash code arbitrarily based on {@link BinaryObjectBuilder} using specified class implementing
     * {@link BinaryObjectHashCodeResolver} - this mode requires that you set
     * {@link CacheKeyConfiguration#binHashCodeRslvrClsName} for it to work.
     */
    CUSTOM;
}
{code}

h2. Hashing modes explained

So, there are four options, as discussed on the dev list:
- don't change any behavior
- hash the byte array of fields set in the builder
- hash a particular subset of fields in the builder
- provide custom logic to hash field values in the builder in an arbitrary way

The dev list had also suggested that we introduce an interface, {{BinaryObjectHashCodeResolver}}. However, to keep this interface simple to understand and implement, its usage is limited to the last two options in the above list - fields subset hashing and custom hashing - while byte array hashing works without it (as the byte array is not a part of the binary builder).
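To make the byte array option more concrete, here is a minimal sketch of what "hash the byte array of fields" amounts to. It is not code from the patch: {{BytesHashSketch}} and its {{serialize}} method are hypothetical, and the toy serialization below is only a stand-in for the byte array that {{BinaryObjectBuilder}} produces, not Ignite's actual binary format. The point it illustrates is that in this mode both the hash code and equality follow directly from the serialized bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Map;

/** Toy illustration of BYTES_HASH semantics. NOT Ignite's binary format. */
class BytesHashSketch {
    /**
     * Hypothetical stand-in for the byte array a builder would produce:
     * writes each field name and value in the map's iteration order.
     */
    static byte[] serialize(Map<String, Object> fields) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);

            for (Map.Entry<String, Object> e : fields.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(String.valueOf(e.getValue()));
            }

            out.flush();

            return buf.toByteArray();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hash code derived from the serialized bytes only. */
    static int bytesHash(byte[] bytes) {
        return Arrays.hashCode(bytes);
    }

    /** Equality based on array contents, consistent with bytesHash. */
    static boolean bytesEqual(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }
}
```

Two keys are treated as the same key here only if their serialized forms match byte for byte, so whoever builds a key for a lookup has to reproduce exactly the same bytes.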
Let's focus on the latter two modes, {{FIELDS_HASH}} and {{CUSTOM}}. Correct hashing is of little use without a correct implementation of {{equals}} - even if we manage to maintain uniqueness of hash codes, we need a mechanism for comparing objects for equality, or otherwise we won't be able to retrieve from the cache what we've put there.

The current implementation of {{equals}} in {{BinaryObjectExImpl}} is based on the contents of the arrays. This behavior is unchanged for {{BYTES_HASH}} mode - if the byte arrays of two objects are equal, then the portions that correspond to fields are the same as well.

As mentioned above, {{FIELDS_HASH}} and {{CUSTOM}} modes utilize {{BinaryObjectHashCodeResolver}} for hashing and equality comparison.

h2. Resolver interface and implementation

The interface looks as follows:

{code:java}
package org.apache.ignite.binary;

import org.apache.ignite.internal.binary.BinaryObjectExImpl;

/**
 * Methods to compute hash codes for new binary objects and to compare them for equality.
 */
public interface BinaryObjectHashCodeResolver {
    /**
     * @param builder Binary object builder.
     * @return Hash code value.
     */
    public int hash(BinaryObjectBuilder builder);

    /**
     * Compare binary objects for equality consistently with how hash code is computed.
     *
     * @param o1 First object.
     * @param o2 Second object.
     * @return {@code true} if the objects are equal.
     */
    public boolean equals(BinaryObjectExImpl o1, BinaryObjectExImpl o2);
}
{code}

For {{FIELDS_HASH}}, the configuration takes the list of fields as a param of {{CacheKeyConfiguration}} - the hash code resolver will be built based upon those. Therefore, this mode takes no additional coding.

For {{CUSTOM}}, the configuration takes the resolver class name as a param of {{CacheKeyConfiguration}}. This mode obliges the user to implement {{BinaryObjectHashCodeResolver}} and specify the class name of the implementation.

h2. Per mode configuration examples

h3. {{BYTES_HASH}}

{code:xml}
<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- ...other properties...
    -->
    <property name="cacheKeyConfiguration">
        <list>
            <bean class="org.apache.ignite.cache.CacheKeyConfiguration">
                <property name="typeName" value="bytes_hashed_type" />
                <property name="affKeyFieldName" value="someAffField" />
                <property name="binHashingMode" value="BYTES_HASH" />
            </bean>
        </list>
    </property>
</bean>
{code}

No coding, no other settings - just set the mode, and you can do all your MERGEs and INSERTs. However, doing {{get}}s will probably be perilous, as you'll have to create your keys with the builder. This minimalistic configuration suits setups where the user wishes to interact with some portion of the data in cache solely via SQL.

h3. {{FIELDS_HASH}}

{code:xml}
<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- ...other properties... -->
    <property name="cacheKeyConfiguration">
        <list>
            <bean class="org.apache.ignite.cache.CacheKeyConfiguration">
                <property name="typeName" value="fields_hashed_type" />
                <property name="affKeyFieldName" value="someAffField" />
                <property name="binHashingMode" value="FIELDS_HASH" />
                <property name="binHashCodeFields">
                    <list>
                        <value>someHashField</value>
                        <value>anotherHashField</value>
                    </list>
                </property>
            </bean>
        </list>
    </property>
</bean>
{code}

Aside from setting the mode, you have to list the fields to hash. This suits setups where the client node has the key classes and data nodes don't, while data gets into the cache via SQL INSERT/MERGE.

h3. {{CUSTOM}}

{code:xml}
<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- ...other properties...
    -->
    <property name="cacheKeyConfiguration">
        <list>
            <bean class="org.apache.ignite.cache.CacheKeyConfiguration">
                <property name="typeName" value="CustomHashedBinaryType" />
                <property name="affKeyFieldName" value="someAffField" />
                <property name="binHashingMode" value="CUSTOM" />
                <property name="binHashCodeRslvrClsName" value="com.company.ignite.binary.SomeCustomHasher" />
            </bean>
        </list>
    </property>
</bean>
{code}

Aside from setting the mode, you have to implement {{BinaryObjectHashCodeResolver}} in the specified class. This suits setups where the client node has the key classes and data nodes don't, while data gets into the cache via SQL INSERT/MERGE.

h2. Existing key classes with {{FIELDS_HASH}} and {{CUSTOM}} hashing modes

There is an important aspect of binary object handling: what if we wish to perform a {{get}} on a cache with a key whose class *is* present on the client node and *is not* present on data nodes, and the key was put to the cache not by calling {{put}} but by SQL INSERT or MERGE? What then?

In this case the user's class already has {{hashCode}} and {{equals}} implemented, but we don't have the classes on data nodes, and still {{get}}s obviously have to work. So the logic of {{BinaryObjectHashCodeResolver}} should match that declared in the key's class (which data nodes don't have).

For the cases when the {{hashCode}} / {{equals}} logic is trivial and generated by an IDE, fields based hashing and equality comparisons are sufficient - therefore, {{FIELDS_HASH}} works, and the only thing to maintain is consistency between the field list in the code of the key class (which data nodes don't have) *AND* the field list in the config files on data nodes.

For the cases when the {{hashCode}} / {{equals}} logic is not trivial, the user will have to implement a custom {{BinaryObjectHashCodeResolver}} that mimics the key hashing/comparing logic of the class.
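The consistency requirement can be sketched in plain Java. Everything below is illustrative and not taken from the patch: {{SampleKey}} is a hypothetical client-side key class with IDE-style {{hashCode}} / {{equals}}, and {{FieldsHashSketch}} is a hypothetical stand-in for field-list hashing on a data node, *assuming* the node combines field values the same way {{Objects.hash}} does. The actual algorithm used by {{FIELDS_HASH}} is whatever the patch implements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical key class that exists only on the client node. */
class SampleKey {
    final int someHashField;
    final String anotherHashField;
    final String comment; // Deliberately not part of hashCode/equals.

    SampleKey(int someHashField, String anotherHashField, String comment) {
        this.someHashField = someHashField;
        this.anotherHashField = anotherHashField;
        this.comment = comment;
    }

    @Override public boolean equals(Object o) {
        if (this == o)
            return true;

        if (!(o instanceof SampleKey))
            return false;

        SampleKey other = (SampleKey)o;

        return someHashField == other.someHashField
            && Objects.equals(anotherHashField, other.anotherHashField);
    }

    @Override public int hashCode() {
        return Objects.hash(someHashField, anotherHashField);
    }
}

/** Hypothetical field-list hashing, as a data node without the class might do it. */
class FieldsHashSketch {
    /** Combines the configured fields, in configured order, like Objects.hash does. */
    static int fieldsHash(Map<String, Object> fieldValues, List<String> hashFields) {
        Object[] vals = new Object[hashFields.size()];

        for (int i = 0; i < vals.length; i++)
            vals[i] = fieldValues.get(hashFields.get(i));

        return Arrays.hashCode(vals);
    }

    /** Compares only the configured fields, consistently with fieldsHash. */
    static boolean fieldsEqual(Map<String, Object> a, Map<String, Object> b, List<String> hashFields) {
        for (String f : hashFields) {
            if (!Objects.equals(a.get(f), b.get(f)))
                return false;
        }

        return true;
    }
}
```

Under that assumption, a key built from field values on a data node hashes to the same value as the client-side object only if the configured list names the same fields in the same order as the class's {{hashCode}}. Listing an extra field, dropping one, or reordering them changes the result and would make {{get}} miss.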
The rationale behind this design is as follows:
- If the user does not care about automatic key hashing (= does not use DML features), then he or she is probably happy and does not want to configure or, God forbid, code anything. Everything that works today has to keep working without new coding/configuration.
- If the user wishes to hash binary classless keys automatically (from SQL INSERT/MERGE) *AND* have key classes on client nodes (= perform {{get}} with a key serialized by, say, {{IgniteBinary.toBinary(Object)}} and *NOT* constructed with the binary builder), he or she will have to keep hashing consistent between client and server nodes. However, forcing the user to change the code of existing classes does not seem right, so the only burden is re-configuring data nodes (and, optionally, writing a custom resolver if the original class is hashed/compared in some weird way).

h2. Any ways to avoid having to do anything at all?

Sure thing.
- Don't use DML.
- Don't use binary keys without classes.

(Everything written above affects only cases with non-trivial classless keys.)

> Automatically compute hash codes for newly built binary objects
> ---------------------------------------------------------------
>
> Key: IGNITE-4011
> URL: https://issues.apache.org/jira/browse/IGNITE-4011
> Project: Ignite
> Issue Type: Task
> Components: binary, cache
> Reporter: Alexander Paschenko
> Assignee: Alexander Paschenko
> Fix For: 1.8
>
>
> For binary keys built automatically inside SQL engine during INSERT or MERGE,
> we need to compute hash codes automatically because in this case the user
> does not interact with any builders and can't set hash code explicitly.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)