[
https://issues.apache.org/jira/browse/IGNITE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14910129#comment-14910129
]
Vladimir Ozerov commented on IGNITE-1549:
-----------------------------------------
Implementation plan:
1) Switch field type ID and field length;
2) Implement field length inference when possible;
3) Implement special types for constants;
4) Implement var-length compression - need to evaluate whether we will benefit
from it or not.
> Optimize portable object fields write in non-raw mode.
> ------------------------------------------------------
>
> Key: IGNITE-1549
> URL: https://issues.apache.org/jira/browse/IGNITE-1549
> Project: Ignite
> Issue Type: Task
> Components: general
> Affects Versions: 1.1.4
> Reporter: Vladimir Ozerov
> Assignee: Vladimir Ozerov
> Priority: Blocker
> Fix For: ignite-1.5
>
>
> Currently we write user fields as follows:
> 0 .. 3 - field ID;
> 4 - field type;
> 5 .. 8 - field len;
> 9 .. - the field itself.
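For reference, the current layout described above can be sketched for a single int field as follows. This is only an illustration: the class name, the `TYPE_INT` value, and the little-endian byte order are assumptions, not taken from the Ignite code base.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LegacyFieldLayout {
    /** Hypothetical type code for int; the real value is not given in the issue. */
    static final byte TYPE_INT = 3;

    /** Writes one int field as: 4-byte ID, 1-byte type, 4-byte length, 4-byte value. */
    static byte[] writeIntField(int fieldId, int value) {
        // Byte order is an assumption made for this sketch.
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + 4 + 4).order(ByteOrder.LITTLE_ENDIAN);

        buf.putInt(fieldId);  // bytes 0 .. 3: field ID
        buf.put(TYPE_INT);    // byte 4: field type
        buf.putInt(4);        // bytes 5 .. 8: field len (always 4 for an int)
        buf.putInt(value);    // bytes 9 .. 12: the field itself

        return buf.array();
    }
}
```

The result is 13 bytes for a 4-byte payload, which matches the "int: 13" starting point in the compaction estimates.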
> It can be optimized as follows:
> 1) Field len usually can be inferred from type. E.g., for int it is 4.
> 2) Frequently used constants can be written as separate types. E.g. INT -
> normal int, INT_0 - zero, etc.
> 3) Last, but not least, values should be encoded using a "variable bytes" (and
> possibly ZigZag) algorithm. This should save about 2 bytes per int or long
> on average (I assume here that longs are usually bigger than 4 bytes,
> e.g. timestamps).
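A minimal sketch of what "variable bytes" plus ZigZag could look like for ints, assuming the common 7-bits-per-byte scheme with a continuation bit. The class and method names are hypothetical, and the issue does not say which exact variant would be used.

```java
import java.io.ByteArrayOutputStream;

final class VarInts {
    /** Maps signed ints to unsigned so that small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3. */
    static int zigZag(int v) {
        return (v << 1) ^ (v >> 31);
    }

    /** Inverse of zigZag. */
    static int unZigZag(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    /** Writes 7 bits per byte, low bits first; the high bit of a byte means "more bytes follow". */
    static byte[] writeVarInt(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);

        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }

        out.write(v);

        return out.toByteArray();
    }

    /** Reads a value produced by writeVarInt from the start of buf. */
    static int readVarInt(byte[] buf) {
        int res = 0;
        int shift = 0;

        for (byte b : buf) {
            res |= (b & 0x7F) << shift;

            if ((b & 0x80) == 0)
                break;

            shift += 7;
        }

        return res;
    }
}
```

With this scheme, values in the range -64 .. 63 take one byte after ZigZag, while values near `Integer.MIN_VALUE` or `Integer.MAX_VALUE` take five, so the actual gain depends on the value distribution.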
> *New types will be introduced:*
> 1) Booleans: BOOL_FALSE, BOOL_TRUE;
> 2) Bytes: BYTE_C0 => zero, BYTE_C1 => 1, BYTE_C1N => -1;
> 3) Shorts, chars: SHORT_C0, SHORT_C1, SHORT_C1N;
> 4) Ints: INT_C0, INT_C1, INT_C1N, INT_1 - int which fits into 1 byte, INT_1N
> - same for negative value, INT_2, INT_2N, INT_3, INT_3N, INT_4, INT_4N.
> 5) Longs: same as ints, but have only 2, 4, 6 and 8 byte count discriminators
> to avoid excessive calculations.
> It means that instead of the previous 6 integer types, we will have 2 + 3 + 3 +
> 3 + 11 + 11 = 33 types.
> To avoid excessive switches or (even worse) array/map lookups to understand
> what the type is, we can divide the whole type ID space (256 values) into two
> parts: optimized and non-optimized. IDs in the optimized space will have the
> MSB set to 1, and the ~30 optimized types mentioned above (or some of them)
> will be located there.
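The MSB split can be sketched as below. All numeric type IDs here are made up for illustration; only the idea that optimized types have the high bit set comes from the description.

```java
final class TypeSpace {
    /** High bit of the type byte marks the optimized half of the ID space. */
    static final int OPTIMIZED_FLAG = 0x80;

    // Hypothetical IDs: the issue does not assign concrete values.
    static final byte INT = 3;
    static final byte INT_C0 = (byte)(OPTIMIZED_FLAG | 0x10);
    static final byte INT_C1 = (byte)(OPTIMIZED_FLAG | 0x11);
    static final byte INT_C1N = (byte)(OPTIMIZED_FLAG | 0x12);

    /** A single mask test tells the reader which half of the type space it is in. */
    static boolean isOptimized(byte typeId) {
        return (typeId & OPTIMIZED_FLAG) != 0;
    }

    /** Decodes the constant int types, which carry no payload bytes at all. */
    static int readConstantInt(byte typeId) {
        switch (typeId) {
            case INT_C0:
                return 0;
            case INT_C1:
                return 1;
            case INT_C1N:
                return -1;
            default:
                throw new IllegalArgumentException("Not a constant int type: " + typeId);
        }
    }
}
```

A reader would first call `isOptimized` and only enter the optimized decoding path when it returns true, so existing non-optimized types pay for one extra branch and nothing else.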
> For floats and doubles we simply infer length.
> For primitive arrays we do not write field length and then array length, but
> only array length.
> *Expected compaction*:
> bool: 10 -> 5 bytes (50%);
> byte: 10 -> 5-6 bytes (45%);
> short, char: 11 -> 5-7 bytes, 7 on average (35%);
> int: 13 -> 5-9 bytes, 7 on average (45%);
> long: 17 -> 5-13 bytes, 11 on average (35%);
> float: 13 -> 9 bytes (30%);
> double: 17 -> 13 bytes (25%).
> *Expected CPU overhead on writes:*
> Bool, float, double: none;
> Byte, short, char: zero check, sign check;
> Int, long: two (shift + OR)s to determine the byte count; if small - "zero" and
> "one" checks, if big - sign check.
> *Expected CPU overhead on reads:*
> One additional branch between optimized and non-optimized spaces.
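The write-side checks for ints listed above (constant checks, sign check, byte count) could be sketched as follows. Two things here are assumptions rather than part of the proposal: the byte count is derived with `Integer.numberOfLeadingZeros` instead of the shift + OR sequence mentioned in the issue, and negative values are sized by their bitwise complement. The class and enum names are hypothetical.

```java
final class IntTypeChooser {
    /** The 11 int types named in the issue description. */
    enum Kind { INT_C0, INT_C1, INT_C1N, INT_1, INT_1N, INT_2, INT_2N, INT_3, INT_3N, INT_4, INT_4N }

    private static final Kind[] POSITIVE = { Kind.INT_1, Kind.INT_2, Kind.INT_3, Kind.INT_4 };
    private static final Kind[] NEGATIVE = { Kind.INT_1N, Kind.INT_2N, Kind.INT_3N, Kind.INT_4N };

    /** Number of bytes (1 .. 4) needed to hold a non-negative magnitude. */
    static int byteCount(int magnitude) {
        int bits = 32 - Integer.numberOfLeadingZeros(magnitude);

        return Math.max(1, (bits + 7) >>> 3);
    }

    /** Picks the type for a value: constants first, then sign check plus byte count. */
    static Kind choose(int v) {
        if (v == 0)
            return Kind.INT_C0;

        if (v == 1)
            return Kind.INT_C1;

        if (v == -1)
            return Kind.INT_C1N;

        boolean negative = v < 0;

        // ~v is non-negative for every negative int, including Integer.MIN_VALUE.
        int magnitude = negative ? ~v : v;

        int bytes = byteCount(magnitude);

        return negative ? NEGATIVE[bytes - 1] : POSITIVE[bytes - 1];
    }
}
```

For example, under these assumptions `choose(200)` selects `INT_1`, `choose(256)` selects `INT_2`, and `choose(-256)` selects `INT_1N`.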