[ https://issues.apache.org/jira/browse/HELIX-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352348#comment-14352348 ]
kishore gopalakrishna commented on HELIX-573:
---------------------------------------------
Yeah, we still need to support bucketing, but we can go a long way without it if
we compress the data. We know we can support 1k partitions per resource with raw
JSON and no bucketing. By adding compression, we can probably go up to 10k
partitions per resource without bucketing (this still needs to be validated).
I plan to use GZIP to compress/uncompress. Let me know if there is something
better.
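The GZIP round trip, and one rough way to start validating the 10k-partition estimate, can be sketched with the JDK's `java.util.zip` classes. This is a sketch under stated assumptions, not Helix code: the class name `GzipSizeCheck`, the helper `syntheticIdealState`, and the shape of the synthetic JSON (one map field per partition, three replicas, 50 hosts) are all hypothetical, and real IdealStates also carry list fields and longer instance names. The size check compares against ZooKeeper's default `jute.maxbuffer` of 0xfffff bytes (just under 1 MB), which only approximates the real limit since the request also has protocol overhead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSizeCheck {
    // ZooKeeper's default jute.maxbuffer (approximate upper bound for znode data).
    static final int ZK_DEFAULT_MAX_BYTES = 0xfffff;

    public static byte[] compress(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream flushes the deflater and writes the GZIP trailer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static byte[] uncompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes(); // requires Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical IdealState-like JSON: one map field per partition, three replicas each.
    static String syntheticIdealState(String resource, int partitions) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"id\":\"").append(resource).append("\",\"mapFields\":{");
        for (int p = 0; p < partitions; p++) {
            if (p > 0) sb.append(',');
            sb.append('"').append(resource).append('_').append(p).append("\":{")
              .append("\"host_").append(p % 50).append("\":\"MASTER\",")
              .append("\"host_").append((p + 1) % 50).append("\":\"SLAVE\",")
              .append("\"host_").append((p + 2) % 50).append("\":\"SLAVE\"}");
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        for (int partitions : new int[] {1_000, 10_000}) {
            byte[] raw = syntheticIdealState("myDB", partitions).getBytes(StandardCharsets.UTF_8);
            byte[] gz = compress(raw);
            System.out.printf("%d partitions: raw=%d bytes, gzip=%d bytes, under default limit=%b%n",
                partitions, raw.length, gz.length, gz.length <= ZK_DEFAULT_MAX_BYTES);
        }
    }
}
```

Running this against real IdealState payloads from a cluster, rather than the synthetic generator, would give a more trustworthy answer to the "need to validate this" question.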
This is what I am planning to do. We have a common ZNRecordSerializer to
serialize/deserialize the data. We can simply check for an "enableCompression"
field in the simpleFields and, if it is true, apply compression. On
deserialization, we can check for the GZIP magic header and, if it matches,
automatically decompress the data.
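The byte-level part of this plan might look roughly like the sketch below. It is not the actual ZNRecordSerializer: the class `CompressionAwareCodec` and its `encode`/`decode`/`isGzipped` methods are hypothetical, and the JSON step is represented by already-encoded bytes plus a plain `Map` standing in for the record's simpleFields. The detection relies on the GZIP magic bytes 0x1f 0x8b defined in RFC 1952; since serialized JSON starts with `{`, uncompressed data cannot be mistaken for GZIP.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical hook around a serializer: JSON encoding/decoding happens elsewhere.
public class CompressionAwareCodec {
    public static final String ENABLE_COMPRESSION = "enableCompression";

    // Serialize path: compress the JSON bytes only if the record opts in via simpleFields.
    public static byte[] encode(byte[] json, Map<String, String> simpleFields) {
        boolean compress = Boolean.parseBoolean(simpleFields.get(ENABLE_COMPRESSION));
        return compress ? gzip(json) : json;
    }

    // Deserialize path: no flag needed, the GZIP header identifies compressed data.
    public static byte[] decode(byte[] data) {
        return isGzipped(data) ? gunzip(data) : data;
    }

    // GZIP streams begin with ID1 = 0x1f, ID2 = 0x8b (RFC 1952).
    public static boolean isGzipped(byte[] data) {
        return data != null && data.length >= 2
            && (data[0] & 0xff) == 0x1f && (data[1] & 0xff) == 0x8b;
    }

    private static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    private static byte[] gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes(); // requires Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> simpleFields = new HashMap<>();
        simpleFields.put(ENABLE_COMPRESSION, "true");
        byte[] json = "{\"id\":\"myDB\",\"simpleFields\":{\"enableCompression\":\"true\"}}"
            .getBytes(StandardCharsets.UTF_8);
        byte[] stored = encode(json, simpleFields);
        System.out.println("compressed on write: " + isGzipped(stored));
        System.out.println("round trip ok: " + Arrays.equals(decode(stored), json));
    }
}
```

Because `decode` keys off the header rather than the flag, readers keep working on znodes written before compression was enabled, which matches the goal of not changing how the serializer is configured.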
The advantage of this approach is that we don't have to change the API of
ZNRecordSerializer or how it is set in various places. When a resource is
created with compression turned on, we set enableCompression=true in the
IdealState. This takes care of compressing the IdealState. We then have to
propagate the flag when creating the CurrentState and External View. The
External View should carry it, since the controller creates it. For the
CurrentState it's not straightforward, since it is created by the participants
and they don't read the IdealState. We can punt on the CurrentState, hoping
that the size of a CurrentState is inversely proportional to the number of
nodes in the system: if there are a large number of partitions, the number of
nodes might also be large (though this is not necessarily true). The other
option is to set enableCompression=true the first time the participant creates
the CurrentState ZNode.
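The flag propagation described above could be sketched as follows. Everything here is hypothetical and deliberately minimal: `SimpleRecord` stands in for a ZNRecord with only an id and simpleFields, and `createExternalView`/`createCurrentState` are illustrative names, not Helix methods. The controller copies the flag from the IdealState into the External View; for the CurrentState, the sketch follows the second option, where the participant sets the flag when it first creates the znode.

```java
import java.util.HashMap;
import java.util.Map;

public class CompressionFlagPropagation {
    static final String ENABLE_COMPRESSION = "enableCompression";

    // Hypothetical minimal record: only an id and its simple fields.
    static final class SimpleRecord {
        final String id;
        final Map<String, String> simpleFields = new HashMap<>();

        SimpleRecord(String id) { this.id = id; }

        boolean compressionEnabled() {
            return Boolean.parseBoolean(simpleFields.get(ENABLE_COMPRESSION));
        }
    }

    // Controller side: the External View inherits the flag from the IdealState.
    static SimpleRecord createExternalView(SimpleRecord idealState) {
        SimpleRecord ev = new SimpleRecord(idealState.id);
        String flag = idealState.simpleFields.get(ENABLE_COMPRESSION);
        if (flag != null) {
            ev.simpleFields.put(ENABLE_COMPRESSION, flag);
        }
        return ev;
    }

    // Participant side: participants don't read the IdealState, so the flag is set
    // when the CurrentState znode is first created; an existing one is left as-is.
    static SimpleRecord createCurrentState(String resource, SimpleRecord existing) {
        if (existing != null) {
            return existing;
        }
        SimpleRecord cs = new SimpleRecord(resource);
        cs.simpleFields.put(ENABLE_COMPRESSION, "true");
        return cs;
    }

    public static void main(String[] args) {
        SimpleRecord idealState = new SimpleRecord("myDB");
        idealState.simpleFields.put(ENABLE_COMPRESSION, "true");
        System.out.println("ExternalView compressed: " + createExternalView(idealState).compressionEnabled());
        System.out.println("CurrentState compressed: " + createCurrentState("myDB", null).compressionEnabled());
    }
}
```

The trade-off of the participant-side option is that every new CurrentState gets compressed regardless of the resource's size, whereas punting leaves all CurrentStates as raw JSON.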
> Add support to compress/uncompress data on ZK
> ---------------------------------------------
>
> Key: HELIX-573
> URL: https://issues.apache.org/jira/browse/HELIX-573
> Project: Apache Helix
> Issue Type: Improvement
> Reporter: kishore gopalakrishna
> Assignee: kishore gopalakrishna
>
> Currently we have bucketing as one of the options when the number of
> partitions is large. We have a couple of bugs in the handling of bucketized
> resources (one of them is fatal).
> One of the reasons to split the znode is that we use JSON to store the
> data in the ZNode. While JSON is good for debugging, it's space-inefficient.
> A better option before going to bucketing is to support compression of the
> Ideal state, current state and External View. This also gives good performance.