[ 
https://issues.apache.org/jira/browse/HELIX-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352348#comment-14352348
 ] 

kishore gopalakrishna commented on HELIX-573:
---------------------------------------------

Yeah, we still need to support it, but we can go a long way without bucketing 
if we compress the data. We know we can support 1k partitions with raw JSON 
and no bucketing. By adding compression, we can probably go up to 10k 
partitions per resource without bucketing (this needs to be validated).

I plan to use GZIP to compress/uncompress. Let me know if there is something 
better.

This is what I am planning to do. We have a common ZNRecordSerializer to 
serialize/deserialize the data. We can simply check for an 
"enableCompression" entry in the simpleFields and, if it's true, apply 
compression. On deserializing, we can check for the GZIP magic header and, if 
it matches, automatically decompress the data.
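
A minimal sketch of that serialize/deserialize check, assuming plain 
java.util.zip. The class and method names here (GzipCodecSketch, serialize, 
deserialize, isGzip) are hypothetical and are not the actual 
ZNRecordSerializer code. The magic-header test is unambiguous for our data 
because raw JSON starts with '{' (0x7b), while a GZIP stream starts with the 
bytes 0x1f 0x8b.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Helix: illustrates the proposed scheme only.
public class GzipCodecSketch {

    /** Compresses the given bytes with GZIP. */
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(raw);
        } // closing the stream writes the GZIP trailer
        return bos.toByteArray();
    }

    /** Returns true if the data starts with the GZIP magic header 0x1f 0x8b. */
    static boolean isGzip(byte[] data) {
        return data != null && data.length >= 2
                && (data[0] & 0xFF) == 0x1f
                && (data[1] & 0xFF) == 0x8b;
    }

    /** Decompresses GZIP data. */
    static byte[] decompress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gzip.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }

    /** Write path: compress only when the (assumed) flag is set. */
    static byte[] serialize(String json, boolean enableCompression)
            throws IOException {
        byte[] raw = json.getBytes(StandardCharsets.UTF_8);
        return enableCompression ? compress(raw) : raw;
    }

    /** Read path: no flag needed, the magic header decides. */
    static String deserialize(byte[] stored) throws IOException {
        byte[] raw = isGzip(stored) ? decompress(stored) : stored;
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

Because the read path looks only at the header, readers can handle a mix of 
compressed and uncompressed znodes without knowing how each one was written.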

The advantage of this is that we don't need to change the API of 
ZNRecordSerializer or how it is set in various places. When a resource is 
created with compression turned on, we set enableCompression=true in the 
IdealState. This takes care of compressing the IdealState. We then have to 
copy this flag when the CurrentState and External View are created. We should 
carry it over to the External View, since the controller creates it. For the 
CurrentState it's not straightforward, since it is created by the 
participants and they don't read the IdealState. We can punt on the 
CurrentState, hoping that its size is inversely proportional to the number of 
nodes in the system: if there is a large number of partitions, the number of 
nodes might also be large (this is not necessarily true). The other option is 
to set enableCompression=true the first time the CurrentState ZNode is 
created by the participant.
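
A rough sketch of carrying the flag from the IdealState to the External View 
on the controller side. It models simpleFields as a plain Map<String, String>; 
the class and method names are hypothetical and this is not the Helix 
IdealState/ExternalView API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: simpleFields modeled as a plain string map.
public class CompressionFlagSketch {

    static final String ENABLE_COMPRESSION = "enableCompression";

    /**
     * Copies enableCompression=true from the ideal state's simpleFields to the
     * external view's simpleFields. Does nothing if the flag is absent or false.
     */
    static void copyCompressionFlag(Map<String, String> idealStateFields,
                                    Map<String, String> externalViewFields) {
        // Boolean.parseBoolean(null) is false, so a missing flag is a no-op.
        if (Boolean.parseBoolean(idealStateFields.get(ENABLE_COMPRESSION))) {
            externalViewFields.put(ENABLE_COMPRESSION, "true");
        }
    }

    public static void main(String[] args) {
        Map<String, String> idealState = new HashMap<>();
        idealState.put(ENABLE_COMPRESSION, "true");

        Map<String, String> externalView = new HashMap<>();
        copyCompressionFlag(idealState, externalView);

        System.out.println(externalView.get(ENABLE_COMPRESSION));
    }
}
```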


> Add support to compress/uncompress data on ZK
> ---------------------------------------------
>
>                 Key: HELIX-573
>                 URL: https://issues.apache.org/jira/browse/HELIX-573
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: kishore gopalakrishna
>            Assignee: kishore gopalakrishna
>
> Currently we have bucketing as one of the options when the number of 
> partitions is large. We have a couple of bugs in the handling of bucketized 
> resources (one of them is fatal). 
> One of the reasons to split the znode is that we use JSON to store the 
> data in the ZNode. While JSON is good for debugging, it is space inefficient.
> A better option before going to bucketing is to support compression of Ideal 
> state, current state and External View. This also gives good performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
