Yeah, we still need to support it, but we can go a long way without bucketing if we compress the data. We know we can support 1k partitions with raw JSON and no bucketing. By adding compression, we can probably go up to 10k partitions per resource without bucketing (I need to validate this).
I plan to use GZIP to compress/uncompress. Let me know if there is something better.

This is what I am planning to do. We have a common ZNRecordSerializer to serialize/deserialize the data. We can simply check for an "enableCompression" entry in the simpleFields and, if it is true, apply compression. On deserializing, we can check for the GZIP magic header and, if it matches, automatically decompress the data. The advantage of this is that we don't have to change the API of ZNRecordSerializer or how it is set in various places.

When a resource is created with compression turned on, we set enableCompression=true in the IdealState. This takes care of compressing the IdealState. We then have to copy this flag when the CurrentState and External View are created. Carrying it over to the External View is easy, since the controller creates it. For the CurrentState it's not straightforward, since it is created by the participants and they don't read the IdealState.

We can punt on the CurrentState, hoping that the size of the CurrentState is inversely proportional to the number of nodes in the system, and that if there is a large number of partitions, the number of nodes will also be large (this is not necessarily true). The other option is to set enableCompression=true the first time the CurrentState ZNode is created by the participant.

Let me know what you think.

On Sun, Mar 8, 2015 at 11:09 AM, Kanak Biscuitwala <[email protected]> wrote:

> I like this idea, but we would still need to support bucketizing either
> way because we cannot guarantee that the compressed version will be compact
> enough for every use case.
>
> What types of compression schemes are you planning to support?
>
> ----------------------------------------
> > Date: Sat, 7 Mar 2015 22:30:15 -0800
> > Subject: Use compression to store data in ZK
> > From: [email protected]
> > To: [email protected]
> >
> > Hi,
> >
> > Currently we have bucketing as one of the options when the number of
> > partitions is large. We have a couple of bugs with the handling of
> > bucketized resources (one of them is fatal).
> >
> > One of the reasons to split the znode is because we use JSON to store the
> > data in ZNode. While JSON is good for debugging, it's space inefficient.
> >
> > A better option before going to bucketing is to support compression of
> > Ideal state, current state and External View. This also gives good
> > performance.
> >
> > I plan to add this support and make it configurable. Feedback/suggestions
> >
> > thanks,
> > Kishore G
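
P.S. For reference, here is a rough sketch of the compress-on-write / detect-on-read idea from my reply at the top. This is not the actual ZNRecordSerializer code. The class and method names (GzipSketch, compress, isGzip, maybeDecompress) are made up for illustration, and the sketch only uses java.util.zip from the JDK. It assumes the serializer has already produced the JSON bytes and has already decided, from the enableCompression simple field, whether to compress.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Helix: shows GZIP on write and
// magic-header detection on read.
class GzipSketch {

    // Compress the serialized bytes (e.g. the JSON form of a ZNRecord).
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    // A GZIP stream starts with the two bytes 0x1f 0x8b
    // (GZIPInputStream.GZIP_MAGIC is 0x8b1f, stored low byte first).
    static boolean isGzip(byte[] data) {
        return data != null
            && data.length >= 2
            && (data[0] & 0xFF) == (GZIPInputStream.GZIP_MAGIC & 0xFF)
            && (data[1] & 0xFF) == ((GZIPInputStream.GZIP_MAGIC >> 8) & 0xFF);
    }

    // Decompress only if the magic header is present; otherwise return
    // the bytes untouched so uncompressed znodes keep working.
    static byte[] maybeDecompress(byte[] data) throws IOException {
        if (!isGzip(data)) {
            return data;
        }
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data));
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a serialized record; the field layout is illustrative only.
        byte[] json = "{\"id\":\"myResource\",\"simpleFields\":{\"enableCompression\":\"true\"}}"
            .getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(json);
        byte[] unpacked = maybeDecompress(packed);
        System.out.println("round trip ok: " + Arrays.equals(json, unpacked));
        System.out.println("plain JSON passes through: "
            + Arrays.equals(json, maybeDecompress(json)));
    }
}
```

Since the JSON text starts with '{' rather than 0x1f 0x8b, existing uncompressed znodes should pass through the read path unchanged, which is why the serializer API would not need to change.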
