is this a valid implementation?

class JsonMarshalZipProperty(ndb.BlobProperty):

    def _to_base_type(self, value):
        return zlib.compress(marshal.dumps(value, MARSHAL_VERSION))

    def _from_base_type(self, value):
        return marshal.loads(zlib.decompress(value))
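
A minimal sketch of the round trip those two hooks perform, runnable without ndb, in case it helps to check the logic in isolation. The function names and the sample value are hypothetical; MARSHAL_VERSION and COMPRESSION_LEVEL carry the values from the quoted db library. Note that the snippet above passes no level to zlib.compress, so zlib's default level applies there; the sketch passes the level explicitly.

```python
import marshal
import zlib

MARSHAL_VERSION = 2    # value taken from the quoted db library
COMPRESSION_LEVEL = 1  # value taken from the quoted db library


def to_base_type(value):
    """Serialize with marshal, then compress (mirrors _to_base_type)."""
    return zlib.compress(marshal.dumps(value, MARSHAL_VERSION),
                         COMPRESSION_LEVEL)


def from_base_type(blob):
    """Decompress, then deserialize (mirrors _from_base_type)."""
    return marshal.loads(zlib.decompress(blob))


# Hypothetical JSON-like sample value.
sample = {"name": "example", "scores": [1, 2.5, None], "nested": {"ok": True}}
blob = to_base_type(sample)
restored = from_base_type(blob)
```

If restored equals sample, the serialize/compress pair is symmetric; whether ndb calls the hooks as expected still has to be checked against a real model.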


On Jun 4, 2012, at 9:49 AM, Andreas wrote:

> great. how would this look for the ndb package?
> 
> On Jun 1, 2012, at 2:40 PM, Andrin von Rechenberg wrote:
> 
>> Hey there
>> 
>> If you want to store megabytes of JSON in datastore
>> and get it back from datastore into python already parsed, 
>> this post is for you.
>> 
>> I ran a couple of performance tests where I want to store
>> a 4 MB json object in the datastore and then get it back at
>> a later point and process it.
>> 
>> There are several ways to do this.
>> 
>> Challenge 1) Serialization
>> You need to serialize your data.
>> For this you can use several different libraries.
>> JSON objects can be serialized using:
>> the json lib, the cPickle lib or the marshal lib.
>> (these are the libraries I'm aware of atm)
>> 
>> Challenge 2) Compression
>> If your serialized data doesn't fit into 1 MB, you need
>> to shard your data over multiple datastore entities and
>> manually reassemble it when loading the entities back.
>> If you compress your serialized data before storing it,
>> you pay the cost of compression and decompression,
>> but you have to fetch fewer datastore entities when you
>> want to load your data, and you have to write fewer
>> datastore entities when you update your data if it is
>> sharded.
>> 
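The sharding described in Challenge 2 can be sketched roughly as follows. This is only an illustration: split_blob, join_shards and MAX_SHARD_BYTES are hypothetical names, the limit is an assumed value chosen to stay a little under 1 MB, and the datastore reads and writes themselves are left out.

```python
MAX_SHARD_BYTES = 1000 * 1000  # assumption: a little under 1 MB per entity


def split_blob(blob, shard_size=MAX_SHARD_BYTES):
    """Cut a serialized blob into pieces of at most shard_size bytes."""
    return [blob[i:i + shard_size] for i in range(0, len(blob), shard_size)]


def join_shards(shards):
    """Reassemble the pieces, which must be passed in their original order."""
    return b"".join(shards)


# 2,560,000 bytes of sample data, so three shards are needed.
blob = bytes(bytearray(range(256))) * 10000
shards = split_blob(blob)
rebuilt = join_shards(shards)
```

Each shard would go into its own entity together with an index, so that the pieces can be fetched and joined in order. A blob that compresses to a fifth of its size needs correspondingly fewer shards.
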
>> Solution for 1) Serialization:
>> cPickle is very slow. It's meant to serialize real
>> objects and not just JSON. The json lib is much faster,
>> but compared to marshal it has no chance.
>> The Python marshal library is definitely the
>> way to serialize JSON. It has the best performance.
>> 
>> Solution for 2) Compression:
>> For my use case it makes absolute sense to
>> compress the data the marshal lib produces
>> before storing it in datastore. I have gigabytes
>> of JSON data. Compressing the data makes
>> it about 5x smaller. Doing 5x fewer datastore
>> operations definitely pays for the time it
>> takes to compress and decompress the data.
>> There are several compression levels you
>> can use with Python's zlib,
>> from 1 (lowest compression, but fastest)
>> to 9 (highest compression, but slowest).
>> During my tests I found that the optimum
>> is to compress your serialized data using
>> zlib with level 1 compression. Higher
>> compression takes too much CPU and
>> the result is only marginally smaller.
>> 
>> Here are my test results:
>> serializer  ziplvl  dump (s)   load (s)   size (bytes)
>> cPickle     0       1.671010   0.764567   3297275
>> cPickle     1       2.033570   0.874783    935327
>> json        0       0.595903   0.698307   2321719
>> json        1       0.667103   0.795470    458030
>> marshal     0       0.118067   0.314645   2311342
>> marshal     1       0.315362   0.335677    470956
>> marshal     2       0.318787   0.380117    457196
>> marshal     3       0.350247   0.364908    446085
>> marshal     4       0.414658   0.318973    437764
>> marshal     5       0.448890   0.350013    418712
>> marshal     6       0.516882   0.367595    409947
>> marshal     7       0.617210   0.315827    398354
>> marshal     8       1.117032   0.346452    392332
>> marshal     9       1.366547   0.368925    391921
>> 
>> The results do not include datastore operations;
>> they only cover creating a blob that can be stored
>> in the datastore and getting the parsed data back.
>> The "dump" and "load" times are the seconds it takes
>> to do this on a Google App Engine F1 instance
>> (600 MHz, 128 MB RAM).
>> 
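A measurement of this kind could be set up roughly like the sketch below. It is not the original test script: measure, SERIALIZERS and the sample value are hypothetical, pickle stands in for cPickle, and "ziplvl 0" is assumed to mean zlib level 0 (stored, not compressed), whereas the original test may have skipped zlib entirely at that level.

```python
import json
import marshal
import pickle
import time
import zlib

# Each entry maps a name to a (dump to bytes, load from bytes) pair.
SERIALIZERS = {
    "pickle": (lambda v: pickle.dumps(v, 2), pickle.loads),
    "json": (lambda v: json.dumps(v).encode("utf-8"),
             lambda b: json.loads(b.decode("utf-8"))),
    "marshal": (lambda v: marshal.dumps(v, 2), marshal.loads),
}


def measure(name, value, level):
    """Time one dump+compress and one decompress+load for a serializer."""
    dumps, loads = SERIALIZERS[name]

    start = time.time()
    blob = zlib.compress(dumps(value), level)
    dump_seconds = time.time() - start

    start = time.time()
    restored = loads(zlib.decompress(blob))
    load_seconds = time.time() - start

    return {"dump": dump_seconds, "load": load_seconds,
            "size": len(blob), "restored_ok": restored == value}


# Small hypothetical sample; the figures above were taken on a 4 MB object.
sample = {"items": [{"id": i, "name": "user%d" % i, "score": i * 0.5}
                    for i in range(1000)]}
results = {(name, level): measure(name, sample, level)
           for name in SERIALIZERS for level in (0, 1)}
```

Timings from a single run on a small sample are noisy, so repeating each measurement and using a payload close to the real size gives more trustworthy numbers.
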
>> I posted this email on my blog: 
>> http://devblog.miumeet.com/2012/06/storing-json-efficiently-in-python-on.html
>> You can also comment there or on this email thread.
>> 
>> Enjoy,
>> -Andrin
>> 
>> Here is the library I created and use:
>> 
>> #!/usr/bin/env python
>> #
>> # Copyright 2012 MiuMeet AG
>> #
>> # Licensed under the Apache License, Version 2.0 (the "License");
>> # you may not use this file except in compliance with the License.
>> # You may obtain a copy of the License at
>> #
>> #     http://www.apache.org/licenses/LICENSE-2.0
>> #
>> # Unless required by applicable law or agreed to in writing, software
>> # distributed under the License is distributed on an "AS IS" BASIS,
>> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> # See the License for the specific language governing permissions and
>> # limitations under the License.
>> #
>> 
>> from google.appengine.api import datastore_types
>> from google.appengine.ext import db
>> 
>> import zlib
>> import marshal
>> 
>> MARSHAL_VERSION = 2
>> COMPRESSION_LEVEL = 1
>> 
>> class JsonMarshalZipProperty(db.BlobProperty):
>>   """Stores a JSON serializable object using zlib and marshal in a db.Blob"""
>> 
>>   def default_value(self):
>>     return None
>>   
>>   def get_value_for_datastore(self, model_instance):
>>     value = self.__get__(model_instance, model_instance.__class__)
>>     if value is None:
>>       return None
>>     return db.Blob(zlib.compress(marshal.dumps(value, MARSHAL_VERSION),
>>                                  COMPRESSION_LEVEL))
>> 
>>   def make_value_from_datastore(self, value):
>>     if value is not None:
>>       return marshal.loads(zlib.decompress(value))
>>     return value
>> 
>>   data_type = datastore_types.Blob
>>   
>>   def validate(self, value):
>>     return value
>> 
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Google App Engine" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to 
>> [email protected].
>> For more options, visit this group at 
>> http://groups.google.com/group/google-appengine?hl=en.
> 
