I haven't written code to do it, but I had been thinking about writing something
that serializes entities into blobs, zip-compresses them and puts them in the
Blobstore, then pulls the blobs down later.

This was also what I was thinking about for a backup strategy.
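
A minimal sketch of the serialize-and-compress step, in Python. It is only an
illustration under stated assumptions: entities are plain dicts standing in for
datastore entities, JSON is used as the serialization format, and the function
names pack_entities and unpack_entities are made up for this example. Writing
the resulting bytes to the Blobstore is deliberately left out.

```python
import gzip
import json


def pack_entities(entities):
    """Serialize a list of dict-like entities into one gzip-compressed blob.

    Each entity becomes one JSON document on its own line. json.dumps escapes
    newlines inside strings, so splitting on "\n" later is safe.
    """
    lines = "\n".join(json.dumps(entity, sort_keys=True) for entity in entities)
    return gzip.compress(lines.encode("utf-8"))


def unpack_entities(blob):
    """Reverse pack_entities: decompress the blob and parse one entity per line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line]
```

The same pair of functions would cover the backup case: keep the compressed
blobs around and call unpack_entities on them when a restore is needed.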


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jeff Schnitzer
Sent: Thursday, December 29, 2011 12:40 PM
To: [email protected]
Subject: Re: [google-appengine] Re: Cautionary Tale: Abusive price for data
migration and deletion

Just a thought (and it would probably be expensive) but perhaps you should
do a two-phase export strategy:

1) Export data into Blobstore as very large blobs
2) Suck data out of the Blobstore

The export can run at Map/Reduce speeds... as fast as you want to pay for.
Bulk downloads from the Blobstore should be fast.  Unless each of your
entities is huge, fetching 30 at a time is an awfully small number.
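
The two phases above could be sketched roughly like this in Python. Everything
here is an assumption made for illustration: a plain dict stands in for the
Blobstore, entities are plain dicts, and the names export_to_blobs,
download_blobs and max_blob_bytes are hypothetical. The size limit applies to
the uncompressed JSON, not to the compressed blob.

```python
import json
import zlib


def export_to_blobs(entities, store, max_blob_bytes=1_000_000):
    """Phase 1: pack many entities into a few large compressed blobs.

    `store` is any dict-like object standing in for the blob store.
    Returns the blob names written, in order.
    """
    names, buf, size = [], [], 0

    def flush():
        nonlocal buf, size
        if buf:
            name = "export-%05d" % len(names)
            store[name] = zlib.compress("\n".join(buf).encode("utf-8"))
            names.append(name)
            buf, size = [], 0

    for entity in entities:
        line = json.dumps(entity)  # ASCII output, so len(line) is its byte size
        buf.append(line)
        size += len(line) + 1
        if size >= max_blob_bytes:
            flush()
    flush()  # write whatever is left over
    return names


def download_blobs(store, names):
    """Phase 2: stream entities back out, one blob at a time."""
    for name in names:
        text = zlib.decompress(store[name]).decode("utf-8")
        for line in text.split("\n"):
            yield json.loads(line)
```

The point of the design is that the number of round trips in phase 2 depends on
the number of blobs, not on the number of entities.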

Jeff

On Thu, Dec 29, 2011 at 1:45 AM, Yohan <[email protected]> wrote:
> Hi Jon,
>
> *cheap* is relative. I wouldn't mind receiving a hard drive from Google 
> with all my datastore on it for $X00, which would still have been 
> cheaper than the time, energy and money put into migrating out.
>
> I built the tools myself: simple Java programs reading from the 
> datastore in batches of 30 entities and dumping them to disk, saving 
> the cursor and continuing from there. A few lines of code, really, using 
> the Java remote API. The issue lies in error management, because the 
> datastore will break at least a few times a day due to high latency 
> and the like (the same issues you see directly within GAE, but you 
> experience them remotely). So you continuously have to restart the job 
> (manually or not). That's where cursors are crucial, since there is no 
> way to iterate through the database in order. And if the cursor gets 
> corrupted, which happened to me 3 times in 5 weeks, you have to erase 
> everything you've done and start from scratch. Very frustrating...
>
> On Dec 29, 1:25 pm, jon <[email protected]> wrote:
>> Yohan I agree that there should be an easy and cheap way to get your 
>> data out. I think it's a little unfair that leaving GAE is made that 
>> hard.
>>
>> How much did you spend on your custom data download tool? Would you 
>> consider open sourcing it for other developers who are caught in the 
>> same position? I'd hate spending weeks building a custom tool just to 
>> get my data out.
>>
>> Thanks for sharing your experience.
>>
>> On Dec 29, 12:26 am, Yohan <[email protected]> wrote:
>>
>> > Hi Brandon,
>>
>> > Although I agree with you that the original dataset wasn't fully 
>> > optimized (that was over 2 years ago), I believe that I have a good 
>> > understanding of datastore vs SQL, caching etc. I'm not building a 
>> > public-facing website; I'm dealing with private APIs, and I am already 
>> > stretching memcache and a custom-built Java cache to the limits.
>>
>> > I am also not talking about the reasons why I'm migrating out of GAE.
>> > The points I highlighted were:
>>
>> > - no easy way to get your data out
>> > - no cheap way to get your big data out
>> > - bulk export in Python doesn't handle binary/blob data
>> > - the remote API is unstable
>> > - running database queries using cursors for long periods of time is 
>> > unreliable (many times the cursor got reset for some reason, or the 
>> > query would return a 0000000 cursor, thus screwing up 1 week of data
>> > processing)
>> > - it cost me an arm to delete my data
>>
>> > To answer other questions:
>> > - Of course I thought about migrating the remaining data to a new 
>> > app, then aliasing from the old app to the new one. But it means 
>> > interrupting the service (disabling datastore writes) and I can't 
>> > afford that. Plus the remaining data is still quite big.
>> > - The multi indexes: every time I changed the data structure I would 
>> > reprocess everything to conform it to the new schema. I'm not using 
>> > any framework like Objectify or JDO; I'm working with the raw API 
>> > directly (which is way more elegant).
>> > - I'm not criticizing the platform; I am criticizing the lack of 
>> > tools to export and the prohibitive cost of manipulating large data 
>> > sets. I actually love GAE; it is just not for this kind of dataset,
>> > that's all.
>>
>> > @Brandon: If you have a way to delete 2 billion entities 
>> > (whatever their size) on the cheap, please let me know.
>>
>> > On Dec 28, 8:48 pm, Leandro Rezende <[email protected]> wrote:
>>
>> > > u pay to write, pay to keep it stored... delete should be free.
>>
>> > > 2011/12/28 Brandon Wirtz <[email protected]>
>>
>> > > > Yes,
>>
>> > > > While the primary app I talk about is edge cache, that’s 
>> > > > because that’s the thing that people can most benefit from and 
>> > > > that people don’t seem to be using.
>>
>> > > > As part of my SEO tools we have what is now a 60 TB database of 
>> > > > backlinks and crawler data about websites in the top 200k Alexa 
>> > > > sites.
>>
>> > > > Why should deleting be cheaper? The operation takes the same 
>> > > > amount of CPU, and after you do the delete you don’t have to 
>> > > > pay for storage.
>>
>> > > > I don’t do nearly as much in the Java space, but it doesn’t seem 
>> > > > there should be much difference between Python and Java.  I 
>> > > > ported both the primary apps to both languages to do 
>> > > > comparative cost analysis, and there have been a few things 
>> > > > that we found were faster or cheaper with one or the other; as 
>> > > > a result, in some cases we deploy both and use different 
>> > > > versioning so they can both be live and attached to the same 
>> > > > data.
>>
>> > > > *From:* [email protected] [mailto:
>> > > > [email protected]] *On Behalf Of *André Pankraz
>> > > > *Sent:* Wednesday, December 28, 2011 12:06 AM
>> > > > *To:* [email protected]
>> > > > *Cc:* [email protected]
>> > > > *Subject:* [google-appengine] Re: Cautionary Tale: Abusive 
>> > > > price for data migration and deletion
>>
>> > > > Sry Brandon... he has a point: deleting data should be cheaper, 
>> > > > even if it's technically the same as writing.
>> > > > Maybe he made some mistakes, but you sometimes sound like a 
>> > > > fanboy with GAE Stockholm syndrome. ;) See what I did
>> > > > here... annoying accusations.
>> > > > You have very good experience with Python, cache stuff, edge 
>> > > > cache etc., but do you really have experience with multiple 
>> > > > 100 GB datastores to talk like this?
>> > > > E.g.: I have also seen some answers from you (often very 
>> > > > helpful) that are just plain wrong in the Java environment.
>>
>> > > > --
>> > > > You received this message because you are subscribed to the 
>> > > > Google Groups "Google App Engine" group.
>> > > > To view this discussion on the web visit 
>> > > > https://groups.google.com/d/msg/google-appengine/-/oJRZxuV7yQgJ.
>> > > > To post to this group, send email to
>> > > > [email protected].
>> > > > To unsubscribe from this group, send email to
>> > > > [email protected].
>> > > > For more options, visit this group at
>> > > > http://groups.google.com/group/google-appengine?hl=en.
>>



--
We are the 20%


