I haven't written code to do it, but I had been thinking about writing something that serializes entities into blobs, zip-compresses them, and puts them in the Blobstore, then sucking the blobs down later.
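A rough sketch of the serialize-and-compress step, not code anyone in this thread has actually run. It assumes entities have already been converted to plain dicts, and it uses JSON lines plus gzip as a stand-in for whatever serialization you prefer. The names `pack_entities` and `unpack_entities` are made up for illustration, and the Blobstore upload and download are left out.

```python
import gzip
import json


def pack_entities(entities):
    """Serialize entity dicts as JSON lines and gzip them into one blob."""
    # json.dumps escapes newlines inside strings, so one entity == one line.
    lines = "\n".join(json.dumps(e, sort_keys=True) for e in entities)
    return gzip.compress(lines.encode("utf-8"))


def unpack_entities(blob):
    """Reverse of pack_entities: decompress and parse one entity per line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line]


if __name__ == "__main__":
    # Hypothetical sample data standing in for a batch of datastore entities.
    batch = [{"id": i, "payload": "x" * 50} for i in range(1000)]
    blob = pack_entities(batch)
    print(len(blob), "compressed bytes for", len(batch), "entities")
    assert unpack_entities(blob) == batch
```

Packing many entities per blob is the point: a few big compressed blobs are cheaper to pull down than thousands of small fetches.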
This was also what I was thinking about for a Back-up strategy.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jeff Schnitzer
Sent: Thursday, December 29, 2011 12:40 PM
To: [email protected]
Subject: Re: [google-appengine] Re: Cautionary Tale: Abusive price for data migration and deletion

Just a thought (and it would probably be expensive) but perhaps you
should do a two-phase export strategy:

1) Export data into Blobstore as very large blobs
2) Suck data out of the Blobstore

The export can run at Map/Reduce speeds... as fast as you want to pay
for. Bulk downloads from the blobstore should be fast.

Unless each of your entities are huge, fetching 30 at a time is an
awfully small number.

Jeff

On Thu, Dec 29, 2011 at 1:45 AM, Yohan <[email protected]> wrote:
> Hi Jon,
>
> *cheap* is relative, i wouldn't mind receiving a harddrive from Google
> with all my datastore in it for $X00 which would still have been
> cheaper in time, energy and money put into migrating out.
>
> I built the tools myself, simple java programs reading from the
> datastore by batches of 30 entities and dumping them to disk, saving
> the cursor and continuing from there. A few lines of code really using
> the java remote api. The issue lies in error management because the
> datastore will break at least a few times a day due to high latency
> and stuff (same issues you see directly within GAE but you experience
> it remotely). So you continuously have to restart the job (manually or
> not). That's where cursors are crucial since there is no way to
> iterate through the database in order. And if the cursor gets
> corrupted which happened to me 3 times in 5 weeks, you have to erase
> everything you've done and start from scratch. Very frustrating...
>
> On Dec 29, 1:25 pm, jon <[email protected]> wrote:
>> Yohan I agree that there should be an easy and cheap way to get your
>> data out. I think it's a little unfair that leaving GAE is made that
>> hard.
>>
>> How much did you spend on your custom data download tool? Would you
>> consider open sourcing it for other developers who are caught in the
>> same position? I'd hate spending weeks building a custom tool just to
>> get my data out.
>>
>> Thanks for sharing your experience.
>>
>> On Dec 29, 12:26 am, Yohan <[email protected]> wrote:
>>
>> > Hi Brandon,
>>
>> > Although i agree with you that the original dataset wasn't fully
>> > optimized (that was over 2 years ago), i believe that i have a good
>> > understanding of datastore vs SQL, caching etc. I'm not building
>> > public facing website, I'm dealing with private apis and I am already
>> > stretching memcache and custom built java cache to the limits.
>>
>> > I am also not talking about the reasons why I'm migrating out of GAE.
>> > The points i highlighted were:
>>
>> > - no easy way to get your data out
>> > - no cheap way to get your big data out
>> > - bulk export in python doesn't handle binary/blob data
>> > - remote api is unstable
>> > - running database queries using cursors for long period of time is
>> > unreliable (many times the cursor got reset for some reason or the
>> > query would return a 0000000 cursor thus screwing 1 week of data
>> > processing)
>> > - it cost me an arm to delete my data
>>
>> > To answer other questions :
>> > - of course i thought about migrating the remaining data to a new
>> > app then alias from the old app to the new one. But it means
>> > interrupting the service (disable datastore writes) and i can't
>> > afford that. Plus the remaining data is still quite big.
>> > - the multi indexes: every time i changed the data structure i would
>> > reprocess everything to conform it to the new schema.
>> > I'm not using any framework like objectify or jdo, I'm working with
>> > the raw api directly (which is way more elegant)
>> > - I'm not criticizing the platform, i am criticizing the lack of
>> > tools to export and the prohibitive cost of manipulating large data
>> > sets. I actually love GAE, it is just not for this kind of dataset,
>> > that's all.
>>
>> > @Brandon : If you have a way to delete 2 billion entities
>> > (whatever their size) on the cheap please let me know.
>>
>> > On Dec 28, 8:48 pm, Leandro Rezende <[email protected]> wrote:
>>
>> > > u pay to write, pay to keep it stored... delete should be free.
>>
>> > > 2011/12/28 Brandon Wirtz <[email protected]>
>>
>> > > > Yes,
>>
>> > > > While the primary app I talk about is edge Cache, that’s
>> > > > because that’s the thing that people can most benefit from that
>> > > > people don’t seem to be using.
>>
>> > > > As part of my SEO tools we have what is now a 60 TB database of
>> > > > Backlinks and Crawler data about websites in the top 200k Alexa
>> > > > Sites.
>>
>> > > > Why should Deleting be Cheaper? The Operation takes the same
>> > > > amount of CPU, and after you do the delete you don’t have to
>> > > > pay for storage.
>>
>> > > > I don’t do near as much in the Java Space but it doesn’t seem
>> > > > there should be much difference between Python and Java.
>> > > > I ported both the primary apps to both languages to do
>> > > > comparative cost analysis, and there have been a few things
>> > > > that we found were faster or cheaper with one or the other, as
>> > > > a result in some case we deploy both and use different
>> > > > versioning so they can both be live and attached to the same
>> > > > data.
>>
>> > > > From: [email protected] [mailto:
>> > > > [email protected]] On Behalf Of André Pankraz
>> > > > Sent: Wednesday, December 28, 2011 12:06 AM
>> > > > To: [email protected]
>> > > > Cc: [email protected]
>> > > > Subject: [google-appengine] Re: Cautionary Tale: Abusive
>> > > > price for data migration and deletion
>>
>> > > > Sry Brandon...he has a point - deleting data should be cheaper,
>> > > > even if it's technically the same like writing.
>> > > > Maybe he made some mistakes but you sometimes sound like a
>> > > > fanboy with GAE stockholm syndrome. ;) See what I did here...annoying accusations.
>> > > > You have very good experience with Python, Cache stuff, Edge
>> > > > cache etc., but do you really have experience with multiple
>> > > > 100 GB datastore to talk like this?
>> > > > E.g.: I have also seen some answers from you (often very
>> > > > helpful) that are just plain wrong in the Java environment.
>>
>> > > > --
>> > > > You received this message because you are subscribed to the
>> > > > Google Groups "Google App Engine" group.
>> > > > To view this discussion on the web visit
>> > > > https://groups.google.com/d/msg/google-appengine/-/oJRZxuV7yQgJ.
>> > > > To post to this group, send email to [email protected].
>> > > > To unsubscribe from this group, send email to
>> > > > [email protected].
>> > > > For more options, visit this group at
>> > > > http://groups.google.com/group/google-appengine?hl=en.

--
We are the 20%
