Just a thought (and it would probably be expensive), but perhaps you should try a two-phase export strategy:

1) Export the data into the Blobstore as very large blobs.
2) Suck the data out of the Blobstore.

The export can run at Map/Reduce speeds... as fast as you want to pay for. Bulk downloads from the Blobstore should be fast. Unless each of your entities is huge, fetching 30 at a time is an awfully small number.

Jeff

On Thu, Dec 29, 2011 at 1:45 AM, Yohan wrote:
> Hi Jon,
>
> *Cheap* is relative. I wouldn't mind receiving a hard drive from Google with my whole datastore on it for $X00, which would still have been cheaper than the time, energy and money I've put into migrating out.
>
> I built the tools myself: simple Java programs reading from the datastore in batches of 30 entities and dumping them to disk, saving the cursor and continuing from there. Really just a few lines of code using the Java remote API. The issue lies in error management, because the datastore will break at least a few times a day due to high latency and the like (the same issues you see directly within GAE, except you experience them remotely). So you continuously have to restart the job (manually or not). That's where cursors are crucial, since there is no way to iterate through the database in order. And if the cursor gets corrupted, which happened to me 3 times in 5 weeks, you have to erase everything you've done and start from scratch. Very frustrating...
>
> On Dec 29, 1:25 pm, jon wrote:
>> Yohan, I agree that there should be an easy and cheap way to get your data out. I think it's a little unfair that leaving GAE is made that hard.
>>
>> How much did you spend on your custom data download tool? Would you consider open sourcing it for other developers who are caught in the same position? I'd hate spending weeks building a custom tool just to get my data out.
>>
>> Thanks for sharing your experience.
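Yohan's export loop (read 30 entities, dump them to disk, save the cursor, restart after failures) can be sketched roughly as follows. This is a hypothetical illustration, not his actual tool. `CheckpointedExport`, `BatchSource`, `Page` and the `checkpoint.txt`/`batch-N.txt` file layout are invented names. The sketch assumes the real datastore query is wrapped behind `BatchSource`, with cursors serialized as strings (the Java datastore API's `Cursor.toWebSafeString()` produces such a string) that never contain tab characters.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Sketch of a resumable batch export: fetch a page, write it, then advance a saved cursor. */
final class CheckpointedExport {
    /** One page of results; nextCursor == null means the query is exhausted. */
    record Page(List<String> records, String nextCursor) {}

    /** Hypothetical stand-in for a cursor-based datastore query (not a real App Engine type). */
    interface BatchSource {
        Page fetch(String cursor, int limit) throws IOException;
    }

    private static final String DONE = "DONE";

    private CheckpointedExport() {}

    /**
     * Exports every page from source into outDir as batch-N.txt files, resuming from
     * outDir/checkpoint.txt if it exists. Returns the number of batches written by this call.
     */
    static int run(BatchSource source, Path outDir, int batchSize, int maxRetries)
            throws IOException, InterruptedException {
        Files.createDirectories(outDir);
        Path checkpoint = outDir.resolve("checkpoint.txt");
        long batchNo = 0;
        String cursor = null; // null = start from the beginning of the query
        if (Files.exists(checkpoint)) {
            // Checkpoint format (assumes cursors contain no tabs): "<next batch number>\t<cursor or DONE>"
            String[] parts = Files.readString(checkpoint, StandardCharsets.UTF_8).split("\t", 2);
            batchNo = Long.parseLong(parts[0]);
            cursor = parts[1];
            if (DONE.equals(cursor)) return 0;
        }
        int written = 0;
        while (true) {
            Page page = fetchWithRetry(source, cursor, batchSize, maxRetries);
            if (!page.records().isEmpty()) {
                // Write the batch before advancing the checkpoint: a crash in between only
                // means this batch is fetched again and rewritten under the same file name.
                Files.write(outDir.resolve("batch-" + batchNo + ".txt"), page.records(), StandardCharsets.UTF_8);
                batchNo++;
                written++;
            }
            boolean done = page.nextCursor() == null || page.records().isEmpty();
            cursor = done ? DONE : page.nextCursor();
            saveCheckpoint(checkpoint, batchNo, cursor);
            if (done) return written;
        }
    }

    /** Retries transient failures with exponential backoff (capped at 30 s). */
    private static Page fetchWithRetry(BatchSource source, String cursor, int limit, int maxRetries)
            throws IOException, InterruptedException {
        long backoffMs = 100;
        for (int attempt = 0; ; attempt++) {
            try {
                return source.fetch(cursor, limit);
            } catch (IOException e) {
                if (attempt >= maxRetries) throw e;
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, 30_000);
            }
        }
    }

    /** Replaces the checkpoint via write-to-temp-then-rename so it is never half-written. */
    private static void saveCheckpoint(Path checkpoint, long batchNo, String cursor) throws IOException {
        Path tmp = checkpoint.resolveSibling("checkpoint.tmp");
        Files.writeString(tmp, batchNo + "\t" + cursor, StandardCharsets.UTF_8);
        Files.move(tmp, checkpoint, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Here `batchSize` is just a parameter. Following Jeff's point, raising it well above 30 cuts round trips without changing the checkpoint logic.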
>>
>> On Dec 29, 12:26 am, Yohan wrote:
>> > Hi Brandon,
>> >
>> > Although I agree with you that the original dataset wasn't fully optimized (that was over 2 years ago), I believe I have a good understanding of datastore vs. SQL, caching, etc. I'm not building a public-facing website; I'm dealing with private APIs, and I am already stretching memcache and a custom-built Java cache to their limits.
>> >
>> > I am also not talking about the reasons why I'm migrating out of GAE. The points I highlighted were:
>> >
>> > - no easy way to get your data out
>> > - no cheap way to get your big data out
>> > - bulk export in Python doesn't handle binary/blob data
>> > - the remote API is unstable
>> > - running database queries using cursors for long periods of time is unreliable (many times the cursor got reset for some reason, or the query would return a 0000000 cursor, thus screwing up a week of data processing)
>> > - it cost me an arm to delete my data
>> >
>> > To answer other questions:
>> > - Of course I thought about migrating the remaining data to a new app, then aliasing from the old app to the new one. But that means interrupting the service (disabling datastore writes), and I can't afford that. Plus, the remaining data is still quite big.
>> > - The multiple indexes: every time I changed the data structure, I would reprocess everything to conform it to the new schema. I'm not using any framework like Objectify or JDO; I'm working with the raw API directly (which is way more elegant).
>> > - I'm not criticizing the platform; I am criticizing the lack of export tools and the prohibitive cost of manipulating large data sets. I actually love GAE; it is just not for this kind of dataset, that's all.
>> >
>> > @Brandon: If you have a way to delete 2 billion entities (whatever their size) on the cheap, please let me know.
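Yohan's complaint about cursors being reset or coming back as `0000000` suggests two precautions. First, validate each new cursor before it replaces the last good one. Second, keep a history so that a bad cursor costs a rollback rather than a restart from scratch. The sketch below is hypothetical. `CursorHistory` is an invented name, and the rejection rules are heuristics: they cannot catch a cursor that looks well-formed but points to the wrong place. The history is also kept in memory for brevity, whereas a real job would persist it, for example as an append-only file.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch: an append-only cursor history that refuses suspicious cursors instead of
 * overwriting the last good one. The validity rules are heuristics, not App Engine semantics.
 */
final class CursorHistory {
    private final List<String> accepted = new ArrayList<>();
    private final Set<String> seen = new HashSet<>();

    /** Records the cursor and returns true if it looks sane; otherwise leaves the history unchanged. */
    boolean accept(String cursor) {
        if (cursor == null || cursor.isBlank()) return false;       // empty cursor
        if (cursor.chars().allMatch(ch -> ch == '0')) return false; // "0000000"-style reset (heuristic)
        if (!seen.add(cursor)) return false;                        // repeat: query went backwards or is looping
        accepted.add(cursor);
        return true;
    }

    /** Most recent accepted cursor, or null if none has been accepted yet. */
    String latest() {
        return accepted.isEmpty() ? null : accepted.get(accepted.size() - 1);
    }

    /** Drops the newest n checkpoints and returns the cursor to resume from (null = start over). */
    String rollBack(int n) {
        for (int i = 0; i < n && !accepted.isEmpty(); i++) {
            seen.remove(accepted.remove(accepted.size() - 1));
        }
        return latest();
    }
}
```

Suppose each batch is written to its own file keyed by batch number. Then rolling back n checkpoints only means re-exporting the last n batches and overwriting their files.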
>> >
>> > On Dec 28, 8:48 pm, Leandro Rezende wrote:
>> > > You pay to write, you pay to keep it stored... delete should be free.
>> > >
>> > > 2011/12/28 Brandon Wirtz
>> > > > Yes.
>> > > >
>> > > > While the primary app I talk about is Edge Cache, that's because it's the thing people can benefit from most but don't seem to be using.
>> > > >
>> > > > As part of my SEO tools, we have what is now a 60 TB database of backlinks and crawler data about websites in the Alexa top 200k sites.
>> > > >
>> > > > Why should deleting be cheaper? The operation takes the same amount of CPU, and after you do the delete you don't have to pay for storage.
>> > > >
>> > > > I don't do nearly as much in the Java space, but it doesn't seem there should be much difference between Python and Java. I ported both of the primary apps to both languages to do a comparative cost analysis, and we found a few things that were faster or cheaper in one or the other. As a result, in some cases we deploy both and use different versions so they can both be live and attached to the same data.
>> > > >
>> > > > *From:* André Pankraz
>> > > > *Sent:* Wednesday, December 28, 2011 12:06 AM
>> > > > *Subject:* [google-appengine] Re: Cautionary Tale: Abusive price for data migration and deletion
>> > > >
>> > > > Sorry Brandon... he has a point: deleting data should be cheaper, even if it's technically the same as writing.
>> > > > Maybe he made some mistakes, but you sometimes sound like a fanboy with GAE Stockholm syndrome. ;) See what I did there... annoying accusations.
>> > > > You have very good experience with Python, caching, Edge Cache, etc., but do you really have experience with datastores of several hundred GB to talk like this?
>> > > > E.g., I have also seen some answers from you (often very helpful) that are just plain wrong in the Java environment.
>> > > >
>> > > > --
>> > > > You received this message because you are subscribed to the Google Groups "Google App Engine" group.
>> > > > To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/oJRZxuV7yQgJ.
>> > > > To post to this group, send email to [email protected].
>> > > > To unsubscribe from this group, send email to [email protected].
>> > > > For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

--
We are the 20%
