Hi Jon, *cheap* is relative. I wouldn't mind receiving a hard drive from Google with my entire datastore on it for $X00; that would still have been cheaper than the time, energy and money I put into migrating out.
I built the tools myself: simple Java programs that read from the datastore in batches of 30 entities, dump them to disk, save the cursor and continue from there. Just a few lines of code, really, using the Java Remote API. The issue lies in error management, because the datastore will break at least a few times a day due to high latency and the like (the same issues you see directly within GAE, except that you experience them remotely). So you continuously have to restart the job (manually or not). That's where cursors are crucial, since there is no way to iterate through the database in order. And if the cursor gets corrupted, which happened to me 3 times in 5 weeks, you have to erase everything you've done and start from scratch. Very frustrating...

On Dec 29, 1:25 pm, jon <[email protected]> wrote:
> Yohan I agree that there should be an easy and cheap way to get your
> data out. I think it's a little unfair that leaving GAE is made that
> hard.
>
> How much did you spend on your custom data download tool? Would you
> consider open sourcing it for other developers who are caught in the
> same position? I'd hate spending weeks building a custom tool just to
> get my data out.
>
> Thanks for sharing your experience.
>
> On Dec 29, 12:26 am, Yohan <[email protected]> wrote:
>
> > Hi Brandon,
> >
> > Although i agree with you that the original dataset wasnt fully
> > optimized (that was over 2 years ago), i believe that i have a good
> > understanding of datatore vs SQL, caching etc. Im not building public
> > facing website im dealing with private apis and I am already
> > stretching memcache and custom built java cache to the limits.
> >
> > I am also not talking about the reasons why im migrating out of GAE.
> > The points i highlighted were:
> >
> > - no easy way to get your data out
> > - no cheap way to get your big data out
> > - bulk export in python doesn't handle binary/blob data
> > - remote api is unstable
> > - running database queries using cursors for long period of time is
> >   unreliable (many times the cursor got reset for some reason or the
> >   query would return a 0000000 cursor thus screwing 1 week of data
> >   processing)
> > - it cost me an arm to delete my data
> >
> > To answer other questions :
> > - of course i thought about migrating the remaining data to a new app
> >   then alias from the old app to the new one. But it means interrupting
> >   the service (disable datastore writes) and i cant afford that. Plus
> >   the remaining data is still quite big.
> > - the multi indexes: everytime i changed the data structure i would
> >   reprocess everything to conform it to the new schema. Im not using any
> >   framework like objectify or jdo, im working with the raw api directly
> >   (which is way more elegant)
> > - im not criticizing the platform i am criticizing the lack of tools
> >   to export and the prohibitive cost of manipulating large data sets. I
> >   actually love GAE, it is just not for this kind of dataset thats all.
> >
> > @Brandon : If you have a way to delete 2 billions entities (whatever
> > their size) on the cheap please let me know.
> >
> > On Dec 28, 8:48 pm, Leandro Rezende <[email protected]> wrote:
> >
> > > u pay to write, pay to keep it stored... delete should be free.
> > >
> > > 2011/12/28 Brandon Wirtz <[email protected]>
> > >
> > > > Yes,
> > > >
> > > > While the primary app I talk about is edge Cache, that’s because
> > > > that’s the thing that people can most benefit from that people
> > > > don’t seem to be using.
> > > >
> > > > As part of my SEO tools we have what is now a 60 TB database of
> > > > Backlinks and Crawler data about websites in the top 200k Alexa
> > > > Sites.
> > > >
> > > > Why should Deleting be Cheaper? The Operation takes the same amount
> > > > of CPU, and after you do the delete you don’t have to pay for
> > > > storage.
> > > >
> > > > I don’t do near as much in the Java Space but it doesn’t seem there
> > > > should be much difference between Python and Java. I ported both
> > > > the primary apps to both languages to do comparative cost analysis,
> > > > and there have been a few things that we found were faster or
> > > > cheaper with one or the other, as a result in some case we deploy
> > > > both and use different versioning so they can both be live and
> > > > attached to the same data.
> > > >
> > > > *From:* [email protected] [mailto:[email protected]]
> > > > *On Behalf Of* André Pankraz
> > > > *Sent:* Wednesday, December 28, 2011 12:06 AM
> > > > *To:* [email protected]
> > > > *Cc:* [email protected]
> > > > *Subject:* [google-appengine] Re: Cautionary Tale: Abusive price
> > > > for data migration and deletion
> > > >
> > > > Sry Brandon...he has a point - deleting data should be cheaper, even
> > > > if it's technically the same like writing.
> > > > Maybe he made some mistakes but you sometimes sound like a fanboy
> > > > with GAE stockholm syndrome. ;) See what I did here...annoying
> > > > accusations.
> > > > You have very good experience with Python, Cache stuff, Edge cache
> > > > etc., but do you really have experience with multiple 100 GB
> > > > datastore to talk like this?
> > > > E.g.: I have also seen some answers from you (often very helpful)
> > > > that are just plain wrong in the Java environment.
> > > >
> > > > --
> > > > You received this message because you are subscribed to the Google
> > > > Groups "Google App Engine" group.
> > > > To view this discussion on the web visit
> > > > https://groups.google.com/d/msg/google-appengine/-/oJRZxuV7yQgJ.
> > > > To post to this group, send email to [email protected].
> > > > To unsubscribe from this group, send email to
> > > > [email protected].
> > > > For more options, visit this group at
> > > > http://groups.google.com/group/google-appengine?hl=en.
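
For readers who want a concrete picture of the export loop described in the reply at the top of this message (read 30 entities, dump them to disk, save the cursor, restart from it after a failure), here is a minimal Java sketch. It is not the tool from this thread, which was not published. `PageSource`, `Page`, `flakySource` and the two-file layout are hypothetical names made up for illustration, and the in-memory source only simulates timeouts. In a real job, `PageSource` would presumably wrap a datastore query issued through the Remote API, with the cursor string being whatever the datastore hands back.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

class CursorExportSketch {
    static final int BATCH_SIZE = 30; // batch size mentioned in the post

    /** One batch of rows plus the cursor that points just past it. */
    static final class Page {
        final List<String> rows;
        final String nextCursor;

        Page(List<String> rows, String nextCursor) {
            this.rows = rows;
            this.nextCursor = nextCursor;
        }
    }

    /** Hypothetical stand-in for a cursor-based datastore query (assumption). */
    interface PageSource {
        Page fetch(String cursor, int limit) throws IOException;
    }

    /** Appends batches to dataFile, checkpointing the cursor after each batch. */
    static long export(PageSource source, Path dataFile, Path cursorFile, int maxRetries)
            throws IOException {
        // Resume from the last saved cursor if a previous run left one behind.
        String cursor = Files.exists(cursorFile)
                ? new String(Files.readAllBytes(cursorFile), StandardCharsets.UTF_8)
                : null;
        long written = 0;
        int failures = 0;
        while (true) {
            Page page;
            try {
                page = source.fetch(cursor, BATCH_SIZE);
                failures = 0;
            } catch (IOException e) {
                // Transient failure: retry the same cursor, give up after maxRetries
                // consecutive errors. A real job would sleep and back off here.
                if (++failures > maxRetries) {
                    throw e;
                }
                continue;
            }
            if (page.rows.isEmpty()) {
                return written; // an empty batch means the query is exhausted
            }
            Files.write(dataFile, page.rows, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            written += page.rows.size();
            cursor = page.nextCursor;
            // Write the checkpoint to a temp file first, then swap it in, so a crash
            // mid-write does not leave a truncated cursor on disk. A crash between
            // the data write and this swap replays one batch on resume, so
            // downstream code should de-duplicate by key.
            Path tmp = cursorFile.resolveSibling(cursorFile.getFileName() + ".tmp");
            Files.write(tmp, cursor.getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, cursorFile, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** In-memory test double: the cursor is an offset; every failEvery-th call fails. */
    static PageSource flakySource(List<String> all, int failEvery) {
        int[] calls = {0};
        return (cursor, limit) -> {
            if (failEvery > 0 && ++calls[0] % failEvery == 0) {
                throw new IOException("simulated datastore timeout");
            }
            int from = cursor == null ? 0 : Integer.parseInt(cursor);
            int to = Math.min(from + limit, all.size());
            return new Page(new ArrayList<>(all.subList(from, to)), String.valueOf(to));
        };
    }

    public static void main(String[] args) throws IOException {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 70; i++) {
            all.add("entity-" + i);
        }
        Path dir = Files.createTempDirectory("export-sketch");
        long n = export(flakySource(all, 2), dir.resolve("data.txt"),
                dir.resolve("cursor.txt"), 3);
        System.out.println("exported " + n + " rows to " + dir);
    }
}
```

The sketch only covers the restart case. It does nothing about the corrupted-cursor case mentioned in the thread, where the saved cursor itself is no longer usable; keeping a history of older checkpoints instead of overwriting a single file might limit how much has to be redone, but that is a guess, not something the thread confirms.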
