Hi Raymond,

Don't misunderstand me. GAE is a great tool; I seriously love it and advocate it everywhere I go. But since my last experience I would recommend a hybrid solution instead of going full steam with GAE, at least for data gathering and processing.
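
As a rough sketch of the deletion-cost arithmetic in this message: the snippet assumes the $1 per million write operations rate and the "2 writes + N index writes" per delete that I quote in this message. The function name `delete_cost_usd` and its parameters are made up for illustration and are not part of any GAE API, and replication is not counted.

```python
def delete_cost_usd(entities, index_writes_per_entity, usd_per_million_writes=1.0):
    """Estimate the cost of deleting `entities` datastore entities.

    Assumption taken from this message: each delete costs 2 write
    operations (entity + key) plus one write per index entry.
    """
    writes_per_delete = 2 + index_writes_per_entity
    total_writes = entities * writes_per_delete
    return total_writes * usd_per_million_writes / 1_000_000


# 100 million entities with no index writes at all
print(delete_cost_usd(100_000_000, 0))  # 200.0
# the same entities with one index write each
print(delete_cost_usd(100_000_000, 1))  # 300.0
```

With zero or one index write per entity this reproduces the $200-300 range; at that rate every extra index entry adds another $100 per 100 million entities.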
The cost structure is quite clear: $1 per 1 million writes.

Each entity write or delete = at least 2 writes (entity + key) + N writes for the indexes (+ maybe replication, I don't know if that's counted).

So if you have 100 million entities, that's an easy $200-300 to delete them. And believe me, it is really easy to generate that many entities when your app processes 1500 req/s. Even by aggregating you are still limited to 1MB per entity member (but I don't like to play near the byte limits because of potential serialization overhead, so I won't store more than 75-80% of 1MB per entity member).

I believe that the datastore (or even GAE memcache) doesn't offer a simple flush mechanism because the entire platform is shared among multiple (all?) apps and the way data is stored doesn't allow for a simple flush (counting is a different matter).

I just hope that the GAE team will read my article and maybe lower the price of deletion a bit.

On Dec 29, 4:13 pm, Raymond <[email protected]> wrote:
> Dear Yohan,
>
> On my side I thank you for sharing your experience, I am beginning
> with GAE and know that whatever the time I will put on this project I
> will be making beginner mistakes and this kind of info is precious.
> I have now a limited experience with GAE and have to compare it with
> what I know and in some sectors GAE looks very bad, for example I can't
> imagine Oracle, DB2, Informix, etc, ...MsSQL, etc having any
> commercial success if they would not have implemented rock solid
> solutions to import and export data, backup, build and drop tables and
> databases and of course calculate precisely the data space required to
> build a data structure, in some cases down to the byte.
>
> Although I understand the very different nature of GAE compared to
> these traditional DB engines, I think that any professional developer,
> IT manager, project manager, or person responsible for budget would
> feel very uncomfortable building a system without a firm grip on its
> costs or a reasonable solution to modify an initial implementation or
> migrate away from it. Also the fact that part of the GAE tools are
> simply not reliable enough to be able to plan effort and time required
> to do something is another big minus for this solution.
>
> Although DBs are not my main competence, my very first paid job
> 20+ years ago was to migrate a critical database to a new structure on
> a new machine (HP 9000 unix), using a long forgotten database engine,
> the first attempt using SQL took 1 week to migrate, the second using
> low level C calls took months to develop and migrated in the required
> 3.5 hours, but the important thing to note is that it never crossed my
> mind to question the reliability of the machine, the database or the C
> calls I was making to the DB, it just worked, the server could be
> locked for minutes swapping to disk because of lack of memory or
> overload, but it never failed once and repeated the exercise time and
> time again, reliably and in a predictable timeframe.
>
> All this said there are advantages to GAE that are worth fighting with
> its limitations, I have not yet found anything else that is so
> immediately and massively scalable and at the same time does not
> require me to manage the software and hardware, this is invaluable,
> and although I know that I could have an easier job moving to MySQL, I
> just don't want to manage an OS and a DB engine, I don't have the
> time, I have done it and don't think that's where I am going to earn
> my bacon.
>
> I will always envy some of the people answering your message for the
> depth of knowledge they have of this platform and the fact that they
> always have the right solution and right answer to everything, it must
> be great to never make mistakes.
>
> -R
>
> On Dec 29, 6:25 am, jon <[email protected]> wrote:
> > Yohan I agree that there should be an easy and cheap way to get your
> > data out. I think it's a little unfair that leaving GAE is made that
> > hard.
> >
> > How much did you spend on your custom data download tool? Would you
> > consider open sourcing it for other developers who are caught in the
> > same position? I'd hate spending weeks building a custom tool just to
> > get my data out.
> >
> > Thanks for sharing your experience.
> >
> > On Dec 29, 12:26 am, Yohan <[email protected]> wrote:
> > > Hi Brandon,
> > >
> > > Although I agree with you that the original dataset wasn't fully
> > > optimized (that was over 2 years ago), I believe that I have a good
> > > understanding of datastore vs SQL, caching etc. I'm not building a public
> > > facing website, I'm dealing with private APIs and I am already
> > > stretching memcache and a custom built Java cache to the limits.
> > >
> > > I am also not talking about the reasons why I'm migrating out of GAE.
> > > The points I highlighted were:
> > >
> > > - no easy way to get your data out
> > > - no cheap way to get your big data out
> > > - bulk export in python doesn't handle binary/blob data
> > > - remote api is unstable
> > > - running database queries using cursors for long periods of time is
> > >   unreliable (many times the cursor got reset for some reason or the
> > >   query would return a 0000000 cursor thus screwing 1 week of data
> > >   processing)
> > > - it cost me an arm to delete my data
> > >
> > > To answer other questions:
> > > - of course I thought about migrating the remaining data to a new app
> > >   then alias from the old app to the new one.
> > >   But it means interrupting
> > >   the service (disable datastore writes) and I can't afford that. Plus
> > >   the remaining data is still quite big.
> > > - the multi indexes: every time I changed the data structure I would
> > >   reprocess everything to conform it to the new schema. I'm not using any
> > >   framework like objectify or jdo, I'm working with the raw api directly
> > >   (which is way more elegant)
> > > - I'm not criticizing the platform, I am criticizing the lack of tools
> > >   to export and the prohibitive cost of manipulating large data sets. I
> > >   actually love GAE, it is just not for this kind of dataset, that's all.
> > >
> > > @Brandon : If you have a way to delete 2 billion entities (whatever
> > > their size) on the cheap please let me know.
> > >
> > > On Dec 28, 8:48 pm, Leandro Rezende <[email protected]> wrote:
> > > > You pay to write, pay to keep it stored... delete should be free.
> > > >
> > > > 2011/12/28 Brandon Wirtz <[email protected]>
> > > > > Yes,
> > > > >
> > > > > While the primary app I talk about is edge Cache, that’s because that’s
> > > > > the thing that people can most benefit from that people don’t seem to be
> > > > > using.
> > > > >
> > > > > As part of my SEO tools we have what is now a 60 TB database of Backlinks
> > > > > and Crawler data about websites in the top 200k Alexa Sites.
> > > > >
> > > > > Why should Deleting be Cheaper? The Operation takes the same amount of
> > > > > CPU, and after you do the delete you don’t have to pay for storage.
> > > > >
> > > > > I don’t do near as much in the Java Space but it doesn’t seem there should
> > > > > be much difference between Python and Java.
> > > > > I ported both the primary apps
> > > > > to both languages to do comparative cost analysis, and there have been a
> > > > > few things that we found were faster or cheaper with one or the other. As a
> > > > > result, in some cases we deploy both and use different versioning so they can
> > > > > both be live and attached to the same data.
> > > > >
> > > > > From: [email protected] [mailto:[email protected]] On Behalf Of André Pankraz
> > > > > Sent: Wednesday, December 28, 2011 12:06 AM
> > > > > To: [email protected]
> > > > > Cc: [email protected]
> > > > > Subject: [google-appengine] Re: Cautionary Tale: Abusive price for data
> > > > > migration and deletion
> > > > >
> > > > > Sry Brandon...he has a point - deleting data should be cheaper, even if
> > > > > it's technically the same as writing.
> > > > > Maybe he made some mistakes but you sometimes sound like a fanboy with GAE
> > > > > Stockholm syndrome. ;) See what I did here...annoying accusations.
> > > > > You have very good experience with Python, Cache stuff, Edge cache etc.,
> > > > > but do you really have experience with multiple 100 GB datastores to talk
> > > > > like this?
> > > > > E.g.: I have also seen some answers from you (often very helpful) that are
> > > > > just plain wrong in the Java environment.
> > > > >
> > > > > --
> > > > > You received this message because you are subscribed to the Google Groups
> > > > > "Google App Engine" group.
> > > > > To view this discussion on the web visit
> > > > > https://groups.google.com/d/msg/google-appengine/-/oJRZxuV7yQgJ.
> > > > > To post to this group, send email to [email protected].
> > > > > To unsubscribe from this group, send email to [email protected].
> > > > > For more options, visit this group at
> > > > > http://groups.google.com/group/google-appengine?hl=en.
