Hi Jeff,

I am actually still on Master-Slave; I would expect that using the HRD would have cost me even more. As you pointed out, I am indeed working on hybrid solutions now, rather than leaving GAE in charge of everything.
On Dec 29, 3:41 am, Jeff Schnitzer <[email protected]> wrote:
> It looks like you've discovered the hard way something that is not
> wholly obvious at first: GAE is not good for Big Data.
>
> The HRD is super-cool and perfect for building reliable web
> applications. But it is way too slow and expensive for large-scale
> data processing. And the uber-reliability is usually pointless: when
> dealing with massive data volumes, your collection system is likely
> somewhat lossy in the first place. Losing a few bits probably won't
> hurt you, and "synchronously replicated to more than three data
> centers" is massive overkill.
>
> You probably have the right idea moving to another platform. Use the
> right tool for the right job; maybe something like MongoDB or Hadoop.
> You'll get much better map/reduce support, higher performance, and
> lower cost. GAE is not a box that you're stuck in; you might still
> run part of your application on GAE if it makes sense. Just keep an
> eye on latency and communication costs.
>
> This isn't a scathing indictment of GAE so much as a realization that
> it's not a universal tool. There are a lot of things that are easier
> to build with other tools... and a lot of things that are easier to
> build on App Engine. And some things are best built as hybrids of GAE
> and something else.
>
> Jeff
>
> On Wed, Dec 28, 2011 at 5:26 AM, Yohan <[email protected]> wrote:
> > Hi Brandon,
> >
> > Although I agree with you that the original dataset wasn't fully
> > optimized (that was over 2 years ago), I believe that I have a good
> > understanding of datastore vs. SQL, caching, etc. I'm not building a
> > public-facing website; I'm dealing with private APIs, and I am already
> > stretching memcache and a custom-built Java cache to the limits.
> >
> > I am also not talking about the reasons why I'm migrating off GAE.
> > The points I highlighted were:
> >
> > - no easy way to get your data out
> > - no cheap way to get your big data out
> > - bulk export in Python doesn't handle binary/blob data
> > - the Remote API is unstable
> > - running database queries using cursors over long periods of time is
> >   unreliable (many times the cursor got reset for some reason, or the
> >   query would return a 0000000 cursor, ruining a week of data
> >   processing)
> > - it cost me an arm to delete my data
> >
> > To answer other questions:
> > - Of course I thought about migrating the remaining data to a new app
> >   and then aliasing the old app to the new one. But that means
> >   interrupting the service (disabling datastore writes), and I can't
> >   afford that. Plus, the remaining data is still quite big.
> > - The multiple indexes: every time I changed the data structure, I
> >   would reprocess everything to conform it to the new schema. I'm not
> >   using any framework like Objectify or JDO; I'm working with the raw
> >   API directly (which is way more elegant).
> > - I'm not criticizing the platform; I am criticizing the lack of
> >   export tools and the prohibitive cost of manipulating large data
> >   sets. I actually love GAE; it just isn't suited to this kind of
> >   dataset, that's all.
> >
> > @Brandon: If you have a way to delete 2 billion entities (whatever
> > their size) on the cheap, please let me know.
> >
> > On Dec 28, 8:48 pm, Leandro Rezende <[email protected]> wrote:
> >> You pay to write, you pay to keep it stored... deleting should be free.
> >>
> >> 2011/12/28 Brandon Wirtz <[email protected]>
> >>
> >> > Yes.
> >> >
> >> > While the primary app I talk about is Edge Cache, that's because it's
> >> > the thing people can benefit from most that they don't seem to be
> >> > using.
> >> >
> >> > As part of my SEO tools we have what is now a 60 TB database of
> >> > backlinks and crawler data about websites in the top 200k Alexa sites.
> >> >
> >> > Why should deleting be cheaper?
> >> > The operation takes the same amount of CPU, and after you do the
> >> > delete you don't have to pay for storage.
> >> >
> >> > I don't do nearly as much in the Java space, but it doesn't seem there
> >> > should be much difference between Python and Java. I ported both of
> >> > the primary apps to both languages to do a comparative cost analysis,
> >> > and we found a few things that were faster or cheaper in one or the
> >> > other. As a result, in some cases we deploy both and use different
> >> > versions so they can both be live and attached to the same data.
> >> >
> >> > *From:* [email protected] [mailto:[email protected]]
> >> > *On Behalf Of* André Pankraz
> >> > *Sent:* Wednesday, December 28, 2011 12:06 AM
> >> > *To:* [email protected]
> >> > *Cc:* [email protected]
> >> > *Subject:* [google-appengine] Re: Cautionary Tale: Abusive price for data
> >> > migration and deletion
> >> >
> >> > Sorry Brandon... he has a point: deleting data should be cheaper, even
> >> > if it's technically the same as writing.
> >> > Maybe he made some mistakes, but you sometimes sound like a fanboy with
> >> > GAE Stockholm syndrome. ;) See what I did there... annoying accusations.
> >> > You have very good experience with Python, caching, Edge Cache, etc.,
> >> > but do you really have experience with datastores of several hundred GB
> >> > to talk like this?
> >> > E.g., I have also seen some answers from you (often very helpful) that
> >> > are just plain wrong in the Java environment.
> >> >
> >> > --
> >> > You received this message because you are subscribed to the Google Groups
> >> > "Google App Engine" group.
> >> > To view this discussion on the web visit
> >> > https://groups.google.com/d/msg/google-appengine/-/oJRZxuV7yQgJ.
> >> > To post to this group, send email to [email protected].
> >> > To unsubscribe from this group, send email to
> >> > [email protected].
> >> > For more options, visit this group at
> >> > http://groups.google.com/group/google-appengine?hl=en.
>
> --
> We are the 20%
