Hi Jeff,

I am actually still on Master-Slave. I would expect that using HRD
would have cost me even more.
As you pointed out, I am indeed working on hybrid solutions now, rather
than leaving GAE in charge of everything.

On Dec 29, 3:41 am, Jeff Schnitzer <[email protected]> wrote:
> It looks like you've discovered the hard way something that is not
> wholly obvious at first:  GAE is not good for Big Data.
>
> The HRD is super-cool and perfect for building reliable web
> applications.  But it is way too slow and expensive for large-scale
> data processing.  And the uber-reliability is usually pointless - when
> dealing with massive data volumes, your collection system is likely
> somewhat lossy in the first place.  Losing a few bits probably won't
> hurt you, and "synchronously replicated to more than three data
> centers" is massive overkill.
>
> You probably have the right idea moving to another platform.  Use the
> right tool for the right job; maybe something like MongoDB or Hadoop.
> You'll get much better map/reduce support, higher performance, and
> lower cost.  GAE is not a box that you're stuck in; you might still
> run part of your application on GAE if it makes sense.  Just keep an
> eye on latency and communication costs.
>
> This isn't a scathing indictment of GAE so much as a realization that
> it's not a universal tool.  There are a lot of things that are easier
> to build with other tools... and a lot of things that are easier to
> build on App Engine.  And some things are best built as hybrids of GAE
> and something else.
>
> Jeff
>
> On Wed, Dec 28, 2011 at 5:26 AM, Yohan <[email protected]> wrote:
> > Hi Brandon,
>
> > Although I agree with you that the original dataset wasn't fully
> > optimized (that was over 2 years ago), I believe that I have a good
> > understanding of datastore vs SQL, caching, etc. I'm not building a
> > public-facing website; I'm dealing with private APIs, and I am already
> > stretching memcache and a custom-built Java cache to their limits.
>
> > I am also not talking about the reasons why I'm migrating out of GAE.
> > The points I highlighted were:
>
> > - no easy way to get your data out
> > - no cheap way to get your big data out
> > - bulk export in Python doesn't handle binary/blob data
> > - the Remote API is unstable
> > - running database queries using cursors for long periods of time is
> > unreliable (many times the cursor got reset for some reason, or the
> > query returned a 0000000 cursor, thus ruining 1 week of data
> > processing)
> > - it cost me an arm and a leg to delete my data
>
> > To answer other questions:
> > - Of course I thought about migrating the remaining data to a new app
> > and then aliasing from the old app to the new one. But that means
> > interrupting the service (disabling datastore writes), and I can't
> > afford that. Plus the remaining data is still quite big.
> > - The multiple indexes: every time I changed the data structure I would
> > reprocess everything to conform to the new schema. I'm not using any
> > framework like Objectify or JDO; I'm working with the raw API directly
> > (which is way more elegant).
> > - I'm not criticizing the platform; I am criticizing the lack of tools
> > to export and the prohibitive cost of manipulating large data sets. I
> > actually love GAE; it is just not for this kind of dataset, that's all.
>
> > @Brandon: If you have a way to delete 2 billion entities (whatever
> > their size) on the cheap, please let me know.
>
> > On Dec 28, 8:48 pm, Leandro Rezende <[email protected]> wrote:
> >> You pay to write, pay to keep it stored... delete should be free.
>
> >> 2011/12/28 Brandon Wirtz <[email protected]>
>
> >> > Yes,
>
> >> > While the primary app I talk about is edge cache, that’s because that’s
> >> > the thing people can benefit from most but don’t seem to be
> >> > using.
>
> >> > As part of my SEO tools we have what is now a 60 TB database of backlinks
> >> > and crawler data about websites in the top 200k Alexa sites.
>
> >> > Why should deleting be cheaper? The operation takes the same amount of
> >> > CPU, and after you do the delete you don’t have to pay for storage.
>
> >> > I don’t do nearly as much in the Java space, but it doesn’t seem there
> >> > should be much difference between Python and Java.  I ported both primary
> >> > apps to both languages to do comparative cost analysis, and there have
> >> > been a few things that we found were faster or cheaper with one or the
> >> > other; as a result, in some cases we deploy both and use different
> >> > versions so they can both be live and attached to the same data.
>
> >> > *From:* [email protected] [mailto:
> >> > [email protected]] *On Behalf Of *André Pankraz
> >> > *Sent:* Wednesday, December 28, 2011 12:06 AM
> >> > *To:* [email protected]
> >> > *Cc:* [email protected]
> >> > *Subject:* [google-appengine] Re: Cautionary Tale: Abusive price for data
> >> > migration and deletion
>
> >> > Sorry Brandon... he has a point: deleting data should be cheaper, even if
> >> > it's technically the same as writing.
> >> > Maybe he made some mistakes, but you sometimes sound like a fanboy with
> >> > GAE Stockholm syndrome. ;) See what I did here... annoying accusations.
> >> > You have very good experience with Python, cache stuff, edge cache, etc.,
> >> > but do you really have enough experience with multi-hundred-GB datastores
> >> > to talk like this?
> >> > E.g.: I have also seen some answers from you (often very helpful) that
> >> > are just plain wrong in the Java environment.
>
> >> > --
> >> > You received this message because you are subscribed to the Google Groups
> >> > "Google App Engine" group.
> >> > To view this discussion on the web visit
> >> > https://groups.google.com/d/msg/google-appengine/-/oJRZxuV7yQgJ.
> >> > To post to this group, send email to [email protected].
> >> > To unsubscribe from this group, send email to
> >> > [email protected].
> >> > For more options, visit this group at
> >> > http://groups.google.com/group/google-appengine?hl=en.
>
> --
> We are the 20%

