It looks like you've discovered the hard way something that is not
wholly obvious at first:  GAE is not good for Big Data.

The HRD (High Replication Datastore) is super-cool and perfect for
building reliable web applications.  But it is way too slow and
expensive for large-scale data processing.  And the uber-reliability
is usually pointless: when dealing with massive data volumes, your
collection system is likely somewhat lossy in the first place.  Losing
a few bits probably won't hurt you, and "synchronously replicated to
more than three data centers" is massive overkill.

You probably have the right idea moving to another platform.  Use the
right tool for the right job; maybe something like MongoDB or Hadoop.
You'll get much better map/reduce support, higher performance, and
lower cost.  GAE is not a box that you're stuck in; you might still
run part of your application on GAE if it makes sense.  Just keep an
eye on latency and communication costs.

This isn't a scathing indictment of GAE so much as a realization that
it's not a universal tool.  There are a lot of things that are easier
to build with other tools... and a lot of things that are easier to
build on App Engine.  And some things are best built as hybrids of GAE
and something else.

Jeff

On Wed, Dec 28, 2011 at 5:26 AM, Yohan <[email protected]> wrote:
> Hi Brandon,
>
> Although I agree with you that the original dataset wasn't fully
> optimized (that was over 2 years ago), I believe that I have a good
> understanding of datastore vs SQL, caching etc. I'm not building a
> public-facing website; I'm dealing with private APIs, and I am already
> stretching memcache and a custom-built Java cache to the limits.
>
> I am also not talking about the reasons why I'm migrating out of GAE.
> The points I highlighted were:
>
> - no easy way to get your data out
> - no cheap way to get your big data out
> - bulk export in python doesn't handle binary/blob data
> - remote api is unstable
> - running database queries using cursors for long periods of time is
> unreliable (many times the cursor got reset for some reason, or the
> query would return a 0000000 cursor, thus screwing 1 week of data
> processing)
> - it cost me an arm to delete my data
>
> To answer other questions :
> - Of course I thought about migrating the remaining data to a new app,
> then aliasing from the old app to the new one. But it means interrupting
> the service (disabling datastore writes) and I can't afford that. Plus
> the remaining data is still quite big.
> - The multi indexes: every time I changed the data structure I would
> reprocess everything to conform it to the new schema. I'm not using any
> framework like Objectify or JDO; I'm working with the raw API directly
> (which is way more elegant).
> - I'm not criticizing the platform; I am criticizing the lack of tools
> to export and the prohibitive cost of manipulating large data sets. I
> actually love GAE, it is just not for this kind of dataset, that's all.
>
> @Brandon: If you have a way to delete 2 billion entities (whatever
> their size) on the cheap, please let me know.
>
>
> On Dec 28, 8:48 pm, Leandro Rezende <[email protected]> wrote:
>> u pay to write, pay to keep it stored... delete should be free.
>>
>> 2011/12/28 Brandon Wirtz <[email protected]>
>>
>> > Yes,
>>
>> > While the primary app I talk about is edge Cache, that’s because that’s
>> > the thing that people can most benefit from that people don’t seem to be
>> > using.
>>
>> > As part of my SEO tools we have what is now a 60 TB database of Backlinks
>> > and Crawler data about websites in the top 200k Alexa Sites.
>>
>> > Why should Deleting be Cheaper? The Operation takes the same amount of
>> > CPU, and after you do the delete you don’t have to pay for storage.
>>
>> > I don’t do near as much in the Java Space but it doesn’t seem there should
>> > be much difference between Python and Java.  I ported both the primary apps
>> > to both languages to do comparative cost analysis, and there have been a
>> > few things that we found were faster or cheaper with one or the other, as a
>> > result in some case we deploy both and use different versioning so they can
>> > both be live and attached to the same data.
>>
>> > *From:* [email protected] [mailto:
>> > [email protected]] *On Behalf Of *André Pankraz
>> > *Sent:* Wednesday, December 28, 2011 12:06 AM
>> > *To:* [email protected]
>> > *Cc:* [email protected]
>> > *Subject:* [google-appengine] Re: Cautionary Tale: Abusive price for data
>> > migration and deletion
>>
>> > Sorry Brandon... he has a point: deleting data should be cheaper, even if
>> > it's technically the same as writing.
>> > Maybe he made some mistakes, but you sometimes sound like a fanboy with GAE
>> > Stockholm syndrome. ;) See what I did here... annoying accusations.
>> > You have very good experience with Python, cache stuff, edge cache etc.,
>> > but do you really have experience with a multi-hundred-GB datastore to talk
>> > like this?
>> > E.g.: I have also seen some answers from you (often very helpful) that are
>> > just plain wrong in the Java environment.
>>
>> > --
>> > You received this message because you are subscribed to the Google Groups
>> > "Google App Engine" group.
>> > To view this discussion on the web visit
>> >https://groups.google.com/d/msg/google-appengine/-/oJRZxuV7yQgJ.
>> > To post to this group, send email to [email protected].
>> > To unsubscribe from this group, send email to
>> > [email protected].
>> > For more options, visit this group at
>> >http://groups.google.com/group/google-appengine?hl=en.
>>
>



-- 
We are the 20%
