Hi Raymond,

Don't misunderstand me. GAE is a great tool; I seriously love it and
advocate it everywhere I go. But since my last experience I would
recommend a hybrid solution instead of going full steam with GAE, at
least for data gathering and processing.

The cost structure is quite clear:
$1 / 1 million writes
Each entity write / delete = at least 2 writes (entity + key) + N
writes for the indexes (+ maybe replication, I don't know if that's
counted)
So if you have 100 million entities, that's an easy $200-300 to delete
them. And believe me, it is really easy to generate that many entities
when your app processes 1500 req/s. Even by aggregating you are still
limited to 1MB / entity member (and I don't like to play near the
byte limits because of potential serialization overhead, so I won't
store more than 75-80% of 1MB / entity member).
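
To make the arithmetic above concrete, here is a rough sketch of the
estimate. It only uses the figures quoted in this message ($1 per
million writes, 2 base writes per delete plus N index writes); the
function and constant names are made up for the example and are not
part of any GAE API, and replication is left out because I don't know
whether it is billed.

```python
# Rough estimate of what a bulk delete costs, using the figures quoted above.
# All names here are hypothetical; nothing below calls a real GAE API.

USD_PER_MILLION_WRITES = 1.0   # "$1 / 1 million writes"
BASE_WRITES_PER_DELETE = 2     # entity + key


def delete_cost_usd(entities, index_writes_per_entity):
    """Estimated cost in USD of deleting `entities` entities."""
    writes = entities * (BASE_WRITES_PER_DELETE + index_writes_per_entity)
    return writes * USD_PER_MILLION_WRITES / 1_000_000


if __name__ == "__main__":
    # 100 million entities with 0 or 1 index write each.
    for n_index in (0, 1):
        cost = delete_cost_usd(100_000_000, n_index)
        print(f"{n_index} index write(s) per entity: ${cost:,.0f}")
```

With 0 or 1 index write per entity this gives the $200-300 range
mentioned above; every extra index write per entity adds another $100
per 100 million entities.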

I believe that the datastore (or even GAE memcache) doesn't offer a
simple flush mechanism because the entire platform is shared among
multiple (all?) apps and the way data is stored doesn't allow for
a simple flush (counting is a different matter). I just hope that the
GAE team will read my article and maybe lower the price of deletion a
bit.

On Dec 29, 4:13 pm, Raymond <[email protected]>
wrote:
> Dear Yohan,
>
> For my part, I thank you for sharing your experience. I am just
> beginning with GAE and know that however much time I put into this
> project I will be making beginner mistakes, so this kind of info is
> precious.
> I now have limited experience with GAE and have to compare it with
> what I know, and in some areas GAE looks very bad. For example, I can't
> imagine Oracle, DB2, Informix, MS SQL, etc. having any
> commercial success if they had not implemented rock-solid
> solutions to import and export data, back up, build and drop tables and
> databases, and of course calculate precisely the data space required to
> build a data structure, in some cases down to the byte.
> Although I understand the very different nature of GAE compared to
> these traditional DB engines, I think that any professional developer,
> IT manager, project manager, or person responsible for a budget would
> feel very uncomfortable building a system without a firm grip on its
> costs or a reasonable way to modify an initial implementation or
> migrate away from it. Also, the fact that some of the GAE tools are
> simply not reliable enough to plan the effort and time required
> to do something is another big minus for this solution.
>
> Although DBs are not my main competence, my very first paid job
> 20+ years ago was to migrate a critical database to a new structure on
> a new machine (HP 9000 Unix), using a long-forgotten database engine.
> The first attempt, using SQL, took 1 week to migrate; the second, using
> low-level C calls, took months to develop and migrated in the required
> 3.5 hours. But the important thing to note is that it never crossed my
> mind to question the reliability of the machine, the database, or the C
> calls I was making to the DB. It just worked. The server could be
> locked for minutes swapping to disk because of lack of memory or
> overload, but it never failed once and repeated the exercise time and
> time again, reliably and in a predictable timeframe.
>
> All this said, there are advantages to GAE that make it worth fighting
> with its limitations. I have not yet found anything else that is so
> immediately and massively scalable and at the same time does not
> require me to manage the software and hardware. This is invaluable,
> and although I know that I could have an easier job moving to MySQL, I
> just don't want to manage an OS and a DB engine. I don't have the
> time, I have done it, and I don't think that's where I am going to earn
> my bacon.
>
> I will always envy some of the people answering your message for the
> depth of knowledge they have of this platform and the fact that they
> always have the right solution and right answer to everything, it must
> be great to never make mistakes.
>
> -R
>
> On Dec 29, 6:25 am, jon <[email protected]> wrote:
>
> > Yohan I agree that there should be an easy and cheap way to get your
> > data out. I think it's a little unfair that leaving GAE is made that
> > hard.
>
> > How much did you spend on your custom data download tool? Would you
> > consider open sourcing it for other developers who are caught in the
> > same position? I'd hate spending weeks building a custom tool just to
> > get my data out.
>
> > Thanks for sharing your experience.
>
> > On Dec 29, 12:26 am, Yohan <[email protected]> wrote:
>
> > > Hi Brandon,
>
> > > > Although I agree with you that the original dataset wasn't fully
> > > > optimized (that was over 2 years ago), I believe that I have a good
> > > > understanding of datastore vs SQL, caching, etc. I'm not building a
> > > > public-facing website; I'm dealing with private APIs, and I am already
> > > > stretching memcache and a custom-built Java cache to the limits.
>
> > > > I am also not talking about the reasons why I'm migrating out of GAE.
> > > > The points I highlighted were:
>
> > > > - no easy way to get your data out
> > > > - no cheap way to get your big data out
> > > > - bulk export in Python doesn't handle binary/blob data
> > > > - the remote API is unstable
> > > > - running database queries using cursors for long periods of time is
> > > > unreliable (many times the cursor got reset for some reason, or the
> > > > query would return a 0000000 cursor, thus ruining 1 week of data
> > > > processing)
> > > > - it cost me an arm to delete my data
>
> > > > To answer other questions:
> > > > - Of course I thought about migrating the remaining data to a new app
> > > > and then aliasing from the old app to the new one. But it means
> > > > interrupting the service (disabling datastore writes) and I can't
> > > > afford that. Plus the remaining data is still quite big.
> > > > - The multi indexes: every time I changed the data structure I would
> > > > reprocess everything to conform it to the new schema. I'm not using any
> > > > framework like Objectify or JDO; I'm working with the raw API directly
> > > > (which is way more elegant).
> > > > - I'm not criticizing the platform; I am criticizing the lack of tools
> > > > to export and the prohibitive cost of manipulating large data sets. I
> > > > actually love GAE; it is just not for this kind of dataset, that's all.
>
> > > > @Brandon: If you have a way to delete 2 billion entities (whatever
> > > > their size) on the cheap, please let me know.
>
> > > On Dec 28, 8:48 pm, Leandro Rezende <[email protected]> wrote:
>
> > > > > You pay to write, you pay to keep it stored... delete should be free.
>
> > > > 2011/12/28 Brandon Wirtz <[email protected]>
>
> > > > > > Yes,
>
> > > > > > While the primary app I talk about is edge Cache, that’s because
> > > > > > that’s the thing that people can most benefit from that people
> > > > > > don’t seem to be using.
>
> > > > > > As part of my SEO tools we have what is now a 60 TB database of
> > > > > > Backlinks and Crawler data about websites in the top 200k Alexa
> > > > > > Sites.
>
> > > > > > Why should Deleting be Cheaper? The Operation takes the same amount
> > > > > > of CPU, and after you do the delete you don’t have to pay for
> > > > > > storage.
>
> > > > > > I don’t do near as much in the Java Space but it doesn’t seem there
> > > > > > should be much difference between Python and Java. I ported both the
> > > > > > primary apps to both languages to do comparative cost analysis, and
> > > > > > there have been a few things that we found were faster or cheaper
> > > > > > with one or the other, as a result in some case we deploy both and
> > > > > > use different versioning so they can both be live and attached to
> > > > > > the same data.
>
> > > > > *From:* [email protected] [mailto:
> > > > > [email protected]] *On Behalf Of *André Pankraz
> > > > > *Sent:* Wednesday, December 28, 2011 12:06 AM
> > > > > *To:* [email protected]
> > > > > *Cc:* [email protected]
> > > > > > *Subject:* [google-appengine] Re: Cautionary Tale: Abusive price for
> > > > > > data migration and deletion
>
> > > > > > Sorry Brandon... he has a point - deleting data should be cheaper,
> > > > > > even if it's technically the same as writing.
> > > > > > Maybe he made some mistakes, but you sometimes sound like a fanboy
> > > > > > with GAE Stockholm syndrome. ;) See what I did here... annoying
> > > > > > accusations.
> > > > > > You have very good experience with Python, cache stuff, edge cache,
> > > > > > etc., but do you really have experience with multiple 100 GB
> > > > > > datastores to talk like this?
> > > > > > E.g.: I have also seen some answers from you (often very helpful)
> > > > > > that are just plain wrong in the Java environment.
>
> > > > > --
> > > > > You received this message because you are subscribed to the Google 
> > > > > Groups
> > > > > "Google App Engine" group.
> > > > > To view this discussion on the web visit
> > > > >https://groups.google.com/d/msg/google-appengine/-/oJRZxuV7yQgJ.
> > > > > To post to this group, send email to 
> > > > > [email protected].
> > > > > To unsubscribe from this group, send email to
> > > > > [email protected].
> > > > > For more options, visit this group at
> > > > >http://groups.google.com/group/google-appengine?hl=en.****
