Just a thought (and it would probably be expensive), but perhaps you
should use a two-phase export strategy:

1) Export data into Blobstore as very large blobs
2) Suck data out of the Blobstore

The export can run at Map/Reduce speeds... as fast as you want to pay
for.  Bulk downloads from the Blobstore should be fast.  Unless each
of your entities is huge, fetching 30 at a time is an awfully small
number.
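
To make the two phases concrete, here is a rough sketch, not a real
App Engine implementation. Everything in it is an assumption for
illustration: make_fake_datastore, fetch_page, export_to_blobs and
download_blobs are hypothetical names, an in-memory list stands in for
the datastore, a list of byte strings stands in for the Blobstore, and
the integer cursor is just an offset. It only shows how batch size
drives the number of fetch round trips while the blobs stay the same.

```python
import json
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical stand-in for a datastore query: returns (entities, next_cursor).
# next_cursor is None once the result set is exhausted.
FetchPage = Callable[[Optional[int], int], Tuple[List[dict], Optional[int]]]


def make_fake_datastore(total: int) -> FetchPage:
    """Build an in-memory fetch function so the sketch runs without App Engine."""
    rows = [{"id": i, "payload": "x" * 10} for i in range(total)]

    def fetch_page(cursor: Optional[int], batch_size: int):
        start = cursor or 0
        page = rows[start:start + batch_size]
        end = start + len(page)
        return page, (end if end < total else None)

    return fetch_page


def serialize(entities: List[dict]) -> bytes:
    """Newline-delimited JSON; chosen only to keep the sketch short."""
    return "\n".join(json.dumps(e) for e in entities).encode("utf-8")


def export_to_blobs(fetch_page: FetchPage, batch_size: int,
                    entities_per_blob: int) -> Tuple[List[bytes], int]:
    """Phase 1: page through the data and pack it into a few large blobs.

    Returns the blobs and the number of fetch round trips that were needed.
    """
    blobs: List[bytes] = []
    pending: List[dict] = []
    cursor: Optional[int] = None
    round_trips = 0
    while True:
        page, cursor = fetch_page(cursor, batch_size)
        round_trips += 1
        pending.extend(page)
        # Flush whenever enough entities have accumulated for one blob.
        while len(pending) >= entities_per_blob:
            chunk = pending[:entities_per_blob]
            pending = pending[entities_per_blob:]
            blobs.append(serialize(chunk))
        if cursor is None:
            break
    if pending:
        blobs.append(serialize(pending))
    return blobs, round_trips


def download_blobs(blobs: List[bytes]) -> Iterator[dict]:
    """Phase 2: read the blobs back and decode every entity."""
    for blob in blobs:
        for line in blob.decode("utf-8").splitlines():
            yield json.loads(line)


if __name__ == "__main__":
    fetch = make_fake_datastore(1000)
    for size in (30, 500):
        blobs, trips = export_to_blobs(fetch, size, entities_per_blob=400)
        print(size, trips, len(blobs))
```

With 1,000 fake entities, a batch size of 30 needs 34 fetch round trips
and a batch size of 500 needs 2; the blobs produced are identical
either way.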

Jeff

On Thu, Dec 29, 2011 at 1:45 AM, Yohan <[email protected]> wrote:
> Hi Jon,
>
> *cheap* is relative. I wouldn't mind receiving a hard drive from Google
> with my whole datastore on it for $X00, which would still have been
> cheaper than the time, energy and money I put into migrating out.
>
> I built the tools myself: simple Java programs reading from the
> datastore in batches of 30 entities and dumping them to disk, saving
> the cursor and continuing from there. A few lines of code, really, using
> the Java remote API. The issue lies in error management, because the
> datastore will break at least a few times a day due to high latency
> and such (the same issues you see directly within GAE, but you experience
> them remotely). So you continuously have to restart the job (manually or
> not). That's where cursors are crucial, since there is no way to
> iterate through the database in order. And if the cursor gets
> corrupted, which happened to me 3 times in 5 weeks, you have to erase
> everything you've done and start from scratch. Very frustrating...
>
> On Dec 29, 1:25 pm, jon <[email protected]> wrote:
>> Yohan, I agree that there should be an easy and cheap way to get your
>> data out. I think it's a little unfair that leaving GAE is made that
>> hard.
>>
>> How much did you spend on your custom data download tool? Would you
>> consider open sourcing it for other developers who are caught in the
>> same position? I'd hate spending weeks building a custom tool just to
>> get my data out.
>>
>> Thanks for sharing your experience.
>>
>> On Dec 29, 12:26 am, Yohan <[email protected]> wrote:
>>
>> > Hi Brandon,
>>
>> > Although I agree with you that the original dataset wasn't fully
>> > optimized (that was over 2 years ago), I believe that I have a good
>> > understanding of datastore vs SQL, caching etc. I'm not building a public
>> > facing website; I'm dealing with private APIs and I am already
>> > stretching memcache and a custom-built Java cache to the limits.
>>
>> > I am also not talking about the reasons why I'm migrating out of GAE.
>> > The points I highlighted were:
>>
>> > - no easy way to get your data out
>> > - no cheap way to get your big data out
>> > - bulk export in Python doesn't handle binary/blob data
>> > - remote API is unstable
>> > - running database queries using cursors for long periods of time is
>> > unreliable (many times the cursor got reset for some reason or the
>> > query would return a 0000000 cursor, thus screwing 1 week of data
>> > processing)
>> > - it cost me an arm to delete my data
>>
>> > To answer other questions:
>> > - of course I thought about migrating the remaining data to a new app
>> > then aliasing from the old app to the new one. But it means interrupting
>> > the service (disabling datastore writes) and I can't afford that. Plus
>> > the remaining data is still quite big.
>> > - the multi indexes: every time I changed the data structure I would
>> > reprocess everything to conform it to the new schema. I'm not using any
>> > framework like Objectify or JDO; I'm working with the raw API directly
>> > (which is way more elegant)
>> > - I'm not criticizing the platform; I am criticizing the lack of tools
>> > to export and the prohibitive cost of manipulating large data sets. I
>> > actually love GAE, it is just not for this kind of dataset, that's all.
>>
>> > @Brandon: If you have a way to delete 2 billion entities (whatever
>> > their size) on the cheap, please let me know.
>>
>> > On Dec 28, 8:48 pm, Leandro Rezende <[email protected]> wrote:
>>
>> > > u pay to write, pay to keep it stored... delete should be free.
>>
>> > > 2011/12/28 Brandon Wirtz <[email protected]>
>>
>> > > > Yes,
>>
>> > > > While the primary app I talk about is edge Cache, that’s because that’s
>> > > > the thing that people can most benefit from that people don’t seem to
>> > > > be using.
>>
>> > > > As part of my SEO tools we have what is now a 60 TB database of
>> > > > Backlinks and Crawler data about websites in the top 200k Alexa Sites.
>>
>> > > > Why should Deleting be Cheaper? The Operation takes the same amount of
>> > > > CPU, and after you do the delete you don’t have to pay for storage.
>>
>> > > > I don’t do near as much in the Java Space but it doesn’t seem there
>> > > > should be much difference between Python and Java.  I ported both the
>> > > > primary apps to both languages to do comparative cost analysis, and
>> > > > there have been a few things that we found were faster or cheaper with
>> > > > one or the other. As a result, in some cases we deploy both and use
>> > > > different versioning so they can both be live and attached to the same
>> > > > data.
>>
>>
>> > > > *From:* [email protected] [mailto:
>> > > > [email protected]] *On Behalf Of *André Pankraz
>> > > > *Sent:* Wednesday, December 28, 2011 12:06 AM
>> > > > *To:* [email protected]
>> > > > *Cc:* [email protected]
>> > > > *Subject:* [google-appengine] Re: Cautionary Tale: Abusive price for
>> > > > data migration and deletion
>>
>> > > > Sry Brandon...he has a point - deleting data should be cheaper, even if
>> > > > it's technically the same as writing.
>> > > > Maybe he made some mistakes but you sometimes sound like a fanboy with
>> > > > GAE Stockholm syndrome. ;) See what I did here...annoying accusations.
>> > > > You have very good experience with Python, Cache stuff, Edge cache
>> > > > etc., but do you really have experience with multiple 100 GB
>> > > > datastores to talk like this?
>> > > > E.g.: I have also seen some answers from you (often very helpful) that
>> > > > are just plain wrong in the Java environment.
>>
>> > > > --
>> > > > You received this message because you are subscribed to the Google
>> > > > Groups "Google App Engine" group.
>> > > > To view this discussion on the web visit
>> > > > https://groups.google.com/d/msg/google-appengine/-/oJRZxuV7yQgJ.
>> > > > To post to this group, send email to [email protected].
>> > > > To unsubscribe from this group, send email to
>> > > > [email protected].
>> > > > For more options, visit this group at
>> > > > http://groups.google.com/group/google-appengine?hl=en.
>>
>



-- 
We are the 20%

