I am wondering about writing a Servlet that would accept form/multipart
uploads of large files and cache them in memcache, then use the
cron API to "trickle"-persist them into the DS over time ...
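
roughly what I have in mind (just a sketch, using the Commons
FileUpload streaming API and the low-level memcache API; the key
scheme and the error handling are placeholders):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.commons.fileupload.FileItemIterator;
    import org.apache.commons.fileupload.FileItemStream;
    import org.apache.commons.fileupload.servlet.ServletFileUpload;
    import com.google.appengine.api.memcache.MemcacheService;
    import com.google.appengine.api.memcache.MemcacheServiceFactory;

    public class UploadServlet extends HttpServlet {
        @Override
        public void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
            try {
                // Stream the multipart body; no filesystem needed,
                // which suits the GAE sandbox.
                ServletFileUpload upload = new ServletFileUpload();
                FileItemIterator it = upload.getItemIterator(req);
                while (it.hasNext()) {
                    FileItemStream item = it.next();
                    if (item.isFormField()) continue;
                    InputStream in = item.openStream();
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] b = new byte[8192];
                    for (int n; (n = in.read(b)) != -1; ) buf.write(b, 0, n);
                    // Park the payload in memcache; a cron-triggered job
                    // would later move it into the datastore.
                    cache.put("upload:" + item.getName(), buf.toByteArray());
                }
            } catch (Exception e) {
                throw new IOException("upload failed: " + e);
            }
        }
    }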

could maybe even get adventurous and put a "filesystem"-like API
over the cache ...
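
by which I mean something like this (pure sketch, all names made up;
chunking is needed anyway since a single memcache value is capped at
roughly 1MB):

    import com.google.appengine.api.memcache.MemcacheService;
    import com.google.appengine.api.memcache.MemcacheServiceFactory;

    // A "file" is a series of fixed-size chunks stored under
    // "fs:<name>:<index>" keys, plus a chunk count.
    public class CacheFs {
        private static final int CHUNK = 500 * 1024; // stay under the value cap
        private final MemcacheService cache =
            MemcacheServiceFactory.getMemcacheService();

        public void write(String name, byte[] data) {
            int chunks = (data.length + CHUNK - 1) / CHUNK;
            for (int i = 0; i < chunks; i++) {
                int from = i * CHUNK;
                int to = Math.min(from + CHUNK, data.length);
                byte[] part = new byte[to - from];
                System.arraycopy(data, from, part, 0, part.length);
                cache.put("fs:" + name + ":" + i, part);
            }
            cache.put("fs:" + name + ":chunks", chunks);
        }

        public byte[] readChunk(String name, int i) {
            return (byte[]) cache.get("fs:" + name + ":" + i);
        }
    }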

lemme know if anyone would be interested in this ... if there is
enough interest I'll put something together and open source it.

These timeouts in the DS and at the request level, while
"understandable", are a total PITA; coding around them complicates
the application logic considerably ...
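
the best I've found so far is wrapping every datastore call in a
retry loop along these lines (just a sketch; DatastoreTimeoutException
is transient, so a couple of retries usually get through):

    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreTimeoutException;
    import com.google.appengine.api.datastore.Entity;

    // Retry a put a few times before giving up; the timeout is
    // transient, so an immediate retry often succeeds.
    static void putWithRetry(DatastoreService ds, Entity e, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                ds.put(e);
                return;
            } catch (DatastoreTimeoutException timeout) {
                if (attempt >= maxAttempts) throw timeout;
            }
        }
    }

... and that is exactly the kind of noise that should not have to
live in application code.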

honestly I am beginning to wonder if EC2 with Tomcat and a MySQL
or Derby DB in it might be a better way to go if you actually want
to store reasonable amounts of data.

I realize that this is an EA environment, but my feedback would be
that if you are aiming to provide a scalable web app and datastore,
there is little point in having either if the hosted application
cannot store reasonable amounts of data in the first place.


On Sep 12, 8:37 am, P64 <primo...@gmail.com> wrote:
> Exactly!
>
> I was hoping this update
> (http://code.google.com/p/datanucleus-appengine/issues/detail?id=7)
> would seriously improve bulk inserts. In practice it seems you can
> now do roughly 2-3 times as many inserts in the same amount of real
> and CPU time.
>
> However, this is still poor compared to what we're used to from
> relational databases on relatively modest hardware.
>
> At the moment I can do a batch insert of up to 300 entities (with a
> couple of Integer and String properties) in a 30-second time window,
> and it costs me around 18 seconds of CPU time.
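>
> For reference, the insert I'm timing boils down to roughly this (I
> go through JDO, but in low-level datastore API terms it is
> essentially the following; kind and property names are made up):
>
>     import java.util.ArrayList;
>     import java.util.List;
>     import com.google.appengine.api.datastore.DatastoreService;
>     import com.google.appengine.api.datastore.DatastoreServiceFactory;
>     import com.google.appengine.api.datastore.Entity;
>
>     // One batch put of ~300 small entities in a single datastore call.
>     DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
>     List<Entity> batch = new ArrayList<Entity>();
>     for (int i = 0; i < 300; i++) {
>         Entity e = new Entity("Line");        // hypothetical kind
>         e.setProperty("seq", i);              // a couple of Integer and
>         e.setProperty("text", "row " + i);    // String properties
>         batch.add(e);
>     }
>     ds.put(batch);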
>
> I have a roughly 1.5MB file which I have to download, parse its
> 15,000 lines, and insert them into the database. I need no
> transactions in this case: all entities can be standalone, I don't
> mind the order in which they are written, and they could be written
> in parallel as well, as far as I am concerned. As it stands now, I
> have to download this file, slice it into chunks of 300 lines, and
> store each chunk in the database. Then I need to put 50 tasks in a
> queue, each taking 30 seconds to read a chunk from the database,
> parse it into 300 separate entities, and store them as such.
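>
> The fan-out step is nothing fancy, just a sketch using the labs
> task queue API in the current SDK (the worker URL and chunk count
> are made up):
>
>     import com.google.appengine.api.labs.taskqueue.Queue;
>     import com.google.appengine.api.labs.taskqueue.QueueFactory;
>     import com.google.appengine.api.labs.taskqueue.TaskOptions;
>
>     // One task per 300-line chunk; each task reads its chunk from
>     // the datastore, parses it and stores the resulting entities.
>     int totalChunks = 50;  // ~15,000 lines / 300 per chunk
>     Queue queue = QueueFactory.getDefaultQueue();
>     for (int chunk = 0; chunk < totalChunks; chunk++) {
>         queue.add(TaskOptions.Builder
>             .url("/tasks/parseChunk")  // hypothetical worker servlet
>             .param("chunk", Integer.toString(chunk)));
>     }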
>
> The database writes alone would cost me well over 15 CPU minutes
> (50 chunks at ~18 CPU seconds each is already 900 seconds), not to
> mention the overhead caused by all the task spawning and so on.
>
> All that for an operation which literally takes seconds on my old
> box running a relational DB, which I use for testing purposes.
>
> It's too complicated and it uses way too many resources to update
> 1000+ entities - and there are lots of applications that need to
> update data from different sources (XML, SQL dumps, ...) on a daily
> basis.
>
> On Sep 12, 02:30, Larry Cable <larry.ca...@gmail.com> wrote:
>
> > So now, I am hitting Datastore timeouts and Request timeouts ...
>
> > I really really think you guys need to add a mechanism that allows
> > developers to simply do bulk uploads of data into their GAE
> > applications (from Java thank you).
