[appengine-java] Re: Bulk writes to datastore

2009-09-15 Thread Richard

Hi Larry

 I am wondering about writing a Servlet that would form/multi-part
 upload large files and cache them in memcache then use the
 cron API to trickle persist them into the DS over time ...

I've been thinking about using something like this as well.  I think
you could likely cache the upload, because the limit here seems to be
mainly the number of entities, not the size of any one entity (below
1MB).  I have, e.g., 100-200k worth of data that I upload, but because
it's represented as a couple hundred entities, it chokes.  I could
just upload the 93k and fire off a task (or cron job) that would
parse and insert the data offline.
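
Roughly what I have in mind, as a sketch (the servlet and worker names
are placeholders, and I'm using the com.google.appengine.api.taskqueue
package here; adjust if your SDK still has it under labs):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.UUID;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

// Sketch: accept an upload, park it in memcache, and let a task do
// the slow datastore writes. UploadServlet and /parse-worker are
// made-up names.
public class UploadServlet extends HttpServlet {
  @Override
  public void doPost(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    // Read the raw body; it must stay under memcache's ~1MB value limit.
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    InputStream in = req.getInputStream();
    byte[] chunk = new byte[8192];
    for (int n; (n = in.read(chunk)) != -1; ) {
      buf.write(chunk, 0, n);
    }

    // Cache the upload under a random key the worker task can look up.
    String cacheKey = "upload-" + UUID.randomUUID();
    MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
    cache.put(cacheKey, buf.toByteArray());

    // Defer the parsing/persisting to the task queue instead of doing
    // it all in this request.
    QueueFactory.getDefaultQueue().add(
        TaskOptions.Builder.withUrl("/parse-worker")
            .param("cacheKey", cacheKey));

    resp.setStatus(HttpServletResponse.SC_ACCEPTED);
  }
}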

At the very least, I plan to use the low-level API more.  The (very
useful) performance-testing app http://gaejava.appspot.com/ shows
consistently higher CPU usage from JDO.  If this ever improves, that
app should show it.  Until then, the low-level API looks good.
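
For reference, a minimal batch put through the low-level API looks
something like this (the "LogLine" kind and its property are made-up
names):

import java.util.ArrayList;
import java.util.List;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;

public class LowLevelBatchPut {
  public static void putLines(List<String> lines) {
    DatastoreService datastore =
        DatastoreServiceFactory.getDatastoreService();
    List<Entity> batch = new ArrayList<Entity>();
    for (String line : lines) {
      Entity e = new Entity("LogLine"); // kind name is illustrative
      e.setProperty("raw", line);
      batch.add(e);
    }
    // One put() RPC for the whole batch instead of one per entity.
    datastore.put(batch);
  }
}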

Regards,
Richard



[appengine-java] Re: Bulk writes to datastore

2009-09-12 Thread P64

Exactly!

I was hoping this update (
http://code.google.com/p/datanucleus-appengine/issues/detail?id=7
) would seriously improve bulk inserts. In practice, it seems you can
now do roughly 2-3 times as many inserts in the same amount of real
and CPU time.

However, this is still poor compared to what we're used to with
relational databases on relatively modest hardware.

At the moment I can do a batch insert of up to 300 entities (with a
couple of Integer and String properties) in a 30-second time window,
and it costs me around 18 seconds of CPU time.

I have a roughly 1.5MB file which I have to download, parse its
15,000 lines, and insert them into the database. I need no
transactions in this case; all entities can be standalone, and I
don't mind the order in which they are written - it could be done in
parallel as well, as far as I'm concerned. As it stands now, I have
to download this file, slice it into chunks of 300 lines, and store
each chunk in the database. Then I need to put 50 tasks in a queue,
each taking 30 seconds to read a chunk from the database, parse it
into 300 separate entities, and store them as such.
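
In outline, that slice-and-enqueue step comes out something like this
(the chunk kind, worker URL, and join helper are placeholder names):

import java.util.List;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Text;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class ChunkLoader {
  private static final int CHUNK_SIZE = 300; // what fits in one request

  public static void enqueueChunks(List<String> lines) {
    DatastoreService datastore =
        DatastoreServiceFactory.getDatastoreService();
    Queue queue = QueueFactory.getDefaultQueue();

    for (int start = 0; start < lines.size(); start += CHUNK_SIZE) {
      int end = Math.min(start + CHUNK_SIZE, lines.size());

      // Store the raw chunk (as a Text, since it exceeds the 500-char
      // String property limit) so the worker can fetch it by key.
      Entity chunk = new Entity("RawChunk");
      chunk.setProperty("text",
          new Text(join(lines.subList(start, end))));
      datastore.put(chunk);

      // One task per chunk; each task parses and stores ~300 entities.
      queue.add(TaskOptions.Builder.withUrl("/chunk-worker")
          .param("chunkKey", KeyFactory.keyToString(chunk.getKey())));
    }
  }

  private static String join(List<String> lines) {
    StringBuilder sb = new StringBuilder();
    for (String line : lines) {
      sb.append(line).append('\n');
    }
    return sb.toString();
  }
}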

Just the database writes for that would cost me well over 15 CPU
minutes, not to mention the overhead caused by all the task spawning
and so on.

All that for an operation which literally takes seconds on my old box
running a relational DB, which I use for testing purposes.

It's too complicated and it uses way too many resources to update
1,000+ entities - and there are lots of applications that need to
update data from different sources (XML, SQL dumps, ...) on a daily
basis.

On Sep 12, 02:30, Larry Cable larry.ca...@gmail.com wrote:
 So now, I am hitting Datastore timeouts and Request timeouts ...

 I really really think you guys need to add a mechanism that allows
 developers to simply do bulk uploads of data into their GAE
 applications (from Java thank you).



[appengine-java] Re: Bulk writes to datastore

2009-09-12 Thread Larry Cable

I am wondering about writing a Servlet that would form/multi-part
upload large files and cache them in memcache then use the
cron API to trickle persist them into the DS over time ...

could maybe even get adventurous and put a filesystem-like API
over the cache ...

lemme know if anyone would be interested in this ... if there is
enough interest I'll put something together and open source it.
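
As a first cut, something like this would spread a file across several
memcache entries to stay under the ~1MB value limit (the key scheme is
invented for the sketch):

import java.util.Arrays;

import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class CacheFile {
  private static final int SEGMENT_SIZE = 1000 * 1000; // stay under 1MB

  public static void write(String name, byte[] data) {
    MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
    int segments = (data.length + SEGMENT_SIZE - 1) / SEGMENT_SIZE;
    // Record the segment count so a reader knows how many to fetch.
    cache.put(name + ":segments", segments);
    for (int i = 0; i < segments; i++) {
      int from = i * SEGMENT_SIZE;
      int to = Math.min(from + SEGMENT_SIZE, data.length);
      cache.put(name + ":" + i, Arrays.copyOfRange(data, from, to));
    }
  }
}

(Of course memcache entries can be evicted at any time, so anything
parked there still has to be persisted before it counts as stored.)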

These timeouts in the DS and at the request level, while
understandable, are a total PITA; coding around them complicates the
application logic considerably ...

honestly I am beginning to wonder if EC2 with a Tomcat and a MySQL
or Derby DB in it might be a better way to go if you actually want
to store reasonable amounts of data.

I realize that this is an EA environment, but my feedback would be
that if you are aiming to provide a scalable web app and datastore,
there is little point in having either if the hosted application
cannot store reasonable amounts of data in the 1st place.





[appengine-java] Re: Bulk writes to datastore

2009-09-11 Thread Larry Cable

I tried doing a bulk load with the JDO makePersistentAll(..) call
yesterday ...

What I did was create a List of size 2048, fill it to capacity, and
then call makePersistentAll() ... I got an IllegalArgumentException
from that call stating that you can only persist at most 500 objects
per call ...

I was unable to retest this because, despite changing the capacity
of the List to 500 and re-deploying, the redeployment did not seem
to take effect ...

will keep trying ...
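
The obvious workaround, once the redeploy takes, is to slice the list
into batches of 500 before each call; a sketch, assuming the usual PMF
PersistenceManagerFactory singleton helper from the GAE docs:

import java.util.List;

import javax.jdo.PersistenceManager;

public class BatchedPersist {
  private static final int MAX_PER_CALL = 500; // limit from the exception

  public static void persistAll(List<?> objects) {
    // PMF is the standard singleton wrapper around JDOHelper; assumed here.
    PersistenceManager pm = PMF.get().getPersistenceManager();
    try {
      for (int start = 0; start < objects.size(); start += MAX_PER_CALL) {
        int end = Math.min(start + MAX_PER_CALL, objects.size());
        pm.makePersistentAll(objects.subList(start, end));
      }
    } finally {
      pm.close();
    }
  }
}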




[appengine-java] Re: Bulk writes to datastore

2009-09-11 Thread Larry Cable

So now, I am hitting Datastore timeouts and Request timeouts ...

I really really think you guys need to add a mechanism that allows
developers to simply do bulk uploads of data into their GAE
applications (from Java thank you).

:)




[appengine-java] Re: Bulk writes to datastore

2009-09-08 Thread Jason (Google)
Yes. If you need to be able to roll back in case one or more entities
don't get written, you'll need to use transactions. If you use
transactions, your entities must belong to the same entity group, or
else an exception will be thrown. You'll get better performance if
you do this outside of a transaction, since all entities can be
written in parallel, but you'll lose the ability to roll back in case
of an individual failure.
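
For illustration, the transactional variant looks something like this
(the shared parent key is what puts everything into one entity group;
the kind names are made up):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Transaction;

public class TransactionalPut {
  public static void putAllOrNothing() {
    DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
    Key parent = KeyFactory.createKey("Batch", "import-1"); // shared ancestor

    Transaction tx = ds.beginTransaction();
    try {
      // Children of the same parent live in one entity group, so one
      // transaction can cover both puts: all succeed or none do.
      ds.put(tx, new Entity("Item", parent));
      ds.put(tx, new Entity("Item", parent));
      tx.commit();
    } finally {
      if (tx.isActive()) {
        tx.rollback(); // undo if commit was never reached
      }
    }
  }
}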

- Jason

On Sat, Sep 5, 2009 at 7:18 AM, Vince Bonfanti vbonfa...@gmail.com wrote:


 Your two quick notes seem to be contradictory. In order to use
 transactions, don't all of the entities have to be in the same entity
 group?

 Vince





[appengine-java] Re: Bulk writes to datastore

2009-09-05 Thread Vince Bonfanti

Your two quick notes seem to be contradictory. In order to use
transactions, don't all of the entities have to be in the same entity
group?

Vince





[appengine-java] Re: Bulk writes to datastore

2009-09-04 Thread Jason (Google)
Batch puts are supported, yes, and as of yesterday's release, calling
makePersistentAll (JDO) and the equivalent JPA call will take advantage of
this support (previously, you had to use the low-level API).

Two quick notes:

1) All of the entities that you're persisting should be in separate entity
groups since two entities in the same entity group can't be written to
consecutively, and you will see datastore timeout exceptions if many
simultaneous write requests come in for the same entity or entity group.
2) Batch puts do not operate in a transaction. This means that some
writes may succeed but others may not, so if you need the ability to
roll back, you'll need transactions.

- Jason

Let me know if you have any more questions on this.

- Jason

On Thu, Sep 3, 2009 at 7:24 PM, Nicholas Albion nalb...@gmail.com wrote:


 Is it possible to overcome the datastore's 10 writes/second limit by
 batching them?

 I've got a table containing just over one million records (in CSV
 format).  Does a batched write (of around 1MB of data and, say 1000
 records) count as one write, or 1000 writes?
 

