[appengine-java] Re: Bulk writes to datastore
Hi Larry,

> I am wondering about writing a Servlet that would form/multi-part upload large files and cache them in memcache, then use the cron API to trickle-persist them into the DS over time ...

I've been thinking about using something like this as well. I think you could likely cache the upload to the store, because the limit here seems to be mainly the number of entities, not the size of one entity (below 1MB). I have e.g. 100/200k worth of data that I upload, but because it's represented as a couple hundred entities it chokes. I could just upload the 93k and fire off a task (or cron job) that would parse and insert the data offline.

At the very least, I plan to use the low-level API more. The (very useful) performance-testing app http://gaejava.appspot.com/ shows consistently higher CPU usage from JDO. If this ever improves, that app should show it. Until then, low-level looks good.

Regards,
Richard

You received this message because you are subscribed to the Google Groups "Google App Engine for Java" group. To post to this group, send email to google-appengine-java@googlegroups.com. To unsubscribe from this group, send email to google-appengine-java+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/google-appengine-java?hl=en
[appengine-java] Re: Bulk writes to datastore
Exactly! I was hoping this update ( http://code.google.com/p/datanucleus-appengine/issues/detail?id=7 ) would seriously improve bulk inserts. In practice it seems you can now do roughly 2-3 times as many inserts in the same amount of real and CPU time. However, this is still poor compared to what we're used to with relational databases on relatively modest hardware.

At the moment I can do a batch insert of up to 300 entities (with a couple of Integer and String properties) in a 30-second time window, and it costs me around 18 seconds of CPU time.

I have a roughly 1.5MB file which I have to download, parse its 15,000 lines, and insert them into the database. I need no transactions in this case; all entities can be standalone, and I don't mind the order in which they are written - it could be parallel as well, as far as I am concerned. As it stands, I have to download this file, slice it into chunks of 300 lines, and store each chunk in the database. Then I need to put 50 tasks in a queue, each taking 30 seconds to read a chunk from the database, parse it into 300 separate entities, and store them as such. Just the database writes would cost me well over 15 CPU minutes, not to mention the overhead caused by all the task spawning and so on. All that for an operation which literally takes seconds on my old box running a relational DB, which I use for testing purposes.

It's too complicated and it uses way too many resources to update 1000+ entities - and there are lots of applications that need to update data from different sources (XML, SQL dumps, ...) on a daily basis.

On 12 Sep, 02:30, Larry Cable larry.ca...@gmail.com wrote:
> So now, I am hitting Datastore timeouts and Request timeouts ... I really really think you guys need to add a mechanism that allows developers to simply do bulk uploads of data into their GAE applications (from Java thank you).
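The slicing step described above (15,000 parsed lines cut into 300-line chunks, one per queued task) can be sketched in plain Java. This is a purely illustrative helper, not an App Engine API; the chunk size of 300 is just the batch size observed to fit the 30-second window in this thread:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkDemo {
    // Slice a parsed file into fixed-size chunks; each chunk would become
    // one task-queue payload in the pipeline described above.
    static <T> List<List<T>> chunk(List<T> items, int size) {
        List<List<T>> chunks = new ArrayList<List<T>>();
        for (int i = 0; i < items.size(); i += size) {
            // subList is a view, so copy it before the backing list changes
            chunks.add(new ArrayList<T>(
                items.subList(i, Math.min(i + size, items.size()))));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<String>();
        for (int i = 0; i < 15000; i++) lines.add("line" + i);
        List<List<String>> chunks = chunk(lines, 300);
        System.out.println(chunks.size());          // 50 chunks, i.e. 50 tasks
        System.out.println(chunks.get(49).size());  // last chunk holds 300 lines
    }
}
```

Each resulting chunk would then be stored and handed to one queued task, which is exactly the 50-task fan-out the post complains about.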
[appengine-java] Re: Bulk writes to datastore
I am wondering about writing a Servlet that would form/multi-part upload large files and cache them in memcache, then use the cron API to trickle-persist them into the DS over time ... could maybe even get adventurous and put a filesystem-like API over the cache ... lemme know if anyone would be interested in this ... if there is enough interest I'll put something together and open source it.

These timeouts in the DS and at the request, while understandable, are a total PITA - coding around them complicates the application logic considerably ... honestly I am beginning to wonder if EC2 with a Tomcat and a MySQL or Derby DB in it might be a better way to go if you actually want to store reasonable amounts of data.

I realize that this is an EA environment, but my feedback would be that if you are aiming to provide a scalable web app and datastore, there is little point in having either if the hosted application cannot store reasonable amounts of data in the 1st place.

On Sep 12, 8:37 am, P64 primo...@gmail.com wrote:
> Exactly! I was hoping this update (http://code.google.com/p/datanucleus-appengine/issues/detail?id=7 ) would seriously improve bulk inserts. [...]
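The memcache-then-cron idea above can be sketched outside App Engine. In this illustrative stand-in, a `Deque` plays the role of memcache and a `drain` call plays the role of the cron-triggered handler; in a real implementation the servlet would write to `MemcacheService` and the handler would persist each drained batch to the datastore:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TrickleBuffer<T> {
    // Stand-in for memcache: the upload servlet stashes records here,
    // and a cron-triggered handler drains them a batch at a time.
    private final Deque<T> buffer = new ArrayDeque<T>();

    // Servlet side: accept the whole upload at once.
    public void upload(List<T> records) {
        buffer.addAll(records);
    }

    // Cron side: take up to batchSize records per invocation, staying
    // under the request deadline. The returned batch is what the real
    // handler would hand to a datastore batch put.
    public List<T> drain(int batchSize) {
        List<T> batch = new ArrayList<T>();
        while (batch.size() < batchSize && !buffer.isEmpty()) {
            batch.add(buffer.poll());
        }
        return batch;
    }

    public int pending() { return buffer.size(); }
}
```

A 1000-record upload drained 300 at a time empties in four cron ticks (300, 300, 300, 100), which is the "trickle" shape being proposed. Note that real memcache is evictable, so a production version would need a durable fallback for records still pending.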
[appengine-java] Re: Bulk writes to datastore
I tried doing a bulk load with the JDO makePersistentAll(..) call yesterday ... what I did was create a List of size 2048, fill it to capacity, and then call makePersistentAll() ... I got an IllegalArgumentException out of that call stating that you can only persist at most 500 objects per call ...

I was unable to retest this, because despite changing the capacity of the List to 500 and re-deploying, the redeployment did not seem to take effect ... will keep trying ...

On Sep 4, 5:24 pm, Jason (Google) apija...@google.com wrote:
> Batch puts are supported, yes, and as of yesterday's release, calling makePersistentAll (JDO) and the equivalent JPA call will take advantage of this support (previously, you had to use the low-level API). [...]
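One way around the 500-objects-per-call cap reported above is to flush in batches of at most 500. This sketch keeps the batching logic in plain, runnable Java; the `Sink` interface is a hypothetical seam where the real code would call `pm.makePersistentAll(batch)`:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchWriter {
    // Hypothetical seam: in real code this would wrap pm.makePersistentAll(batch).
    interface Sink<T> { void persistAll(List<T> batch); }

    // Persist `objects` in batches of at most maxPerCall (500 on GAE,
    // per the IllegalArgumentException above). Returns the number of calls made.
    static <T> int writeInBatches(List<T> objects, int maxPerCall, Sink<T> sink) {
        int calls = 0;
        List<T> pending = new ArrayList<T>();
        for (T obj : objects) {
            pending.add(obj);
            if (pending.size() == maxPerCall) {  // flush at the cap
                sink.persistAll(pending);
                pending = new ArrayList<T>();
                calls++;
            }
        }
        if (!pending.isEmpty()) {                // flush the remainder
            sink.persistAll(pending);
            calls++;
        }
        return calls;
    }
}
```

For the 2048-element list in the post, this makes five calls (four of 500 plus one of 48) instead of one over-limit call that throws.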
[appengine-java] Re: Bulk writes to datastore
So now, I am hitting Datastore timeouts and Request timeouts ... I really really think you guys need to add a mechanism that allows developers to simply do bulk uploads of data into their GAE applications (from Java thank you). :)

On Sep 11, 9:06 am, Larry Cable larry.ca...@gmail.com wrote:
> I tried doing a bulk load with the JDO makePersistentAll(..) call yesterday ... I got an IllegalArgumentException out of that call stating that you can only persist at most 500 objects per call ... [...]
[appengine-java] Re: Bulk writes to datastore
Yes. If you need to be able to roll back in case one or more entities don't get written, you'll need to use transactions, and if you use transactions, your entities must belong to the same entity group or else an exception will be thrown. You'll get better performance if you do this outside of a transaction, since all entities can be written in parallel, but you'll lose the ability to roll back in case of an individual failure.

- Jason

On Sat, Sep 5, 2009 at 7:18 AM, Vince Bonfanti vbonfa...@gmail.com wrote:
> Your two quick notes seem to be contradictory. In order to use transactions, don't all of the entities have to be in the same entity group?
>
> Vince
>
> On Fri, Sep 4, 2009 at 8:24 PM, Jason (Google) apija...@google.com wrote:
>> Batch puts are supported, yes, and as of yesterday's release, calling makePersistentAll (JDO) and the equivalent JPA call will take advantage of this support (previously, you had to use the low-level API). [...]
[appengine-java] Re: Bulk writes to datastore
Your two quick notes seem to be contradictory. In order to use transactions, don't all of the entities have to be in the same entity group?

Vince

On Fri, Sep 4, 2009 at 8:24 PM, Jason (Google) apija...@google.com wrote:
> Two quick notes: 1) All of the entities that you're persisting should be in separate entity groups, since two entities in the same entity group can't be written to consecutively, and you will see datastore timeout exceptions if many simultaneous write requests come in for the same entity or entity group. 2) Batch puts do not operate in a transaction. This means that some writes may succeed but others may not, so if you need the ability to roll back, you'll need transactions.
[appengine-java] Re: Bulk writes to datastore
Batch puts are supported, yes, and as of yesterday's release, calling makePersistentAll (JDO) and the equivalent JPA call will take advantage of this support (previously, you had to use the low-level API). Two quick notes:

1) All of the entities that you're persisting should be in separate entity groups, since two entities in the same entity group can't be written to consecutively, and you will see datastore timeout exceptions if many simultaneous write requests come in for the same entity or entity group.

2) Batch puts do not operate in a transaction. This means that some writes may succeed but others may not, so if you need the ability to roll back, you'll need transactions.

Let me know if you have any more questions on this.

- Jason

On Thu, Sep 3, 2009 at 7:24 PM, Nicholas Albion nalb...@gmail.com wrote:
> Is it possible to overcome the datastore's 10 writes/second limit by batching them? I've got a table containing just over one million records (in CSV format). Does a batched write (of around 1MB of data and, say, 1000 records) count as one write, or 1000 writes?
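Whatever the quota accounting turns out to be, the 500-objects-per-call cap reported elsewhere in this thread puts a floor under the number of batch calls needed for a million records. A quick ceiling-division check (illustrative helper, not an App Engine API):

```java
public class BatchMath {
    // Ceiling division: minimum number of batch calls needed to write
    // `records` entities at no more than `maxPerCall` entities per call.
    static long batchCalls(long records, int maxPerCall) {
        return (records + maxPerCall - 1) / maxPerCall;
    }

    public static void main(String[] args) {
        // Nicholas's ~1,000,000 CSV records at the 500-entity cap:
        System.out.println(batchCalls(1000000L, 500));  // 2000 batch calls minimum
    }
}
```

So even in the best case the import is on the order of 2000 RPCs, which is why the rest of the thread ends up discussing chunking, task queues, and trickle-persisting rather than a single upload.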