Hey Thomas, Yeah we actually spoke a little while ago about DSE. In the end, we actually used a custom approach which analyses data in blocks of 50k rows, builds a list of rows which need changing to the same value, then applied them in bulk using update() + F().
Here's our benchmark: (42.11s) Found 49426 objs (match: 16107) (db writes: 50847) (range: 72300921 ~ 72350921), (avg 13.8 mins/million) - [('is_checked', 49426), ('is_image_blocked', 0), ('has_link', 1420), ('is_spam', 1)] (44.50s) Found 49481 objs (match: 16448) (db writes: 50764) (range: 72350921 ~ 72400921), (avg 14.6 mins/million) - [('is_checked', 49481), ('is_image_blocked', 0), ('has_link', 1283), ('is_spam', 0)] (55.78s) Found 49627 objs (match: 18516) (db writes: 50832) (range: 72400921 ~ 72450921), (avg 18.3 mins/million) - [('is_checked', 49627), ('is_image_blocked', 0), ('has_link', 1205), ('is_spam', 0)] (42.03s) Found 49674 objs (match: 17244) (db writes: 51655) (range: 72450921 ~ 72500921), (avg 13.6 mins/million) - [('is_checked', 49674), ('is_image_blocked', 0), ('has_link', 1971), ('is_spam', 10)] (51.98s) Found 49659 objs (match: 16563) (db writes: 51180) (range: 72500921 ~ 72550921), (avg 16.9 mins/million) - [('is_checked', 49659), ('is_image_blocked', 0), ('has_link', 1517), ('is_spam', 4)] Could you let me know if those benchmarks are better/worse than using DSE? I'd be interested to see the comparison! Cal On Wed, Jun 22, 2011 at 2:31 PM, Thomas Weholt <thomas.weh...@gmail.com>wrote: > Yes! I'm in. > > Out of curiosity: When inserting lots of data, how do you do it? Using > the orm? Have you looked at http://pypi.python.org/pypi/dse/2.1.0 ? I > wrote DSE to solve inserting/updating huge sets of data, but if > there's a better way to do it that would be especially interesting to > hear more about ( and sorry for the self promotion ). > > Regards, > Thomas > > On Wed, Jun 22, 2011 at 3:15 PM, Cal Leeming [Simplicity Media Ltd] > <cal.leem...@simplicitymedialtd.co.uk> wrote: > > Hi all, > > Some of you may have noticed, in the last few months I've done quite a > few > > posts/snippets about handling large data sets in Django. At the end of > this > > month (after what seems like a lifetime of trial and error), we're > finally > > going to be releasing a new site which holds around 40mil+ rows of data, > > grows by about 300-500k rows each day, handles 5GB of uploads per day, > and > > can handle around 1024 requests per second on stress test on a moderately > > spec'd server. > > As the entire thing is written in Django (and a bunch of other open > source > > products), I'd really like to give something back to the > community. (stack > > incls Celery/RabbitMQ/Sphinx SE/PYQuery/Percona > > MySQL/NGINX/supervisord/debian etc) > > Therefore, I'd like to see if there would be any interest in webcast in > > which I would explain how we handle such large amounts of data, the trial > > and error processes we went through, some really neat tricks we've done > to > > avoid bottlenecks, our own approach to smart content filtering, and some > of > > the valuable lessons we have learned. The webcast would be completely > free > > of charge, last a couple of hours (with a short break) and anyone can > > attend. I'd also offer up a Q&A session at the end. > > If you're interested, please reply on-list so others can see. > > Thanks > > Cal > > > > -- > > You received this message because you are subscribed to the Google Groups > > "Django users" group. > > To post to this group, send email to django-users@googlegroups.com. > > To unsubscribe from this group, send email to > > django-users+unsubscr...@googlegroups.com. > > For more options, visit this group at > > http://groups.google.com/group/django-users?hl=en. > > > > > > -- > Mvh/Best regards, > Thomas Weholt > http://www.weholt.org > > -- > You received this message because you are subscribed to the Google Groups > "Django users" group. > To post to this group, send email to django-users@googlegroups.com. > To unsubscribe from this group, send email to > django-users+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/django-users?hl=en. > > -- You received this message because you are subscribed to the Google Groups "Django users" group. To post to this group, send email to django-users@googlegroups.com. To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-users?hl=en.