On Wed, Jun 22, 2011 at 3:36 PM, Cal Leeming [Simplicity Media Ltd]
<cal.leem...@simplicitymedialtd.co.uk> wrote:
> Hey Thomas,
>
> Yeah, we actually spoke a little while ago about DSE. In the end we used a
> custom approach which analyses data in blocks of 50k rows, builds a list of
> rows which need changing to the same value, then applies them in bulk using
> update() + F().
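[Editor's note: a minimal sketch of the pattern Cal describes, not his actual code. Rows are scanned in fixed id ranges, ids needing the same new value are grouped, and each group is written with one bulk statement. In Django the final step would be roughly `Model.objects.filter(id__in=ids).update(is_spam=True)`, with `F()` expressions for fields derived from their current value; the field names and the 50k block size below are illustrative.]

```python
from collections import defaultdict

BLOCK = 50_000  # block size mentioned in Cal's message

def id_ranges(start, stop, block=BLOCK):
    """Yield (lo, hi) half-open id ranges covering [start, stop)."""
    while start < stop:
        yield start, min(start + block, stop)
        start += block

def group_by_new_value(rows, classify):
    """Map each new flag value to the ids that should receive it, so every
    group can be applied with a single bulk UPDATE instead of row-by-row saves."""
    groups = defaultdict(list)
    for row in rows:
        groups[classify(row)].append(row["id"])
    return dict(groups)

# Tiny demo with fabricated rows:
rows = [{"id": i, "text": "spam" if i % 3 == 0 else "ok"} for i in range(6)]
groups = group_by_new_value(rows, lambda r: r["text"] == "spam")
print(groups)                        # → {True: [0, 3], False: [1, 2, 4, 5]}
print(list(id_ranges(0, 120_000)))  # two full blocks plus one partial block
```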
Hmmm, what do you mean by "bulk using update() + F()"? Something like
"update sometable set somefield1 = somevalue1, somefield2 = somevalue2
where id in (1, 2, 3, ...)"?

Does "avg 13.8 mins/million" mean you processed 13.8 million rows per
minute? What kind of hardware did you use?

Thomas

> Here's our benchmark:
>
> (42.11s) Found 49426 objs (match: 16107) (db writes: 50847) (range: 72300921 ~ 72350921), (avg 13.8 mins/million) - [('is_checked', 49426), ('is_image_blocked', 0), ('has_link', 1420), ('is_spam', 1)]
> (44.50s) Found 49481 objs (match: 16448) (db writes: 50764) (range: 72350921 ~ 72400921), (avg 14.6 mins/million) - [('is_checked', 49481), ('is_image_blocked', 0), ('has_link', 1283), ('is_spam', 0)]
> (55.78s) Found 49627 objs (match: 18516) (db writes: 50832) (range: 72400921 ~ 72450921), (avg 18.3 mins/million) - [('is_checked', 49627), ('is_image_blocked', 0), ('has_link', 1205), ('is_spam', 0)]
> (42.03s) Found 49674 objs (match: 17244) (db writes: 51655) (range: 72450921 ~ 72500921), (avg 13.6 mins/million) - [('is_checked', 49674), ('is_image_blocked', 0), ('has_link', 1971), ('is_spam', 10)]
> (51.98s) Found 49659 objs (match: 16563) (db writes: 51180) (range: 72500921 ~ 72550921), (avg 16.9 mins/million) - [('is_checked', 49659), ('is_image_blocked', 0), ('has_link', 1517), ('is_spam', 4)]
>
> Could you let me know if those benchmarks are better/worse than using DSE?
> I'd be interested to see the comparison!
>
> Cal
>
> On Wed, Jun 22, 2011 at 2:31 PM, Thomas Weholt <thomas.weh...@gmail.com> wrote:
>>
>> Yes! I'm in.
>>
>> Out of curiosity: when inserting lots of data, how do you do it? Using
>> the ORM? Have you looked at http://pypi.python.org/pypi/dse/2.1.0 ? I
>> wrote DSE to solve inserting/updating huge sets of data, but if there's
>> a better way to do it, that would be especially interesting to hear
>> more about (and sorry for the self-promotion).
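[Editor's note: the "avg X mins/million" figures in the log are consistent with scaling each block's elapsed time by its db-write count, i.e. minutes needed per million rows, not millions of rows per minute. This is an inference from the numbers above, not a formula stated in the thread.]

```python
def mins_per_million(elapsed_s, writes):
    """Minutes to process one million rows at this block's write rate."""
    return elapsed_s / writes * 1_000_000 / 60

# Plugging in the first two log lines above reproduces the reported averages:
print(round(mins_per_million(42.11, 50847), 1))  # → 13.8
print(round(mins_per_million(44.50, 50764), 1))  # → 14.6
```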
>>
>> Regards,
>> Thomas
>>
>> On Wed, Jun 22, 2011 at 3:15 PM, Cal Leeming [Simplicity Media Ltd]
>> <cal.leem...@simplicitymedialtd.co.uk> wrote:
>> > Hi all,
>> >
>> > Some of you may have noticed, in the last few months I've done quite a few
>> > posts/snippets about handling large data sets in Django. At the end of this
>> > month (after what seems like a lifetime of trial and error), we're finally
>> > going to be releasing a new site which holds around 40mil+ rows of data,
>> > grows by about 300-500k rows each day, handles 5GB of uploads per day, and
>> > can handle around 1024 requests per second in a stress test on a moderately
>> > spec'd server.
>> >
>> > As the entire thing is written in Django (and a bunch of other open source
>> > products), I'd really like to give something back to the community. (Stack
>> > includes Celery/RabbitMQ/Sphinx SE/PyQuery/Percona
>> > MySQL/NGINX/supervisord/Debian etc.)
>> >
>> > Therefore, I'd like to see if there would be any interest in a webcast in
>> > which I would explain how we handle such large amounts of data, the trial
>> > and error processes we went through, some really neat tricks we've done to
>> > avoid bottlenecks, our own approach to smart content filtering, and some of
>> > the valuable lessons we have learned. The webcast would be completely free
>> > of charge, last a couple of hours (with a short break), and anyone could
>> > attend. I'd also offer a Q&A session at the end.
>> >
>> > If you're interested, please reply on-list so others can see.
>> >
>> > Thanks
>> > Cal
>> >
>> > --
>> > You received this message because you are subscribed to the Google Groups
>> > "Django users" group.
>> > To post to this group, send email to django-users@googlegroups.com.
>> > To unsubscribe from this group, send email to
>> > django-users+unsubscr...@googlegroups.com.
>> > For more options, visit this group at
>> > http://groups.google.com/group/django-users?hl=en.
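[Editor's note: on Thomas's question about how the inserts are done — a common alternative to saving ORM objects one at a time is to batch rows through `executemany()` at the DB-API level. This is a generic sketch of that technique, not DSE's actual API and not Cal's code; the table and column names are invented for illustration, using sqlite3 so the example is self-contained.]

```python
import sqlite3

# In-memory database standing in for the real backend:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msg (id INTEGER PRIMARY KEY, is_spam INTEGER)")

# Fabricated demo data: 10k rows, one parameter tuple per row.
rows = [(i, 1 if i % 100 == 0 else 0) for i in range(10_000)]

# One executemany() call sends the whole batch instead of 10k round trips:
conn.executemany("INSERT INTO msg (id, is_spam) VALUES (?, ?)", rows)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM msg").fetchone()[0])  # → 10000
```

In a Django app the same idea is reachable through the raw connection cursor; wrapping the batch in a single transaction is usually what makes it fast.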
>>
>> --
>> Mvh/Best regards,
>> Thomas Weholt
>> http://www.weholt.org

--
Mvh/Best regards,
Thomas Weholt
http://www.weholt.org