Hey Thomas,

Yeah, we actually spoke a little while ago about DSE. In the end, we used a
custom approach which analyses data in blocks of 50k rows, builds a list of
rows which need changing to the same value, then applies those changes in
bulk using update() + F().
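
A rough sketch of that pattern, for illustration only and not the actual
production code. The names id_blocks, plan_bulk_updates, process_block,
compute_changes and apply_update are all hypothetical. The database write is
passed in as a callable so the grouping logic runs without Django; in a
Django project, apply_update would presumably wrap something like
Model.objects.filter(pk__in=ids).update(**{field: value}), with F() used
when the new value is derived from an existing column.

```python
from collections import defaultdict


def id_blocks(start, stop, size=50_000):
    """Yield (lo, hi) primary-key ranges covering [start, stop) in blocks."""
    lo = start
    while lo < stop:
        hi = min(lo + size, stop)
        yield lo, hi
        lo = hi


def plan_bulk_updates(rows, compute_changes):
    """Group row ids by the (field, value) change each row needs.

    rows: iterable of dicts that each carry an "id" key (assumed shape).
    compute_changes: hypothetical callable returning {field: new_value}
    for one row, or an empty dict when nothing needs changing.
    """
    groups = defaultdict(list)
    for row in rows:
        for field, value in compute_changes(row).items():
            groups[(field, value)].append(row["id"])
    return dict(groups)


def process_block(rows, compute_changes, apply_update):
    """Apply one bulk write per distinct (field, value) pair.

    apply_update(field, value, ids) is a hypothetical hook; with Django it
    could call Model.objects.filter(pk__in=ids).update(**{field: value}).
    Returns the number of bulk writes issued for this block.
    """
    plan = plan_bulk_updates(rows, compute_changes)
    for (field, value), ids in plan.items():
        apply_update(field, value, ids)
    return len(plan)
```

The point of the grouping is that the number of write statements depends on
how many distinct (field, value) pairs a block produces, not on how many
rows it contains.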

Here's our benchmark:

(42.11s) Found 49426 objs (match: 16107) (db writes: 50847) (range: 72300921
~ 72350921), (avg 13.8 mins/million) - [('is_checked', 49426),
('is_image_blocked', 0), ('has_link', 1420), ('is_spam', 1)]
(44.50s) Found 49481 objs (match: 16448) (db writes: 50764) (range: 72350921
~ 72400921), (avg 14.6 mins/million) - [('is_checked', 49481),
('is_image_blocked', 0), ('has_link', 1283), ('is_spam', 0)]
(55.78s) Found 49627 objs (match: 18516) (db writes: 50832) (range: 72400921
~ 72450921), (avg 18.3 mins/million) - [('is_checked', 49627),
('is_image_blocked', 0), ('has_link', 1205), ('is_spam', 0)]
(42.03s) Found 49674 objs (match: 17244) (db writes: 51655) (range: 72450921
~ 72500921), (avg 13.6 mins/million) - [('is_checked', 49674),
('is_image_blocked', 0), ('has_link', 1971), ('is_spam', 10)]
(51.98s) Found 49659 objs (match: 16563) (db writes: 51180) (range: 72500921
~ 72550921), (avg 16.9 mins/million) - [('is_checked', 49659),
('is_image_blocked', 0), ('has_link', 1517), ('is_spam', 4)]

Could you let me know if those benchmarks are better/worse than using DSE?
I'd be interested to see the comparison!

Cal

On Wed, Jun 22, 2011 at 2:31 PM, Thomas Weholt <thomas.weh...@gmail.com> wrote:

> Yes! I'm in.
>
> Out of curiosity: When inserting lots of data, how do you do it? Using
> the orm? Have you looked at http://pypi.python.org/pypi/dse/2.1.0 ? I
> wrote DSE to solve inserting/updating huge sets of data, but if
> there's a better way to do it that would be especially interesting to
> hear more about ( and sorry for the self promotion ).
>
> Regards,
> Thomas
>
> On Wed, Jun 22, 2011 at 3:15 PM, Cal Leeming [Simplicity Media Ltd]
> <cal.leem...@simplicitymedialtd.co.uk> wrote:
> > Hi all,
> > Some of you may have noticed that in the last few months I've done quite
> > a few posts/snippets about handling large data sets in Django. At the end
> > of this month (after what seems like a lifetime of trial and error),
> > we're finally going to be releasing a new site which holds around 40mil+
> > rows of data, grows by about 300-500k rows each day, handles 5GB of
> > uploads per day, and can handle around 1024 requests per second on stress
> > test on a moderately spec'd server.
> > As the entire thing is written in Django (and a bunch of other open
> > source products), I'd really like to give something back to the
> > community. (The stack includes Celery/RabbitMQ/Sphinx SE/PYQuery/Percona
> > MySQL/NGINX/supervisord/debian etc.)
> > Therefore, I'd like to see if there would be any interest in a webcast in
> > which I would explain how we handle such large amounts of data, the trial
> > and error processes we went through, some really neat tricks we've done
> > to avoid bottlenecks, our own approach to smart content filtering, and
> > some of the valuable lessons we have learned. The webcast would be
> > completely free of charge, last a couple of hours (with a short break)
> > and anyone can attend. I'd also offer up a Q&A session at the end.
> > If you're interested, please reply on-list so others can see.
> > Thanks
> > Cal
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Django users" group.
> > To post to this group, send email to django-users@googlegroups.com.
> > To unsubscribe from this group, send email to
> > django-users+unsubscr...@googlegroups.com.
> > For more options, visit this group at
> > http://groups.google.com/group/django-users?hl=en.
> >
>
>
>
> --
> Mvh/Best regards,
> Thomas Weholt
> http://www.weholt.org
>

