On Wed, Jun 22, 2011 at 3:36 PM, Cal Leeming [Simplicity Media Ltd]
<cal.leem...@simplicitymedialtd.co.uk> wrote:
> Hey Thomas,
> Yeah, we actually spoke a little while ago about DSE. In the end we used a
> custom approach which analyses data in blocks of 50k rows, builds a list of
> rows which need changing to the same value, then applies the changes in
> bulk using update() + F().
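[Editor's note: a rough sketch of the batching shape described in the quote above. The row data, grouping rule, and model name are invented for illustration; only the "group rows that need the same value, then update each group in one statement" idea comes from the thread.]

```python
# Sketch of the "scan a block, group rows that need the same change,
# update each group in bulk" approach. Row payloads and the classify
# rule are made up; only the batching shape matters here.
from collections import defaultdict

def plan_updates(rows, classify):
    """Group row ids by the value they should be set to."""
    groups = defaultdict(list)
    for row_id, payload in rows:
        groups[classify(payload)].append(row_id)
    return dict(groups)

rows = [(1, "spam"), (2, "ok"), (3, "spam"), (4, "ok")]
plan = plan_updates(rows, lambda p: p == "spam")
print(plan)  # → {True: [1, 3], False: [2, 4]}

# Each group then becomes a single bulk call along the lines of
# Model.objects.filter(id__in=ids).update(is_spam=flag)
# instead of one UPDATE per row.
```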

Hmmm, what do you mean by "bulk using update() + F()"? Something like
"update sometable set somefield1 = somevalue1, somefield2 = somevalue2
where id in (1,2,3 .....)"? Does "avg 13.8 mins/million" mean you
processed 13.8 million rows per minute? What kind of hardware did you
use?
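
[Editor's note: the SQL pattern Thomas is guessing at can be demonstrated with a minimal sqlite3 sketch. Table and column names here are invented; the point is one bulk UPDATE over an id list rather than one UPDATE per row.]

```python
import sqlite3

# Invented schema, purely for illustration of the IN-list pattern.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msg (id INTEGER PRIMARY KEY, is_checked INTEGER)")
conn.executemany("INSERT INTO msg VALUES (?, 0)", [(i,) for i in range(1, 6)])

# One bulk UPDATE for every row that needs the same new value.
ids = [1, 2, 3]
placeholders = ",".join("?" * len(ids))
conn.execute(
    f"UPDATE msg SET is_checked = 1 WHERE id IN ({placeholders})", ids)

checked = [r[0] for r in
           conn.execute("SELECT id FROM msg WHERE is_checked = 1")]
print(checked)  # → [1, 2, 3]
```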

Thomas

> Here's our benchmark:
> (42.11s) Found 49426 objs (match: 16107) (db writes: 50847) (range: 72300921
> ~ 72350921), (avg 13.8 mins/million) - [('is_checked', 49426),
> ('is_image_blocked', 0), ('has_link', 1420), ('is_spam', 1)]
> (44.50s) Found 49481 objs (match: 16448) (db writes: 50764) (range: 72350921
> ~ 72400921), (avg 14.6 mins/million) - [('is_checked', 49481),
> ('is_image_blocked', 0), ('has_link', 1283), ('is_spam', 0)]
> (55.78s) Found 49627 objs (match: 18516) (db writes: 50832) (range: 72400921
> ~ 72450921), (avg 18.3 mins/million) - [('is_checked', 49627),
> ('is_image_blocked', 0), ('has_link', 1205), ('is_spam', 0)]
> (42.03s) Found 49674 objs (match: 17244) (db writes: 51655) (range: 72450921
> ~ 72500921), (avg 13.6 mins/million) - [('is_checked', 49674),
> ('is_image_blocked', 0), ('has_link', 1971), ('is_spam', 10)]
> (51.98s) Found 49659 objs (match: 16563) (db writes: 51180) (range: 72500921
> ~ 72550921), (avg 16.9 mins/million) - [('is_checked', 49659),
> ('is_image_blocked', 0), ('has_link', 1517), ('is_spam', 4)]
> Could you let me know if those benchmarks are better/worse than using DSE?
> I'd be interested to see the comparison!
> Cal
> On Wed, Jun 22, 2011 at 2:31 PM, Thomas Weholt <thomas.weh...@gmail.com>
> wrote:
>>
>> Yes! I'm in.
>>
>> Out of curiosity: When inserting lots of data, how do you do it? Using
>> the orm? Have you looked at http://pypi.python.org/pypi/dse/2.1.0 ? I
>> wrote DSE to solve inserting/updating huge sets of data, but if
>> there's a better way to do it that would be especially interesting to
>> hear more about ( and sorry for the self promotion ).
>>
>> Regards,
>> Thomas
>>
>> On Wed, Jun 22, 2011 at 3:15 PM, Cal Leeming [Simplicity Media Ltd]
>> <cal.leem...@simplicitymedialtd.co.uk> wrote:
>> > Hi all,
>> > Some of you may have noticed, in the last few months I've done quite a
>> > few
>> > posts/snippets about handling large data sets in Django. At the end of
>> > this
>> > month (after what seems like a lifetime of trial and error), we're
>> > finally
>> > going to be releasing a new site which holds around 40mil+ rows of data,
>> > grows by about 300-500k rows each day, handles 5GB of uploads per day,
>> > and
>> > can handle around 1024 requests per second on stress test on a
>> > moderately
>> > spec'd server.
>> > As the entire thing is written in Django (and a bunch of other open
>> > source
>> > products), I'd really like to give something back to the
>> > community. (stack
>> > incls Celery/RabbitMQ/Sphinx SE/PYQuery/Percona
>> > MySQL/NGINX/supervisord/debian etc)
>> > Therefore, I'd like to see if there would be any interest in webcast in
>> > which I would explain how we handle such large amounts of data, the
>> > trial
>> > and error processes we went through, some really neat tricks we've done
>> > to
>> > avoid bottlenecks, our own approach to smart content filtering, and some
>> > of
>> > the valuable lessons we have learned. The webcast would be completely
>> > free
>> > of charge, last a couple of hours (with a short break) and anyone can
>> > attend. I'd also offer up a Q&A session at the end.
>> > If you're interested, please reply on-list so others can see.
>> > Thanks
>> > Cal
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "Django users" group.
>> > To post to this group, send email to django-users@googlegroups.com.
>> > To unsubscribe from this group, send email to
>> > django-users+unsubscr...@googlegroups.com.
>> > For more options, visit this group at
>> > http://groups.google.com/group/django-users?hl=en.
>> >
>>
>>
>>
>> --
>> Mvh/Best regards,
>> Thomas Weholt
>> http://www.weholt.org
>>
>>
>
>



-- 
Mvh/Best regards,
Thomas Weholt
http://www.weholt.org

