A few lines of IRC chat from Freenode #darktable. Hanatos is the founder
of the Darktable project.


[09:17] <hanatos_> ``requiring SSE3 is not really allowed ''
[09:17] <hanatos_> so much bundled cluelessness :/
[09:19] <hanatos_> Germano: re 32-bit
[09:19] <hanatos_> the sse thing is one thing
[09:20] <hanatos_> the other is the very limited virtual address space
(2G really)
[09:20] <hanatos_> everybody coding anything half way serious will tell
you the same story
[09:20] <hanatos_> (rawtherapee has the same issues iirc)
[09:20] <hanatos_> our old cache was allocing one big chunk of memory at
startup and maintained it manually
[09:20] <hanatos_> essentially duplicating a poor man's malloc,
specialised for our thumbnail caches
[09:21] <hanatos_> the new cache is much faster and easier to read
[09:21] <hanatos_> but based on malloc/free
[09:21] <hanatos_> which means your virtual address space (not the
physical one mapping to your ram)
[09:21] <hanatos_> will get fragmented and you quickly start addressing
blocks above the 10G range
[09:22] <hanatos_> which may not be a problem, even on systems with only
2G of physical ram, because blocks have been freed in between. it's just
on 32-bit systems you can't address it any more and die
[09:22] <boucman> basically, at this point, DT makes no sense on x86,
except maybe dt-cli
[09:22] <hanatos_> which is a similar argument as the sse3 is.
[09:22] <hanatos_> it's just not a worthwhile experience running this
software on this kind of hardware
[09:22] <hanatos_> boucman: yes, that.
[09:23] <Artefact2> hanatos_: I think jmalloc is also more clever than
glibc malloc wrt fragmentation. that's why blender uses it, afaik
[09:24] <Artefact2> *je
[09:24] <hanatos_> Germano: so i'd like to contradict the `upstream
doesn't care' bit
[09:25] <hanatos_> upstream does care.
[09:25] <hanatos_> just not about random principles and guidelines
[09:25] <hanatos_> but about how well darktable runs
[09:25] <hanatos_> Artefact2: jemalloc you mean
[09:25] <Artefact2> hanatos_: yes
[09:25] <hanatos_> yes, it's mostly multithreaded/
[09:25] <hanatos_> block per thread
[09:25] <hanatos_> might be worthwhile when running many threads for
thumbnail gen
[09:25] <hanatos_> but honestly i doubt it
[09:26] <hanatos_> it speeds up another piece of code i wrote
[09:26] <hanatos_> which uses many 10s of 1000s of malloc calls per second..
[09:26] <hanatos_> we don't do that in dt
[09:26] <hanatos_> (or tcmalloc for that matter)
[09:26] <hanatos_> simple enough to try with an LD_PRELOAD
[09:26] <Artefact2> oh yeah. calling malloc this many times is a bad
idea anyway
[09:32] <hanatos_> the alternative would have been allocate ridiculous
amounts of memory up front
[09:32] <hanatos_> bad idea, too
[09:32] <hanatos_> but if you have a better solution i'd sure like to
hear it :)
[09:32] <hanatos_> the problem is to construct a binary search tree
[09:32] <Artefact2> i'm not a memory guru, sadly :|
[09:32] <hanatos_> in parallel
[09:32] <hanatos_> so you start at the root and push the children as new
jobs (malloc job_t)
[09:32] <hanatos_> and so on
[09:33] <hanatos_> it's millions of nodes total, so you don't want to
allocate them up front
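
A minimal sketch of the scheme described above: start at the root and push each child as a new job for worker threads to pick up. It is written in Python, not taken from hanatos's code, which the log does not show. A toy balanced tree over sorted keys stands in for the real structure, and the names Node, build_tree, in_order and num_workers are hypothetical. Each queued tuple plays the role of one malloc'd job_t.

```python
import queue
import threading


class Node:
    """One tree node; its children are filled in by later jobs."""
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def build_tree(sorted_keys, num_workers=4):
    """Build a balanced binary search tree from sorted keys via a job queue."""
    if not sorted_keys:
        return None
    jobs = queue.Queue()
    holder = Node(None)  # dummy parent so the root is attached like any child

    def worker():
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more work
                jobs.task_done()
                return
            lo, hi, parent, side = job  # half-open key range [lo, hi)
            mid = (lo + hi) // 2
            node = Node(sorted_keys[mid])
            setattr(parent, side, node)  # every job writes a distinct slot
            # Push the children as new jobs, one per non-empty half.
            if lo < mid:
                jobs.put((lo, mid, node, "left"))
            if mid + 1 < hi:
                jobs.put((mid + 1, hi, node, "right"))
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    jobs.put((0, len(sorted_keys), holder, "left"))
    jobs.join()  # returns once every job, including pushed children, is done
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    return holder.left


def in_order(node):
    """Return the keys in sorted order (iterative traversal)."""
    out, stack = [], []
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.key)
        node = node.right
    return out
```

One job record is created per node, which is why a tree with millions of nodes puts so much load on the allocator.
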
[09:33] <Artefact2> maybe a compromise. allocate a pool that can store,
say 10 jobs at a time
[09:34] <hanatos_> (and yes, i would agree.. calling malloc is almost
always a bad idea, unless you can't avoid it)
[09:34] <hanatos_> but see.. that pool per thread.. that's exactly what
jemalloc/tcmalloc do
[09:34] <Artefact2> this way you reduce the allocator load by a factor
of 10, while still not allocating huge amounts of contiguous memory
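
The compromise Artefact2 suggests can be sketched roughly as follows. This only models the bookkeeping: the JobPool class and its chunk_size, acquire and release names are hypothetical, and Python still allocates each record separately, whereas a C version would presumably obtain each chunk with a single malloc. The counter shows how often the pool has to go back to the underlying allocator.

```python
class JobPool:
    """Hands out reusable job records, growing by a fixed chunk when empty."""

    def __init__(self, chunk_size=10):
        self.chunk_size = chunk_size
        self.free = []             # records ready for (re)use
        self.chunks_allocated = 0  # how many times the pool had to grow

    def acquire(self):
        if not self.free:
            # Grow by a whole chunk; stands in for one bulk allocation in C.
            self.free.extend([None, None] for _ in range(self.chunk_size))
            self.chunks_allocated += 1
        return self.free.pop()

    def release(self, job):
        job[0] = job[1] = None     # clear the payload before reuse
        self.free.append(job)
```

With chunk_size=10 the pool grows once per ten outstanding jobs, and released records are reused before any further growth.
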
[09:35] <Artefact2> maybe the issue is elsewhere. what are you doing
millions of? is it possible to make "bigger" jobs and have less of them?
ie a smaller tree
[09:38] <hanatos_> nope, can't touch the tree
[09:38] <hanatos_> it's some spatial acceleration structure for ray tracing
[09:39] <hanatos_> it's been optimised for fast ray tracing for many years
[09:39] <Artefact2> are we still talking about darktable? didn't know it
needed a raytracer
[09:40] <hanatos_> no, different piece of code.. as i said, i don't
think darktable needs thread-cached malloc
[10:11] <hanatos_> Germano: also feel free to refer those guys here to
us if they have questions. seems to me that some direct contact may be
better.
