On Monday, 22 September 2025 at 11:14:17 UTC, Sönke Ludwig wrote:
On 22.09.25 at 09:49, Dmitry Olshansky wrote:
On Friday, 19 September 2025 at 17:37:36 UTC, Sönke Ludwig
wrote:
So you don't support timeouts when waiting for an event at
all? Otherwise I don't see why a separate API would be
required; this should be implementable with plain Posix APIs
within vibe-core-lite itself.
Photon's API is the syscall interface. So to wait on an event
you just call poll.
Behind the scenes it will just wait on the right fd to change
state.
Now vibe-core-light wants something like read(buffer, timeout),
which is not a syscall API but may be added. But since I'm going
to add new API, I'd rather have something consistent and sane,
not just a bunch of ad hoc functions to satisfy the vibe.d
interface.
Why can't you then use poll() to, for example, implement
`ManualEvent` with timeout and interrupt support? And shouldn't
recv() with timeout be implementable the same way: poll with a
timeout and only read when ready?
Yes, recv with timeout is basically poll+recv. The problem is
that I then need to support interrupts in poll. Nothing really
changes.
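For reference, the poll+recv combination is a small function on top of the Posix API; a minimal sketch (recvTimeout is my name for it, not an existing API in either library), where under Photon the poll() call suspends the fiber instead of blocking the thread:

```d
import core.sys.posix.poll : poll, pollfd, POLLIN;
import core.sys.posix.sys.socket : recv;

// recv with timeout as poll + recv; returns 0 on timeout, <0 on
// error, otherwise the number of bytes read.
ptrdiff_t recvTimeout(int fd, void[] buf, int timeoutMs)
{
    auto pfd = pollfd(fd, POLLIN, 0);
    auto r = poll(&pfd, 1, timeoutMs);
    if (r <= 0)
        return r;
    return recv(fd, buf.ptr, buf.length, 0);
}
```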
As far as ManualEvent goes, I've implemented that with a custom
condition variable and mutex. That mutex is not interruptible, as
it's backed by a semaphore on the slow path, in the form of an
eventfd.
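The slow path is the standard eventfd-as-semaphore pattern; a sketch of the idea (not Photon's actual internals):

```d
import core.sys.linux.sys.eventfd : eventfd, EFD_SEMAPHORE;
import core.sys.posix.unistd : read, write;

// In EFD_SEMAPHORE mode, read() blocks until the counter is nonzero
// and then decrements it by one; write() adds to the counter.
struct EventfdSemaphore
{
    int fd;
    void open() { fd = eventfd(0, EFD_SEMAPHORE); }
    void post() { ulong one = 1; write(fd, &one, one.sizeof); }
    void wait() { ulong v; read(fd, &v, v.sizeof); } // blocking read
}
```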
I might create a custom mutex that is interruptible, I guess, but
then the notion of interrupts would have to be introduced to
Photon. I do not really like it.
I think we have a misunderstanding of what vibe.d is supposed
to be. It seems like you are only focused on the web/server
role, while to me vibe-core is a general-purpose I/O and
concurrency system with no particular specialization in server
tasks. With that view, your statement to me sounds like
"Clearly D is not meant to do multi-threading, since main() is
only running in a single thread".
The defaults are what is important. Go defaults to
multi-threading for instance.
D defaults to multi-threading: TLS by default is certainly a mark
of a multi-threaded environment, and std.concurrency defaults to
a new thread per spawn, which again tells me it's about
multi-threading. I intend to support multi-threading by default.
I understand that we view this issue differently.
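The spawn default is easy to check; each spawn() call gets a fresh kernel thread:

```d
import std.concurrency : spawn;
import std.stdio : writeln;
import core.thread : Thread, thread_joinAll;

void worker()
{
    writeln("worker thread: ", Thread.getThis.id);
}

void main()
{
    writeln("main thread:   ", Thread.getThis.id);
    spawn(&worker);   // prints a different thread id than main's
    thread_joinAll(); // wait for the spawned thread before exiting
}
```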
Of course, there could be a high-level component on top of
vibe-d:web that makes some opinionated assumptions on how to
structure a web application to ensure it is scalable, but that
would go against the idea of being a toolkit with functional
building blocks, as opposed to a framework that dictates your
application structure.
Agreed.
Not everything is CPU bound, and using threads "just
because" doesn't make sense either. This is especially
true because of low-level race conditions that require
special care. D's shared/immutable helps with that, but
that also means that your whole application suddenly needs
to use shared/immutable when passing data between tasks.
I’m dying to know which application that is not CPU bound
still needs to pass data between tasks that are all running
on a single thread.
Anything client side involving a user interface has plenty of
opportunities for employing secondary tasks or long-running
sparsely updated state logic that are not CPU bound. Most of
the time is spent idle there. Specific computations on the
other hand can of course still be handed off to other threads.
Latency is still going to be better if multiple cores are
utilized.
And I'm still not sure what the example is.
We are comparing fiber switches and working on data with a
shared cache and no synchronization to synchronizing data
access and control flow between threads/cores. There is such a
broad spectrum of possibilities for one of those to be faster
than the other that it's just silly to make a general statement
like that.
The thing is that if you always share data between threads, you
have to pay for that for every single data access, regardless
of whether there is actual concurrency going on or not.
Obviously, we should strive to share responsibly. Photon has
Channels, much like vibe-core has Channel. Mine are MPSC though,
mostly to model input/output range concepts.
If you want a concrete example, take a simple download dialog
with a progress bar. There is no gain in off-loading anything
to a separate thread here, since this is fully I/O bound, but
it adds quite some communication complexity if you do. CPU
performance is simply not a concern here.
Channels tame the complexity. Yes, channels could get more
expensive in a multi-threaded scenario, but we already agreed
that it's not CPU bound.
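For the download dialog specifically, a channel keeps the two tasks decoupled, and the same code keeps working if the producer later moves to another thread. A minimal sketch, written against vibe-core's Channel API for concreteness (untested):

```d
import vibe.core.channel : createChannel;
import vibe.core.core : runTask, sleep;
import core.time : msecs;
import std.stdio : writefln;

void downloadDialog()
{
    auto progress = createChannel!size_t();

    runTask({
        foreach (size_t done; 1 .. 101) {
            sleep(10.msecs);    // stands in for a network read
            progress.put(done);
        }
        progress.close();
    });

    auto ui = runTask({
        size_t done;
        // tryConsumeOne blocks until data arrives and returns false
        // once the channel is closed and drained.
        while (progress.tryConsumeOne(done))
            writefln("%d%%", done); // stands in for the progress bar
    });

    ui.join();
}
```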
But TLS variables are always "globals" in the sense that
they outlive the scope that accesses them. A modification
in one thread would obviously not be visible in another
thread, meaning that you may or may not have a semantic
connection when you access such a library sequentially from
multiple tasks.
And then there are said libraries that are not thread-safe
at all, or are bound to the thread where you initialize
them. Or handles returned from a library may be bound to
the thread that created them. Dealing with all of this just
becomes needlessly complicated and error-prone, especially
if CPU cycles are not a concern.
TLS is fine for using a non-thread-safe library - just make
sure you initialize it for all threads. I do not switch or
otherwise play dirty tricks with TLS.
The problem is that, for example, you might have a handle that
was created in thread A and is not valid in thread B, or you
set a state in thread A and thread B doesn't see that state.
This would mean that you are limited to a single task for the
complete library interaction.
Or just initialize it lazily in all threads that happen to use
it, as sketched below.
Otherwise, this basically means sticking to one thread.
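Since module-level variables in D are thread-local by default, the lazy per-thread setup is tiny (LibHandle and libOpen are hypothetical stand-ins for the library's handle type and init call):

```d
// Hypothetical stand-ins for a non-thread-safe library's handle
// type and init call.
struct LibHandle {}
LibHandle* libOpen() { return new LibHandle; }

// Module-level variables are thread-local by default in D, so each
// thread that calls lib() lazily gets its own handle.
LibHandle* tlsHandle;

LibHandle* lib()
{
    if (tlsHandle is null)
        tlsHandle = libOpen();
    return tlsHandle;
}
```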
But then it's a different handle representing a different
object - that's not the same thing. I'm not just talking about
initializing the library as a whole. But even so, there are a
lot of libraries that don't use TLS and are simply not
thread-safe at all.
Something that is not thread-safe at all is a dying breed; we
have had multi-core machines for 20 years now. Most libraries can
be initialized once per thread, which is quite naturally modeled
with a TLS handle to said library. Communicating between fibers
via a shared TLS handle is not something I would recommend
regardless of the default spawn behavior.
By robbing the user of control over where a task spawns,
you are also forcing synchronization everywhere, which can
quickly become more expensive than any benefits you would
gain from using multiple threads.
Either default kind of robs the user of control over where the
task spawns. Which is sensible; a user shouldn't really care.
This doesn't make sense; in the original vibe-core, you can
simply choose between spawning in the same thread or in "any"
thread. `shared`/`immutable` is correctly enforced in the
latter case to avoid unintended data sharing.
I have go and goOnSameThread. Guess which is the encouraged
option.
Does go() enforce proper use of shared/immutable when passing
data to the scheduled "go routine"?
It goes with the same API as we have for threads - a delegate -
so sharing becomes the user's responsibility. I may add a
function + args overload for better handling of resources passed
to the lambda.
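A sketch of what that overload could look like (goChecked is a hypothetical name, the photon module path is assumed, and the check mirrors what std.concurrency uses for spawn):

```d
import std.traits : hasUnsharedAliasing;
import photon : go; // module path assumed

// Reject argument types that could smuggle unshared mutable state
// into the spawned task.
void goChecked(F, Args...)(F fn, Args args)
{
    static assert(!hasUnsharedAliasing!Args,
        "arguments must be immutable, shared, or free of aliasing");
    go({ fn(args); });
}
```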
Finally, in the case of web applications, in my opinion the
better approach for using multiple CPU cores is *usually*
by running multiple *processes* in parallel, as opposed to
multiple threads within a single process. Of course, every
application is different and there is no one-size-fits-all
approach.
There we differ: not only is load balancing simpler within a
single application, but processes are also more expensive.
The current D GC situation kind of sucks on multi-threaded
workloads, but that is the only reason to go multi-process
IMHO.
The GC/malloc is the main reason why this is mostly false in
practice, but it extends to any central contention source
within the process. Yes, often you can avoid that, but often
that takes a lot of extra work, and processes sidestep the
issue in the first place.
As is observable from looking at other languages and runtimes,
malloc is not the bottleneck it used to be. Our particular
version of the GC, which doesn't have thread caches, is a
bottleneck.
malloc() will also always be a bottleneck with the right load.
Just the n-times-larger amount of virtual address space
required may start to become an issue for memory-heavy
applications. But even if we ignore that, ruling out using the
existing GC doesn't sound like a good idea to me.
The existing GC is basically 20+ years old; of course we need a
better GC, and thread-cached allocation solves contention in
multi-threaded environments.
An alternative memory allocator is doing great on 320-core
machines. I cannot tell you which allocator that is or what
exactly these servers are. Though even jemalloc does okayish.
And the fact is that, even with relatively mild GC use, a web
application will not scale properly with many cores.
I only partially agree: Java's GC handles load just fine and runs
faster than vibe.d(-light), and it does allocations on its
serving code path.
Also, in the usual case where the threads don't have to
communicate with each other (apart from memory allocation
synchronization), a separate process per core isn't any
slower - except maybe when hyper-threading is in play, but
whether that helps or hurts performance always depends on the
concrete workload.
The fact that a cross-process context switch has to swap the
virtual address space (flushing TLB entries) does add a bit of
overhead. Though to be certain of anything, there had better be
a benchmark.
There is no context switch involved with each process running
on its own core.
Yeah, pinning down cores works, I stand corrected.
Separate processes also have the advantage of being more robust
and enabling seamless restarts and updates of the executable.
And they facilitate an application design that lends itself
to scaling across multiple machines.
Then give me the example code to run multiple vibe.d instances
in parallel processes (should be similar to runDist) and we can
compare approaches. For all I know, it could be faster than
multi-threaded vibe.d-light. Also, honestly, if vibe.d's target
is multiple processes, it should probably start like that by
default.
Again, the "default" is a high-level issue and none of
vibe-core's business. The simplest way to have that work is to
use `HTTPServerOption.reusePort` and then start as many
processes as desired.
So I did just that. To my surprise it indeed speeds up all of my
D server examples.
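For reference, the per-process setup amounts to something like this (a minimal sketch; port and handler are placeholders), started once per core and optionally pinned to a core with taskset:

```d
import vibe.core.core : runApplication;
import vibe.http.server;

void main()
{
    auto settings = new HTTPServerSettings;
    settings.port = 8080;
    // All processes bind the same port; the kernel distributes
    // incoming connections among them via SO_REUSEPORT.
    settings.options |= HTTPServerOption.reusePort;

    listenHTTP(settings, (req, res) {
        res.writeBody("Hello, World!");
    });

    runApplication();
}
```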
The speed-ups from running one process per core are roughly:

cores   vibe-http-light   vibe-http-classic   photon-http
8       1.14              1.33                1.15
12      1.10              1.45                1.10
16      1.08              1.60                1.09
24      1.05              2.54                1.05
32      1.06              4.44                1.07
48      1.07              8.56                1.04
We should absolutely tweak the vibe.d TechEmpower benchmark to
run vibe.d as a process per core! As far as the Photon-powered
versions go, I see there is a point where per-process becomes
less of a gain with more cores, so I would think there are two
factors at play, one positive and one negative, with the
negative one tied to the number of processes.
Lastly, I have found opportunities to speed up vibe-http even
without switching to vibe-core-light. Will send PRs.