On Oct 16, 2013 12:01 PM, "Jonathan S. Shapiro" <[email protected]> wrote:
> I think it's safe to say that naive reference counting - even when it is
> not interlocked - has deadly performance.

It is not at all safe to say this!

I think it is safe to say naive reference counting has deadly (poor)
mutator throughput, but this is often not the dominant performance factor.

That word "performance" is just too abstract and defined in the eyes of the
beholder. Sometimes the performance we care most about is end-user
perceived latency.

----

I don't mean for the examples to sound like beating a dead horse (GC pause
times), Shap; I know you are on board with running mutators concurrently
with collectors and minimizing pauses.

I'm going to include them anyway, because they provide an interesting
window into how systemic performance effects can matter more than specific
micro-effects like mutator throughput.

For example, in my experience with big internet services (Yahoo Groups,
Google Groups, Google Orkut), it has been easier to control end-user
perceived 99th-percentile latencies in Python than on JVM/HotSpot. This is
despite the fact that Python is not only naively reference counted, but
also a *dog-friggin-slow* interpreter compared to HotSpot.

So why can I get better "performance" (lower end-user latency) out of it? I
think this is related to Bennie's point about how systemic performance
issues can be more dominant than mutator throughput (he referred to SIMD,
but it applies to many other patterns).

The Python systems use a multi-process model with cheap COW forking, which
HotSpot elected not to support. This lets them efficiently share read-only
data while giving each worker its own private dynamic heap. They perform
any expensive tracing or cleanup between requests, when there is *no user
waiting*. If tracing pauses happen, they are normally invisible to the
user, and always invisible to the 400 other worker processes.
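
A minimal sketch of that pattern in CPython, assuming a classic pre-fork
server (load_reference_tables and handle_request are made-up stand-ins for
real application code):

    import gc
    import os
    import socket

    def load_reference_tables():
        # Hypothetical: large read-only data loaded once in the parent;
        # after fork() all workers share these pages copy-on-write.
        return {"rates": [0.1] * 1_000_000}

    def handle_request(conn):
        # Hypothetical request handler. Plain reference counting still
        # reclaims acyclic garbage promptly while this runs.
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nok\r\n")

    TABLES = load_reference_tables()

    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("", 8080))
    server.listen(128)

    def worker_loop():
        gc.disable()        # no cycle tracing while a user is waiting
        while True:
            conn, _ = server.accept()
            handle_request(conn)
            conn.close()
            gc.collect()    # expensive tracing *between* requests; the
                            # other workers never see this pause

    for _ in range(4):      # 400 in the real deployments; 4 for the sketch
        if os.fork() == 0:
            worker_loop()   # children never return

    while True:             # parent just babysits the workers
        os.wait()

The point is where gc.collect() sits: the only tracing work happens when
this worker has nobody waiting on it, and no other worker process is
affected either way.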

Compare that to web serving on the JVM. HotSpot has much better mutator
performance, but IMHO it is a constant fight between heap size and pause
times, because all the workers run in the same heap/process. Something
triggers a GC pause, and 400+ workers now have an end-user perceived delay.

This pattern is visible in client side software as well.

Mac/iOS uses reference counting rather than tracing GC for a reason: to
provide predictable end-user perceived latency.
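
The mechanism is easy to demonstrate in any refcounted runtime. A toy
CPython illustration (the Buffer class is made up), just to show the
timing:

    class Buffer:
        def __del__(self):
            # Under reference counting this runs deterministically, at
            # the moment the last reference goes away, not at some
            # future collector-chosen pause.
            print("buffer freed")

    buf = Buffer()
    del buf  # prints "buffer freed" immediately

Reclamation cost is paid in small, predictable slices interleaved with the
mutator, which is exactly what you want when the metric is perceived
latency rather than throughput.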

Android's Dalvik provides an interesting case as well. It is not HotSpot
for a reason: it supports COW forking, so each app cheaply runs in a
separate process with its own private heap, a stark contrast to the typical
'broken' JVM model of running everything in a single process/heap. However,
Android still has horrible end-user perceived latency jitter compared to
iOS.