The discussion is restricted to AArch64.

Question: On arm64, publicationBarrier in mallocgc is implemented as DMB ST.
What is the invariant that requires it to execute at its current position?

Specifically:
- Must it execute before the allocated object becomes visible to another 
P/M?
- Must it execute before GC metadata becomes visible?
- Or is it required for maintaining the tri-color invariant under 
concurrent GC?

My reasoning (please correct me if I am wrong):

The comment in runtime/stubs.go says that the purpose of publicationBarrier 
is to ensure that other processors observe the fully initialized object 
before it becomes reachable from GC.

If that is the case, it seems that as long as:

1) the allocated object is not yet accessible by another goroutine, and
2) the goroutine that does the allocation is neither preempted nor 
rescheduled onto another P/M through chanrecv or other operations,

then the barrier might be deferrable.

Under this reasoning, it appears possible that a single DMB ST could be 
shared across multiple consecutive mallocgc calls.

However, I'm unsure whether this reasoning overlooks some GC or scheduler 
invariants, and that is what I would like to understand.

---

Background:

The current order in mallocgc (simplified) is:

```go
alloc
publicationBarrier   // DMB ST
update GC metadata
```

According to measurements in issue comment 
https://github.com/golang/go/issues/63640#issuecomment-3661284210, the 
barrier can account for ~35–40% of mallocgc time on arm64 microbenchmarks.

I experimented with amortizing the barrier across multiple consecutive 
allocations (i.e., sharing the DMB ST). The design is omitted here for 
brevity. Microbenchmark results show a mixed performance impact:

```
goos: linux
goarch: arm64
pkg: runtime
                     │ default.txt │              batch.txt              │
                     │   sec/op    │   sec/op     vs base                │
Malloc8-64             22.11n ± 0%   21.82n ± 0%   -1.31% (p=0.000 n=10)
Malloc16-64            38.79n ± 0%   33.76n ± 0%  -12.98% (p=0.000 n=10)
MallocTypeInfo8-64     28.49n ± 0%   31.37n ± 0%  +10.11% (p=0.000 n=10)
MallocTypeInfo16-64    38.19n ± 0%   39.57n ± 0%   +3.61% (p=0.000 n=10)
MallocLargeStruct-64   417.9n ± 1%   400.8n ± 1%   -4.10% (p=0.000 n=10)
geomean                52.27n        51.62n        -1.24%
```

However, my main concern is correctness: I would like to understand the 
exact memory-ordering guarantee enforced by this barrier on AArch64.

Thanks.
