This series addresses the TODO in the chrono formatter about using
format_to instead of format for ostream insertion (PR libstdc++/111052).

Patch 1 replaces `os << std::format(...)` with
`std::format_to(ostreambuf_iterator, ...)` in all chrono operator<<
overloads, eliminating temporary std::string allocations.

Patch 2 adds a partial specialization of _Iter_sink for
ostreambuf_iterator that writes directly to the streambuf's put area
for unbounded output (zero copy), and uses bulk sputn for bounded or
unbuffered output. This avoids the per-character sputc overhead of the
generic _Iter_sink.
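
The difference between per-character sputc and bulk sputn can be seen with a small standalone sketch. It only models the unbuffered case and is not the _Iter_sink specialization itself; counting_buf, write_per_char and write_bulk are hypothetical names introduced for the illustration, and the zero-copy path into the put area is not shown.

```cpp
#include <cstddef>
#include <streambuf>
#include <string>
#include <string_view>

// Hypothetical unbuffered streambuf that counts how often its virtual
// write hooks are reached.
class counting_buf : public std::streambuf
{
public:
  std::size_t calls = 0;
  std::string data;

protected:
  // No put area is set up, so every sputc ends up in overflow().
  int_type overflow(int_type c) override
  {
    ++calls;
    if (!traits_type::eq_int_type(c, traits_type::eof()))
      data.push_back(traits_type::to_char_type(c));
    return traits_type::not_eof(c);
  }

  // sputn forwards to xsputn, which receives the whole block at once.
  std::streamsize xsputn(const char* s, std::streamsize n) override
  {
    ++calls;
    data.append(s, static_cast<std::size_t>(n));
    return n;
  }
};

// Writes text one character at a time; returns the number of virtual calls.
inline std::size_t write_per_char(std::string_view text)
{
  counting_buf buf;
  for (char c : text)
    buf.sputc(c);
  return buf.calls;
}

// Writes text with a single sputn; returns the number of virtual calls.
inline std::size_t write_bulk(std::string_view text)
{
  counting_buf buf;
  buf.sputn(text.data(), static_cast<std::streamsize>(text.size()));
  return buf.calls;
}
```

With an unbuffered target, the per-character loop reaches a virtual function once per character, while the bulk write reaches one once per block.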

Performance results (x86_64, GCC 17, -O2, 5 min calibration per benchmark):

Chrono operator<< latency:

  sys_time          1538 -> 1131 ns  (1.36x)
  zoned_time        2016 -> 1567 ns  (1.29x)
  hh_mm_ss           879 ->  689 ns  (1.28x)
  year_month_day     874 ->  695 ns  (1.26x)
  day                764 ->  621 ns  (1.23x)
  weekday            978 ->  771 ns  (1.27x)

format_to ostreambuf_iterator latency:

  ~30 B output      1024 ->  853 ns  (1.20x)
  ~200 B output     7948 -> 5682 ns  (1.40x)

format_to_n bounded (6 cases, 10-200 B): 1.11x-1.31x faster, no regressions.

Multi-threaded throughput (sys_time, 1/2/4/8 threads): 1.47x-1.74x faster.
The improvement scales with thread count due to reduced allocator contention.

perf stat confirms instruction count reduction (-11% for sys_time, -8% for
zoned_time), fewer L1 dcache loads (-11% for format_to small), and fewer
branches, with no change in branch mispredict rate.

Smoke test: all 30 chrono operator<< outputs byte-identical between versions.

Tested on x86_64-linux-gnu. Bootstrap and libstdc++ testsuite clean.

Anlai Lu (2):
  libstdc++: Use format_to for chrono ostream insertion [PR111052]
  libstdc++: Add ostreambuf_iterator _Iter_sink specialization
    [PR111052]

 libstdc++-v3/include/bits/chrono_io.h | 102 ++++++++++++++----
 .../include/bits/streambuf_iterator.h |  28 +++++
 libstdc++-v3/include/std/format       | 101 +++++++++++++++++
 3 files changed, 212 insertions(+), 19 deletions(-)

--
2.34.1