This series addresses the TODO in the chrono formatter about using
format_to instead of format for ostream insertion (PR libstdc++/111052).

Patch 1 replaces `os << std::format(...)` with
`std::format_to(ostreambuf_iterator, ...)` in all chrono operator<<
overloads, eliminating temporary std::string allocations.

Patch 2 adds a partial specialization of _Iter_sink for
ostreambuf_iterator that writes directly to the streambuf's put area
for unbounded output (zero copy), and uses bulk sputn for bounded or
unbuffered output.  This avoids the per-character sputc overhead of the
generic _Iter_sink.

Performance results (x86_64, GCC 17, -O2, 5 min calibration per benchmark):

Chrono operator<< latency:
  sys_time        1538 -> 1131 ns  (1.36x)
  zoned_time      2016 -> 1567 ns  (1.29x)
  hh_mm_ss         879 ->  689 ns  (1.28x)
  year_month_day   874 ->  695 ns  (1.26x)
  day              764 ->  621 ns  (1.23x)
  weekday          978 ->  771 ns  (1.27x)

format_to ostreambuf_iterator latency:
  ~30 B output    1024 ->  853 ns  (1.20x)
  ~200 B output   7948 -> 5682 ns  (1.40x)

format_to_n bounded (6 cases, 10-200 B): 1.11x-1.31x faster, no regressions.

Multi-threaded throughput (sys_time, 1/2/4/8 threads): 1.47x-1.74x faster.
The improvement scales with thread count due to reduced allocator contention.

perf stat confirms the reduction in instruction count (-11% for sys_time,
-8% for zoned_time), fewer L1 dcache loads (-11% for the ~30 B format_to
case), and fewer branches, with no change in the branch misprediction rate.

Smoke test: all 30 chrono operator<< outputs byte-identical between versions.

Tested on x86_64-linux-gnu.  Bootstrap and libstdc++ testsuite clean.

Anlai Lu (2):
  libstdc++: Use format_to for chrono ostream insertion [PR111052]
  libstdc++: Add ostreambuf_iterator _Iter_sink specialization
    [PR111052]

 libstdc++-v3/include/bits/chrono_io.h         | 102 ++++++++++++++----
 .../include/bits/streambuf_iterator.h         |  28 +++++
 libstdc++-v3/include/std/format               | 101 +++++++++++++++++
 3 files changed, 212 insertions(+), 19 deletions(-)

--
2.34.1
