Hi all, Recent SVN versions of the PolyML runtime system contain a useful statistics interface, which can help improve the understanding of how Isabelle really works.
In essence, you can query various runtime parameters, such as the heap
size, number of garbage collections, threads etc. Tracking these
values over time during a session (and possibly correlating it with
other events) can give useful insights, and has already helped us
understanding performance issues with the new importer.
As as example, attached is a graph of the heap consumption when
building HOL (as of 26c1a97c7784), running on my laptop. The graph
itself is not very exciting, but in other situations, this may show
bottlenecks. I am currently trying to generate the same for
JinjaThreads, which may be more interesting.
Technically, there are two ways of retrieving the data:
a) via PolyML.Statistics.getLocalStats for the local polyml process
b) via PolyML.Statistics.getRemoteStats for some remote process
identified by its PID. (the two PolyML instances then communicate
via a memory-mapped file).
The data above was generated via b) and a somewhat ad-hoc script
(which is plain PolyML, not Isabelle/ML) which I attach for those
who want to play with it.
Obviously, we should be collecting this information on a regular basis.
Here, I think, approach a) is more convenient: An Isabelle session
would then report its own runtime statistics in a suitable way, as part
of the logs.
More concretely, here is what I think we should do:
1) Introduce an "Isabelle Statistics Interface" that exposes the data
in a portable way as part of the ML compatibility layer (reporting
dummy values if the runtime does not have statistics).
2) The Isabelle session runtime maintains a reporting thread, which
emits the current values in configured time intervals, via
some existing channel (tracing, stderr, ...?).
3) A simple script in the spirit of Admin/profiling_report extracts the
data from the logs and outputs them in a plot-friendly format,
preferably CSV.
I guess I'll try to do 1., which should be easy but needs review from Makarius. Then, 2 can be prototyped in user-space first. Any comments? Ideas? Alex
<<attachment: HOL_heap.png>>
#!/bin/bash PRG="$(basename "$0")" THIS="$(cd "$(dirname "$0")"; pwd)" POLL_INTERVAL=500 # milliseconds LOG_DIR="$THIS/logs" if [ "A$1" = "A" ]; then echo echo "Usage: $PRG PID" echo echo "Collect PolyML statistics data from process with id PID" echo exit 1 fi # Requires the polyml script to be on the path polyml --use "$THIS/polyml_collector.ML" -- $POLL_INTERVAL "$LOG_DIR" $1
(* Fairly ad-hoc script for producing CSV output from PolyML.Statistics
To be invoked via the accompanying shell script
*)
(* basic file handling *)
fun with_file name =
let
val out = TextIO.openOut name;
fun print (s : string) = (TextIO.output (out, s); TextIO.flushOut out);
in print end
fun userCounterName i = "userCounter" ^ StringCvt.padLeft #"0" 2 (Int.toString i)
val counterNames =
["Date", "threadsTotal", "threadsInML", "threadsWaitIO", "threadsWaitMutex", "threadsWaitCondVar", "threadsWaitSignal",
"gcFullGCs", "gcPartialGCs", "sizeHeap", "sizeHeapFreeLastGC", "sizeHeapFreeLastFullGC", "sizeAllocation",
"sizeAllocationFree"]
@ List.tabulate (PolyML.Statistics.numUserCounters (), userCounterName)
@ [ "timeNonGCUser", "timeNonGCSystem", "timeGCUser", "timeGCSystem" ]
(* produce csv data from statistics *)
fun csv_stats {
threadsTotal, threadsInML, threadsWaitIO, threadsWaitMutex, threadsWaitCondVar, threadsWaitSignal,
gcFullGCs, gcPartialGCs, sizeHeap, sizeHeapFreeLastGC, sizeHeapFreeLastFullGC, sizeAllocation,
sizeAllocationFree, timeNonGCUser, timeNonGCSystem, timeGCUser, timeGCSystem, userCounters } =
let
val time = Time.now ()
val millis = List.nth (String.fields (fn c => c = #".") (Time.fmt 3 time), 1)
val datestr = Date.fmt "%d.%m.%Y %H:%M:%S." (Date.fromTimeLocal time) ^ millis
val user = Vector.foldr (op ::) [] userCounters
val fields = datestr ::
map Int.toString ([threadsTotal, threadsInML, threadsWaitIO, threadsWaitMutex, threadsWaitCondVar, threadsWaitSignal,
gcFullGCs, gcPartialGCs, sizeHeap, sizeHeapFreeLastGC, sizeHeapFreeLastFullGC, sizeAllocation, sizeAllocationFree] @ user) @
map Time.toString [ timeNonGCUser, timeNonGCSystem, timeGCUser, timeGCSystem ]
in
String.concatWith ";" fields ^ "\n"
end
(* main *)
val pid_raw :: logdir :: sleepMs_raw :: _ = rev (CommandLine.arguments ())
val pid = Option.valOf (Int.fromString pid_raw)
val sleepMs = Option.valOf (Int.fromString sleepMs_raw)
val filename = logdir ^ "/polyml" ^ Int.toString pid ^ Date.fmt "-%Y-%m-%d-%H-%M-%S.csv" (Date.fromTimeLocal (Time.now ()))
val _ = OS.Process.system ("mkdir -p " ^ logdir)
val _ = print ("Writing to " ^ filename ^ "...\n")
val write = with_file filename
val _ = write (String.concatWith ";" counterNames ^ "\n\n")
fun get_stats () =
if pid = 0 then PolyML.Statistics.getLocalStats ()
else PolyML.Statistics.getRemoteStats pid
fun loop () : unit =
let
val line = csv_stats (get_stats ())
in
(write line; print ".";
OS.Process.sleep (Time.fromMilliseconds sleepMs);
loop ())
handle Interrupt => (print "Exiting.\n"; OS.Process.exit (OS.Process.success))
end
val _ = loop ()
_______________________________________________ isabelle-dev mailing list [email protected] https://mailmanbroy.informatik.tu-muenchen.de/mailman/listinfo/isabelle-dev
