On Fri, 8 Sept 2023 at 03:46, Roberto A. Foglietta
<[email protected]> wrote:
> The two ways of taking measurements showed a difference of 7 - 18% when
> n(args) = 1 and 0.7 - 1.7% when n(args) = 10. The error shrinks by a
> factor of ten as n grows from 1 to 10, which tells us that the main
> source of error lies in the accuracy of the $min value, which is an
> estimate. Therefore err(n) = err(min) / n. Fortunately, this is a
> conservative way of taking measurements, because I used the min value
> out of 100 samples. Conservative means that I might have failed to
> notice a small difference in performance between pidof and grep, or a
> small difference when the complexity is the same, O(same). Such a small
> difference in performance would not have attracted my attention in the
> first place, and I would not have started to investigate the issue
> using statistics. Hence, the hypothesis (HP) of seeing a wide
> difference in performance or in complexity, O(different), was
> acceptable from the beginning.
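To make the quoted error propagation concrete, a minimal sketch (the
measure_per_arg helper and its $min calibration variable are
hypothetical, not taken from the original script): the fixed
uncertainty on $min is subtracted once and then divided by n, so the
per-argument estimate inherits err(min)/n.

min=5255000   # assumed calibration constant in ns, see the estimate below
measure_per_arg() {  # usage: measure_per_arg n command arg1 ... argn
    # run the command once with n args and divide the corrected time
    # by n: the error on $min ends up divided by n as well
    local n=$1 a b; shift
    a=$(date +%s%N)
    "$@" >/dev/null 2>&1
    b=$(date +%s%N)
    echo $(( ((b-a) - min) / n ))
}
e.g. measure_per_arg 3 pidof init syslogd klogd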
After all, time is not always precise enough to resolve a single measurement:
redfishos:/rootfs # time date +%s%N
1689574011301750356
real 0m 0.00s
for i in $(seq 1 100); do date +%s%N; done | time cat >/dev/null
real 0m 0.45s
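This repetition trick can be wrapped into a tiny helper; a sketch (the
time_n name is hypothetical):

time_n() {  # usage: time_n 100 date +%s%N
    # run the command n times and let busybox time measure the whole
    # output stream through a single cat process
    local n=$1 i; shift
    for i in $(seq 1 $n); do "$@"; done | time cat >/dev/null
}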
data=$(for i in $(seq 1 100); do
    a=$(date +%s%N); b=$(date +%s%N); echo $((b-a))
done | sort -n)
echo "$data" | head -n1                  # min
echo "$data" | tail -n1                  # max
let sum=$(echo "$data" | tr '\n' '+')0   # sum of all samples
echo $(( (sum+50) / 100 ))               # rounded average
4233021
5255000
4475716
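The min / max / average computation above can be factored into a small
reusable function; a sketch, assuming busybox ash (the stats name is
hypothetical):

stats() {  # reads one integer per line; prints min, max, rounded average
    sort -n | {
        local n=0 sum=0 min="" max=0 v
        while read v; do
            [ -z "$min" ] && min=$v
            max=$v; sum=$((sum + v)); n=$((n + 1))
        done
        echo "$min $max $(( (sum + n/2) / n ))"
    }
}
e.g. for i in $(seq 1 100); do a=$(date +%s%N); b=$(date +%s%N); echo $((b-a)); done | stats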
The average is 4475716 ns ≈ 4476 us, while 0.45 s = 450 ms = 450000 us,
and 450000 us / 100 = 4500 us per call. These two ways of determining
the $min value give compatible results.
However, the value that best closes the gap between the time
measurement and the date-difference measurement is the max, not the
average.
max=5255000; cmd=""
for i in $(seq 0 9); do
    cmd="${cmd:-} command$i"             # grow the argument list by one
    a=$(date +%s%N)
    time pidof $cmd 2>&1 | grep real     # time's own measurement
    b=$(date +%s%N)
    echo $(( (b-a) - max ))              # date difference minus calibration
done
real 0m 0.09s
107959792
real 0m 0.18s
197792917
real 0m 0.28s
289114635
real 0m 0.37s
380441250
real 0m 0.46s
472342917
real 0m 0.55s
565745000
real 0m 0.64s
653705000
real 0m 0.73s
745983073
real 0m 0.82s
835870937
real 0m 0.93s
938043489
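As a cross-check of the linearity, the increments between consecutive
samples above should be roughly constant; a small sketch over those
numbers:

prev=""
for v in 107959792 197792917 289114635 380441250 472342917 \
         565745000 653705000 745983073 835870937 938043489; do
    # each step adds one argument, so a roughly constant increment
    # (about 92 ms here) confirms the cost is linear in n(args)
    [ -n "$prev" ] && echo $(( v - prev ))
    prev=$v
done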
Using the max is not a conservative hypothesis, but after a double
check with the use of time, it can be accepted.
for i in $(seq 1 100); do date +%s%N; usleep 10000; done | time cat >/dev/null
real 0m 1.48s
In fact, adding a little pause between two date executions, the average
rose to 4800 us: 1.48 s minus the 100 x 10 ms of usleep leaves 0.48 s,
and 480000 us / 100 = 4800 us per call.
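Spelled out as shell arithmetic, in microseconds:

echo $(( (1480000 - 100 * 10000) / 100 ))   # (total - sleeps) / runs = 4800 us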
data=$(for i in $(seq 1 100); do
    a=$(date +%s%N); usleep 10000; b=$(date +%s%N)
    echo $((b-a)); usleep 10000
done | sort -n)
echo "$data" | head -n1                  # min (includes the 10 ms sleep)
echo "$data" | tail -n1                  # max
let sum=$(echo "$data" | tr '\n' '+')0
echo $(( sum / 100 ))                    # average
14482083
15539479
14757910
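Removing the 10 ms sleep from the average above, in shell arithmetic:

echo $(( (14757910 - 10000000 + 500) / 1000 ))   # rounded: 4758 us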
In both cases 4758 us, to be precise, which is still smaller than
5255 us. The most interesting thing is that time date might not be 100%
reliable, because both can access the RTC to take their timestamps, and
therefore some kind of correlation / interference between the two
measurements can be established.
When a direct measurement is not reliable, an indirect way to estimate
a value is to search for the value that best fits a regression relating
cause and effect. In this case 5255 us is a candidate for such an
indirect measurement. The best candidate can be searched for within the
[4500, 5600] us interval: for each sequence 1..10 the best fit can be
found, and with 100 sequences of 1..10 we would have 100 best fits on
which to do statistics and regression. We do not need that, obviously,
because the difference in performance and in O(something) is wide
enough that a conservative or even rough estimate suffices to see it.
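For completeness, if one wanted to run that search anyway, it could
look like the following sketch (entirely hypothetical; the raw samples
are reconstructed by adding back the 5255000 ns that was subtracted in
the run above): for each candidate calibration constant in
[4500, 5600] us, subtract it from the raw measurements, fit a straight
line through the origin, and keep the candidate with the smallest
residual.

awk 'BEGIN {
    # raw (b-a) samples in ns for n = 1..10 (the deltas above + 5255000)
    split("113214792 203047917 294369635 385696250 477597917 " \
          "571000000 658960000 751238073 841125937 943298489", t, " ")
    best_r = -1
    for (c = 4500000; c <= 5600000; c += 1000) {   # candidate, ns
        sny = 0; snn = 0
        for (n = 1; n <= 10; n++) { y = t[n] - c; sny += n*y; snn += n*n }
        s = sny / snn                              # slope through origin
        r = 0
        for (n = 1; n <= 10; n++) { d = t[n] - c - s*n; r += d*d }
        if (best_r < 0 || r < best_r) { best_r = r; best_c = c }
    }
    print best_c " ns"                             # best-fit calibration
}'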