Comments on the analysis below...

Tao Chen wrote:
I should copy this to the list.

---------- Forwarded message ----------

On 6/23/06, Joe Little <[EMAIL PROTECTED]> wrote:

    I can post back to Roch what this latency is. I think the latency is
    constant regardless of whether the zil is enabled. All that disabling the
    zil does is let me submit larger chunks at a time (faster) instead of
    doing 1k or worse blocks 3 times per file (the NFS fsync penalty).
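
For reference, a minimal sketch of how the zil was typically disabled on
builds of that era, assuming the zil_disable tunable (a global setting, not
per-filesystem) was the mechanism used:

  # /etc/system -- takes effect at next boot, applies to all pools
  set zfs:zil_disable = 1

  # or on a running system via mdb (same tunable; filesystems generally
  # need to be remounted to pick it up)
  echo zil_disable/W0t1 | mdb -kw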


Please send the script (I attached a modified version) along with the results.
People need to see how it works to trust (or dispute) the results.
Rule #1 in performance tuning: do not trust the report from an unproven tool :)
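
For those without the script, here is a minimal sketch of the kind of
io-provider recording a tool like biorpt.sh does. This is an illustration of
the technique only, not the attached script:

  #!/usr/sbin/dtrace -s
  /* Sketch only -- not the attached biorpt.sh.  Counts I/Os by device,
   * direction, and size, the way the "Top 5 I/O types" table is built,
   * and tracks the worst completion time per device. */
  #pragma D option quiet

  io:::start
  {
          /* remember when each buf was handed to the driver */
          start_ts[arg0] = timestamp;
  }

  io:::done
  /start_ts[arg0]/
  {
          /* device, read/write, size in 512-byte blocks */
          @types[args[1]->dev_statname,
              args[0]->b_flags & B_READ ? "R" : "W",
              args[0]->b_bcount / 512] = count();
          /* longest single I/O per device, in ms */
          @worst[args[1]->dev_statname] =
              max((timestamp - start_ts[arg0]) / 1000000);
          start_ts[arg0] = 0;
  }

  END
  {
          trunc(@types, 5);
          printf("%-8s  %s  %5s  %8s\n", "DEVICE", "T", "BLKs", "COUNT");
          printa("%-8s  %s  %5d  %@8d\n", @types);
          printa("worst %s I/O: %@d ms\n", @worst);
  }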

I have some comments on the output below.

    This is for a bit longer run (16 trees of 6250 8k files, again with the
    zil disabled):

    Generating report from biorpt.sh.rec ...

       === Top 5 I/O types ===

      DEVICE    T  BLKs     COUNT
      --------  -  ----  --------
      sd2       W   256      3095
      sd1       W   256      2843
      sd1       W     2       201
      sd2       W     2       197
      sd1       W    32       185


This part tells me the majority of I/Os are 128KB writes (256 blocks x 512 bytes) on sd2 and sd1.

             === Top 5 worst I/O response time ===

      DEVICE    T  BLKs      OFFSET    TIMESTAMP  TIME.ms
      --------  -  ----  ----------  -----------  -------
      sd2       W   175   529070671    85.933843  3559.55
      sd1       W   256   521097680    47.561918  3097.21
      sd1       W   256   521151968    54.944253  3090.42
      sd1       W   256   521152224    54.944207  3090.23
      sd1       W    64   521152480    54.944241  3090.21


The longest response times are more than 3 seconds, ouch.

Very suspicious.  I would check for retransmissions, as 4 of these
are suspiciously close to 3 seconds.  You'll need to check the retrans
counters on both ends of the wire.
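
For example (a minimal sketch; exact counter names vary by release):

  # on the NFS client: RPC call counts, retransmissions, and timeouts
  nfsstat -rc

  # on both ends: TCP segment retransmission counters
  netstat -s -P tcp | grep -i retrans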

              === Top 5 Devices with largest number of I/Os ===

      DEVICE      READ AVG.ms     MB    WRITE AVG.ms     MB      IOs SEEK
      -------  ------- ------ ------  ------- ------ ------  ------- ----
      sd1            6   0.34      0     4948 387.88    413     4954   0%
      sd2            6   0.25      0     4230 387.07    405     4236   0%
      cmdk0         23   8.11      0      152   0.84      0      175  10%


An average response time of > 300 ms is bad.

The average is totally useless with this sort of distribution.
I'd suggest using a statistical package (or a quantize() aggregation,
see below) to explore the distribution.
Just a few 3-second latencies will skew the average quite a lot.
 -- richard
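
A minimal sketch of that quantize() approach, giving a per-device I/O
completion-time histogram in milliseconds so the outliers stay visible
instead of being folded into an average:

  dtrace -n '
      io:::start { ts[arg0] = timestamp; }
      io:::done /ts[arg0]/ {
          @[args[1]->dev_statname] =
              quantize((timestamp - ts[arg0]) / 1000000);
          ts[arg0] = 0;
      }'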
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
