On Wed, Aug 12, 2009 at 12:15 AM, Bart Van
Assche<[email protected]> wrote:
On Tue, Aug 11, 2009 at 11:52 PM, Chris Worley<[email protected]> wrote:
I setup my target exactly as you prescribe... but my initiator is
still Windows (version of WInOF at top): performance as relayed by
IOMeter starts high and the average slowly decreases. Watching the
instantaneous throughput, there seem to be longer and longer lags of
poor performance. between moments of good performance. I need to run
this against a Linux initiator to see if the problems are w/ WinOF.
Using OFED 1.4.1 (w/ the stock RHEL kernel) on the target, the
performance was steady and getting close to acceptable. In a 15 hour
test that cycles through sequential and random LBA's and R/W mixes
from block sizes from 1MB to 512B, it worked well and got decent
performance until it hit 1KB sequential reads which hung IOMeter; no
messages on the Linux side (all looked okay). IBSRP on the Windows
side just said "a reset to device was issued" every 15 to 30 seconds
after the problem started. I reloaded the IB stack on the Linux side,
and was able to get it restarted.
Still a lot of combinations to test.
Which trace settings are you using on the target ? Enabling the proper
trace settings via /proc/scsi_tgt/trace_level might reveal whether you
are e.g. hitting the QUEUE_FULL condition. See also scst/README.
I've found a good kernel/scst mix to easily repeat this; I can get it
to repeatedly hang w/ 8K block transfers running Ubuntu 9.04 w/ the
2.6.27-14-server kernel on _both_ target and initiator (i.e. no WinOF
or OFED at all) and SCST rev 1062 on the target using one drive
(performance is >600MB/s, >80K IOPS, on the 8KB block sizes being
used).
Although the problem doesn't occur in Windows until blocks are <2KB
and the RHEL5.2/OFED configuration does not repeat the issue using a
Linux initiator, it seems like a very similar hang, so I'm hoping it's
the same issue.
To repeat the issue, I run 8KB block random reads w/ 64 threads,
running AIO calls w/ a depth of 64 (using "fio" on the initiator):
# fio --rw=randrw --bs=8k --rwmixread=100 --numjobs=64 --iodepth=64
--sync=0 --direct=1 --randrepeat=0 --ioengine=libaio
--filename=/dev/sdn --name=test --loops=10000 --size=16091503001
The "size" represents 10% of the drive. It doesn't seem to ever
happen on writes, but I've seen it happen on mixed reads/writes.
With tracing set to "default", there was still nothing in the target
logs at the time of the hang.
With tracing set thusly on the target:
echo "all" >/proc/scsi_tgt/trace_level
echo "all" >/proc/scsi_tgt/vdisk/trace_level
The last few lines of dmesg look like:
[255354.313411] 0: 28 00 01 84 54 90 00 00 10 00 00 00 00 00 00 00
(...T...........
[255354.313420] [0]: scst: scst_cmd_init_done:214:tag=62, lun=0, CDB
len=16, queue_type=1 (cmd ffff880102b4a568)
[255354.313443] [26358]: scst: scst_pre_parse:417:op_name <READ(10)>
(cmd ffff880102b4a3a0), direction=2 (expected 2, set yes),
transfer_len=16 (expected len 8192), flags=1
[255354.313420] [0]: scst_cmd_init_done:216:Recieving CDB:
[255354.313452] [8602]: scst: scst_xmit_response:3004:Xmitting data
for cmd ffff880102b49e48 (sg_cnt 0, sg ffff880132579f60, sg[0].page
ffffe200042b7180)
[255354.313457] [8604]: scst: scst_xmit_response:3004:Xmitting data
for cmd ffff880102b4a010 (sg_cnt 0, sg ffff8802e9806f60, sg[0].page
ffffe2000bc129c0)
[255354.313426] (h)___0__1__2__3__4__5__6__7__8__9__A__B__C__D__E__F
[255354.313426] 0: 28 00 01 bc 5d 10 00 00 10 00 00 00 00 00 00 00
(...]...........
[255354.313468] [26358]: scst: scst_pre_parse:417:op_name <READ(10)>
(cmd ffff880102b4a568), direction=2 (expected 2, set yes),
transfer_len=16 (expected len 8192), flags=1
[255354.313484] [8602]: scst: scst_xmit_response:3004:Xmitting data
for cmd ffff880102b4a1d8 (sg_cnt 0, sg ffff8802e98064c0, sg[0].page
ffffe2000bc633c0)
[255354.313551] [8604]: scst: scst_xmit_response:3004:Xmitting data
for cmd ffff880102b4a3a0 (sg_cnt 0, sg ffff88018a877060, sg[0].page
ffffe20004300200)
[255354.313556] [8602]: scst: scst_xmit_response:3004:Xmitting data
for cmd ffff880102b4a568 (sg_cnt 0, sg ffff880142581100, sg[0].page
ffffe20004066d40)
... and there's a section like:
[255354.310177] 0: 28 00 01 25 df 50 00 00 10 00 00 00 00 00 00 00
(..%.P..........
[255354.310177] [0]: scst: scst_cmd_init_done:214:tag=57, lun=0, CDB
len=16, queue_type=1 (cmd ffff8801642e2730)
[255354.310177] [0]: scst_cmd_init_done:216:Recieving CDB:
[255354.310177] (h)___0__1__2__3__4__5__6__7__8__9__A__B__C__D__E__F
[255354.310177] 0: 28 00 01 5e 22 c0 00 00 10 00 00 00 00 00 00 00
(..^"...........
[255354.310966] [26369]: scst: scst_pre_parse:417:op_name <READ(10)>
(cmd ffff880168a9e3a0), direction=2 (expected 2, set yes),
transfer_len=16 (expected len 8192), flags=1
[255354.310973] [26361]: scst: scst_pre_parse:417:op_name <READ(10)>
(cmd ffff880168a9e010), direction=2 (expected 2, set yes),
transfer_len=16 (expected len 8192), flags=1
[255354.310980] [26365]: scst: scst_pre_parse:417:op_name <READ(10)>
(cmd ffff880168a9e1d8), direction=2 (expected 2, set yes),
transfer_len=16 (expected len 8192), flags=1
[255354.310986] [26359]: scst: scst_pre_parse:417:op_name <READ(10)>
(cmd ffff880168a9de48), direction=2 (expected 2, set yes),
transfer_len=16 (expected len 8192), flags=1
...
[255354.311221] [8604]: scst: scst_xmit_response:3004:Xmitting data
for cmd ffff880168a9e1d8 (sg_cnt 0, sg ffff880173ca8060, sg[0].page
ffffe20004325d00)
[255354.311226] [8602]: scst: scst_xmit_response:3004:Xmitting data
for cmd ffff880168a9ee50 (sg_cnt 0, sg ffff880173ca8c40, sg[0].page
ffffe20005847ec0)
[255354.311233] [8604]: scst: scst_xmit_response:3004:Xmitting data
for cmd ffff880168a9dc80 (sg_cnt 0, sg ffff8802f0143c40, sg[0].page
ffffe2000bc04880)
[255354.311238] [8602]: scst: scst_xmit_response:3004:Xmitting data
for cmd ffff880168a9e568 (sg_cnt 0, sg ffff8802f08361a0, sg[0].page
ffffe2000bbf2400)
[255354.311242] [8604]: scst: scst_xmit_response:3004:Xmitting data
for cmd ffff880168a9d560 (sg_cnt 0, sg ffff88010acd74c0, sg[0].page
ffffe200047e7280)
... but, prior to that, messages are unreadably garbled, as in:
Aug 31 22:37:00 nameme kernel: t]9l ft48 r(09 ,83_5p s20 sg:303
_00s3]c_=cs _00ad0000e_003a6_0031_4(ea5 9arg )_2As_05s_8[7:c8[f3 _178
087gff0 .R nt]9i0tmpd1:ft st06s68 5i9[301602_106)o6 _001e4 0<s0 3>)0
.3E3_28a9102 pft0>e_o[.eo[<_2n05 98_0f8_i xpe1f0 D<98s np8one:21_0
30f3006=e_ ax R8gs=h62]= 2.pd_ pad555mlf
1_]f8=.05lf i7gxs_ac3 m_0c0:]5i3087[_ 5e sg,00[dc3e,_ 0[ ( 1<[t]F]
..eb 4t_ ah1,_1_]10.h45_]2,5__12C5o 37 d_.)b_g4f850s, t1e c80.ite.8pE
ue2.4f[.ft0 5c5_1effft 5530 f len=16, 5v03,em_cs4e 05fc78.5r5. n
,45ft45ff<if_:4fnd5c<ts54c078f9]_0c0a0efee04f[,1n 0 __5deff588=f82
.t)m9.8)9.8077=s _C 3 i8 .tlsf5_[0s0 (2u fu 4
5fco5fnr.n0a05_34f__4fd_4n Bs60fn4pB.tor7=s
_i8s7=0_.tl:c>l3e0.51_654.30350en.m C30 C3 e f.dtm0=2_1e0n]6qe d.>_
76 d=f _esr_tp 9_50.tnf50[cs.,
Aug 31 22:37:00 nameme kernel: e .0 5 B , 45 0<s382 3_
Aug 31 22:37:00 nameme kernel: c2< s0< cm38cf58.[f10 002< c3De
_)088m8 9c5299pected__F
Aug 31 22:37:00 nameme kernel: tran50 pt48)=8]=s59etl5pe4e6d)0c6
ei_2(e_<3cc_ ea51es_0_sras A >cmdtesafe4 3[m 3.rer7:[ 1b00s5
Aug 31 22:37:00 nameme kernel: ] 2a015ffs.35fff B__ a
6cmd9spre3se9_2e3806(3_csA_ 1 ns38ge0sre0
Aug 31 22:37:00 nameme kernel: <g data sf9_ _ 6d 0se5245f_26._2
.,76.9<g fe t_]t6:(E...:s5D.s0_<Rte46>0330B005]08s3 __ r40r._5x,<Re08
:2ec_ :06cs1_0ti1d l:253064enfe7]0 abd5 0f>196.t b 7.(008ni]
0s09.r650t, <24]__ s1=in03 s0p c2>>[4ein.1:ooD..ps210a>[25534_r6,:t
n4.]4(8 e2 .r c 2n1g9360]10>( 00 00 00 00[fd[2
[2g_re53 le_6c_md8t_ftc883tf03c m_0 :8r8fmd63m3:0] 25 c6>[2n_e:fa2e84_0
Aug 31 22:37:00 nameme kernel: c,
Aug 31 22:37:00 nameme kernel: .=0>5f=1s5=1d6_(de:d
2l_25:0edg25fm>ff40 l440 e,AFg l)AF0 0o[1088. 1aggB
0n=d9(16a.5oeX6csf00s0: ._, (=10es_(1 7 5c___oR5st_42p3d 7
C9d=5_:(3__7mD4_ 0m4_ed
04,5.,[s55.d4c,,25=,c8__q,[(meet9303_mr0ue9m0u_032__fy2se
Aug 31 22:37:00 nameme kernel: > y>i
... so other suggestions on trace settings would be appreciated.
Thanks,
Chris
Bart.
------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now. http://p.sf.net/sfu/bobj-july
_______________________________________________
Scst-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scst-devel