Maksim Yevmenkin wrote this message on Fri, Oct 11, 2013 at 15:39 -0700:
> > On Oct 11, 2013, at 2:52 PM, John-Mark Gurney <[email protected]> wrote:
> > 
> > Maksim Yevmenkin wrote this message on Fri, Oct 11, 2013 at 11:17 -0700:
> >> i would like to submit the attached bioq patch for review and
> >> comments. this is a proof of concept. it helps smooth disk read
> >> service times and appears to eliminate outliers. please see the
> >> attached pictures (about a week's worth of data)
> >> 
> >> - c034 "control" unmodified system
> >> - c044 patched system
> > 
> > Can you describe how you got this data?  Were you using the gstat
> > code or some other code?
> 
> Yes, it's basically gstat data. 

The reason I ask this is that I don't think the data you are getting
from gstat is what you think it is...  It accumulates time for a set
of operations and then divides by the count...  So I'm not sure the
stat improvements you are seeing are as meaningful as you might think
they are...

> > Also, was your control system w/ the patch, but w/ the sysctl set to
> > zero to possibly eliminate any code alignment issues?
> 
> Both systems use the same code base and build. Patched system has patch 
> included, "control" system does not have the patch. I can rerun my tests with 
> sysctl set to zero and use it as "control". So, the answer to your question 
> is "no". 

I don't believe the code would make a difference, but more wanted to
know what control was...

> >> graphs show max/avg disk read service times for both systems across 36
> >> spinning drives. both systems are relatively busy serving production
> >> traffic (about 10 Gbps at peak). grey shaded areas on the graphs
> >> represent time when systems are refreshing their content, i.e. disks
> >> are both reading and writing at the same time.
> > 
> > Can you describe why you think this change makes an improvement?  Unless
> > you're running 10k or 15k RPM drives, 128 seems like a large number... as
> > that's about half the number of IOPs that a normal HD handles in a second...
> 
> Our (Netflix) load is basically random disk io. We have tweaked the system to 
> ensure that our io path is "wide" enough, i.e. we read 1 MB per disk io for 
> the majority of requests. However, the offsets we read from are all over the 
> place. It appears that we are getting into a situation where larger offsets 
> are getting delayed because smaller offsets are "jumping" ahead of them. 
> Forcing a bioq insert tail operation, effectively moving the insertion point, 
> seems to help us avoid this situation. And, no, we don't use 10k or 15k 
> drives. Just regular enterprise 7200 RPM SATA drives. 

I assume that the 1 MB reads are then further broken up into 8 128 KB
reads? So it's more like every 16 reads in your workload that you
insert the "ordered" io...

I want to make sure that we choose the right value for this number..
What number of IOPs are you seeing?

> > I assume you must be regularly seeing queue depths of 128+ for this
> > code to make a difference, do you see that w/ gstat?
> 
> No, we don't see large (128+) queue sizes in the gstat data. The way I see it, 
> we don't have to have a deep queue here. We could just have a steady stream of 
> io requests where new, smaller offsets consistently "jump" ahead of older, 
> larger offsets. In fact, the gstat data show a shallow queue of 5 or fewer items.

Sorry, I misread the patch the first time...  After rereading it,
the short summary is that if there hasn't been an ordered bio
(bioq_insert_tail) within the last 128 requests, the next request will
be "ordered"...

> > Also, do you see a similar throughput of the system?
> 
> Yes, we see almost identical throughput from both systems. I have not 
> pushed the system to its limit yet, but having a much smoother disk read 
> service time is important for us because we use it as one of the components 
> of our system health metrics. We also need to ensure that disk io requests 
> are actually dispatched to the disk in a timely manner. 

Per above, have you measured at the application layer that you are
getting better latency times on your reads?  Maybe by doing a ktrace
of the io, and calculating times between read and return or something
like that...

Have you looked at the geom disk scheduler work that Luigi did a few
years back?  There have been known issues w/ our io scheduler for a
long time...  If you search the mailing lists, you'll see lots of
reports of some processes starving out others, probably due to a
similar issue...  I've seen similar unfair behavior between processes,
but never spent the time tracking it down...

It does look like a good improvement though...

Thanks for the work!

-- 
  John-Mark Gurney                              Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[email protected]"
