Re: Picking up file server tuning again

Steven Peck Tue, 14 Feb 2012 09:05:07 -0800

A few additional possibilities:
Balloon memory on the ESX host? (Insufficient RAM will artificially nuke a
virtual guest's performance.)


SAN issues we've had:
Maxed processors on the SAN.
Oversubscribed ports on the switch path to the SAN.
These two issues will cause disk I/O performance problems in addition to the
spindle count stuff.

Steven Peck
http://www.blkmtn.org


On Tue, Feb 14, 2012 at 5:31 AM, Kurt Buff <[email protected]> wrote:

> RE: spindles - I don't think that's my problem, as this all started
> happening when I was still on a 3-node Lefthand cluster, with 12
> spindles, and now the LUNs on this server are split between the same
> cluster and a new EMC VNXe 3100 with 6 spindles. I could be wrong, but
> it seems unlikely.
>
> I'll take a look at the Toke tag on the report when I get into work
> this morning.
>
> Thanks,
>
> Kurt
>
> On Mon, Feb 13, 2012 at 22:03, Brian Desmond <[email protected]>
> wrote:
> > Yes. Security tokens are stored in Paged Pool. When you get the token
> bloat issue (well if you start approaching it), you will start seeing
> issues on x86 application servers where they are running out of paged pool.
> If you look at a report of paged pool consumers, you'll find the Toke tag
> at the top.
> >
> > # of spindles is going to directly correlate to disk queue lengths and
> latency. If you have 2 spindles which can do 100 IOPS each, and you are
> throwing 225 IOPS at them, you will have a problem. If you add a third
> spindle, now you have 75 IOPS head room.
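
The spindle arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `iops_headroom` is made up for illustration, and it assumes demand spreads evenly across spindles, ignoring RAID write penalties and controller cache.

```python
def iops_headroom(spindles: int, iops_per_spindle: float, demand_iops: float) -> float:
    """Return spare IOPS; a negative value means the spindles are oversubscribed."""
    return spindles * iops_per_spindle - demand_iops


# Figures from the example above: 100 IOPS per spindle, 225 IOPS of demand.
for n in (2, 3):
    spare = iops_headroom(n, 100, 225)
    state = "oversubscribed" if spare < 0 else "ok"
    print(f"{n} spindles: {spare:+.0f} IOPS headroom ({state})")
```

With two spindles the result is negative (25 IOPS short); with three it is the 75 IOPS of headroom mentioned above.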
> >
> > Thanks,
> > Brian Desmond
> > [email protected]
> >
> > w – 312.625.1438 | c   – 312.731.3132
> >
> >
> > -----Original Message-----
> > From: Kurt Buff [mailto:[email protected]]
> > Sent: Monday, February 13, 2012 11:13 PM
> > To: NT System Admin Issues
> > Subject: Re: Picking up file server tuning again
> >
> > PSTs on file shares - it's been a while since I looked at that issue.
> >
> > Crappy drivers are a small possibility - it is a P2V of an old machine.
> >
> > I'm not sure that the number of spindles has anything to do with it, and
> in any case there isn't anything I can do about that for a while.
> >
> > Can you explain what you mean by "large tokens"? Is that related to
> token bloat in AD, or is it something else?
> >
> > Thanks,
> >
> > Kurt
> >
> > On Mon, Feb 13, 2012 at 19:25, Brian Desmond <[email protected]>
> wrote:
> >> Well, the % Interrupts/DPC Time/Kernel Mode CPU time isn't necessarily
> going to be fixed by x64. It may very well mean you've got some crappy
> drivers in play.
> >>
> >> The disk stuff indicates the disk is not fast enough to keep up with
> demand. You can solve that with more spindles or faster spindles.
> >>
> >> Paged Pool utilization will be resolved by x64 (or even x86 on 2008).
> That's indicative of crappy drivers, large tokens, and/or people doing
> things like using PSTs off file shares.
> >>
> >> Thanks,
> >> Brian Desmond
> >> [email protected]
> >>
> >> w – 312.625.1438 | c   – 312.731.3132
> >>
> >>
> >> -----Original Message-----
> >> From: Michael B. Smith [mailto:[email protected]]
> >> Sent: Monday, February 13, 2012 6:18 PM
> >> To: NT System Admin Issues
> >> Subject: RE: Picking up file server tuning again
> >>
> >> Well, the kernel mode, paged pool, and interrupt time are items that
> will be specifically reduced with an x64 OS.
> >>
> >> The I/O situation is indicative of disk queuing which is "hypervisor
> related". Dunno how you optimize that in VMware, there are a number of
> potentials in Hyper-V.
> >>
> >> Regards,
> >>
> >> Michael B. Smith
> >> Consultant and Exchange MVP
> >> http://TheEssentialExchange.com
> >>
> >>
> >> -----Original Message-----
> >> From: Kurt Buff [mailto:[email protected]]
> >> Sent: Monday, February 13, 2012 5:33 PM
> >> To: NT System Admin Issues
> >> Subject: Re: Picking up file server tuning again
> >>
> >> It *is* a busy box, and migrating the iSCSI LUNs to a 64bit server is
> >> something I've definitely considered. I have a Dell R310 with 16gb RAM
> >> that I could use, but it's already got 9 active VMs, although they're
> >> not heavy hitters. AFAICT, probably the highest-use machines on the
> >> ESXi 4.1 box are the secondary DC (no FSMO roles, but does do DNS and
> >> WINS) and the issuing CA box.
> >>
> >> It's currently a VM on what I believe to be an underpowered ESX 3.5 box
> - I think it's possible that it's simply starved for resources on that ESX
> box.
> >>
> >> I'm sure there's something out there like perfmon for VMware that I can
> use to capture performance over time - I'd like to measure and analyze the
> performance of the ESX 3.5 box while the backups are happening against the
> file server.
> >>
> >> I'm also considering moving the Win2k3 file server VM to the ESXi 4.1 box
> and seeing if the situation improves.
> >>
> >> Kurt
> >>
> >> On Mon, Feb 13, 2012 at 12:08, Michael B. Smith <[email protected]>
> wrote:
> >>> That's a busy box. I'd suggest moving to a 64-bit OS.
> >>>
> >>> Regards,
> >>>
> >>> Michael B. Smith
> >>> Consultant and Exchange MVP
> >>> http://TheEssentialExchange.com
> >>>
> >>> -----Original Message-----
> >>> From: Kurt Buff [mailto:[email protected]]
> >>> Sent: Monday, February 13, 2012 3:00 PM
> >>> To: NT System Admin Issues
> >>> Subject: Re: Picking up file server tuning again
> >>>
> >>> Ran PAL against the log.
> >>>
> >>> Um, wow. It's a freaking Christmas tree - red and yellow all over the
> >>> place in CPU and disk.
> >>>
> >>> Who should I be talking with to analyze this?
> >>>
> >>> A sample of the issues shown - all of which show up in more than one
> >>> time slice - some in every or almost every slice:
> >>> o- More than 50% Processor Utilization
> >>> o- More than 30% privileged (kernel) mode CPU usage
> >>> o- More than 2 packets are waiting in the output queue
> >>> o- Greater than 25ms physical disk READ response times
> >>> o- Greater than 25ms physical disk WRITE response times
> >>> o- More than 80% of Pool Paged Kernel Memory Used
> >>> o- More than 2 I/O's are waiting on the physical disk
> >>> o- 20 (Processor(_Total)\DPC Rate)
> >>> o- More than 30% Interrupt Time
> >>> o- Greater than 1000 page inputs per second (Memory\Pages Input/sec)
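
For anyone wanting to re-check samples against the thresholds in the list above without rerunning PAL, here is a rough Python sketch. The dictionary keys are hypothetical labels (not real perfmon counter paths), the sample values are invented for illustration, and the DPC Rate item is left out because the list does not state its threshold clearly.

```python
# Thresholds copied from the PAL alert descriptions above.
# Keys are hypothetical labels, not actual perfmon counter paths.
THRESHOLDS = {
    "cpu_percent": 50,
    "privileged_cpu_percent": 30,
    "output_queue_packets": 2,
    "disk_read_ms": 25,
    "disk_write_ms": 25,
    "paged_pool_percent_used": 80,
    "disk_ios_waiting": 2,
    "interrupt_time_percent": 30,
    "page_inputs_per_sec": 1000,
}


def alerts(sample: dict) -> list:
    """Return the sorted names of metrics in `sample` that exceed their threshold."""
    return sorted(
        name
        for name, value in sample.items()
        if name in THRESHOLDS and value > THRESHOLDS[name]
    )


# Invented sample for illustration only; these numbers are not from the log.
sample = {
    "cpu_percent": 72,
    "disk_read_ms": 31,
    "disk_write_ms": 12,
    "page_inputs_per_sec": 1000,
}
print(alerts(sample))
```

A value exactly at the threshold (the 1000 page inputs per second here) is not flagged, since the alerts are all worded as "more than" or "greater than".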
> >>>
> >>> Some things that showed no alerts:
> >>> o- Memory\Available MBytes
> >>> o- Memory\Free System Page Table Entries
> >>> o- Memory\Pages/sec
> >>> o- Memory\System Cache Resident Bytes
> >>> o- Memory\Cache Bytes
> >>> o- Memory\% Committed Bytes In Use
> >>> o- Network Interface(*)\% Network Utilization
> >>>     MS TCP Loopback interface
> >>>     VMware Accelerated AMD PCNet Adapter
> >>>     VMware Accelerated AMD PCNet Adapter#1
> >>> o- Network Interface(*)\Packets Outbound Errors
> >>>     MS TCP Loopback interface
> >>>     VMware Accelerated AMD PCNet Adapter
> >>>     VMware Accelerated AMD PCNet Adapter#1
> >>>
> >>>
> >>> Kurt
> >>>
> >>> On Fri, Feb 10, 2012 at 16:04, Brian Desmond <[email protected]>
> wrote:
> >>>> Rather than trying to do this yourself, check out PAL -
> http://pal.codeplex.com/. It will set up all the right counters for you
> and crunch the data.
> >>>>
> >>>> Thanks,
> >>>> Brian Desmond
> >>>> [email protected]
> >>>>
> >>>> w – 312.625.1438 | c   – 312.731.3132
> >>>>
> >>>> -----Original Message-----
> >>>> From: Kurt Buff [mailto:[email protected]]
> >>>> Sent: Friday, February 10, 2012 4:43 PM
> >>>> To: NT System Admin Issues
> >>>> Subject: Picking up file server tuning again
> >>>>
> >>>> I'm getting back to monitoring my situation with the file server
> again, and just finished a perfmon session covering the 3rd through the 7th
> of this month. Simultaneously, I set up perfmon on the same workstation to
> monitor the backup server.
> >>>>
> >>>> If anyone cares to help, I'd be deeply appreciative.
> >>>>
> >>>> I set up perfmon on a Win7 VM on an ESXi 4.1 host to take
> measurements at 60 second intervals of a whole bunch of counters, many of
> them probably just noise.
> >>>>
> >>>> I'll describe the history of the configuration first, however:
> >>>>
> >>>> The file server is a Win2k3 R2 VM running on an ESX 3.5 host with 16g
> of RAM - it's one of 10 VMs, and is definitely the heaviest hitter in terms
> of disk I/O. About 2.5-3 months ago we noticed that the time to completion
> for the weekly full backups spiked dramatically.
> >>>>
> >>>> Prior to that time, the fulls would start around 7pm on a Friday, and
> finish by about 7pm on Sunday.
> >>>>
> >>>> Now they take until Thursday or Friday to complete.
> >>>>
> >>>> This coincided with some changes to the environment: I had to move
> >>>> the VM to a new host (it was a manual copy - we don't have vmotion
> >>>> licensed and configured for these hosts) and at about that time I
> >>>> also had to expand 2 of the 4 LUNS.  Finally, the OS drive for the
> >>>> VM on the old host was on a LUN on our Lefthand unit - I had to
> >>>> migrate it to the local disk storage on the new home for the VM. The
> >>>> 4 data drives for this VM are attached via the MSFT iSCSI client
> >>>> running on the VM, not through VMWare's iSCSI client. So, at that
> >>>> point, all of the LUNS were on the Lefthand SAN, which is a 3-node
> >>>> cluster, and we use 2-way replication for all LUNS. The 2 LUNS that
> >>>> were expanded went to 2tb or slightly beyond. The Lefthand has two
> >>>> NSM 2060s and a P4300G2, with 6 and 8 disks each, respectively - a
> >>>> total of 20 disks.
> >>>>
> >>>> Since that time, I've also added in our EMC VNXe 3100 with 6 disks in
> it in a RAID6 array. I mention this because this means that all of the file
> systems on the VNXe are clean and defragged.
> >>>>
> >>>> Currently, I've migrated 3 of the 4 data LUNs for the VM to the EMC.
> I made sure to align the partitions on the EMC to a megabyte boundary.
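
Checking megabyte alignment comes down to a modulo test on the partition's starting offset in bytes (for example, the StartingOffset property of Win32_DiskPartition). A minimal Python sketch follows; `is_aligned` is a made-up helper name, and the two offsets are illustrative values (the legacy 63-sector start and a 2048-sector start, both assuming 512-byte sectors), not figures read from these LUNs.

```python
MIB = 1024 * 1024  # one megabyte boundary, in bytes


def is_aligned(offset_bytes: int, boundary: int = MIB) -> bool:
    """True if the partition's starting offset falls exactly on the boundary."""
    return offset_bytes % boundary == 0


# Illustrative offsets, assuming 512-byte sectors.
legacy_offset = 63 * 512      # 32256 bytes: the old 63-sector default start
aligned_offset = 2048 * 512   # 1048576 bytes: exactly 1 MiB

print(legacy_offset, is_aligned(legacy_offset))
print(aligned_offset, is_aligned(aligned_offset))
```

The first offset fails the test and the second passes, which is the difference between an unaligned legacy partition and one created on a megabyte boundary.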
> >>>>
> >>>> So, to make this simpler to visualize, a little table:
> >>>>
> >>>> c: - local disk on ESX 3.5, 40gb, 23.6gb free
> >>>> j: - iSCSI LUN on Lefthand, 2.5tb, 900gb free
> >>>> k: - iSCSI LUN on VNXe, 1.98tb, 336gb free
> >>>> l: - iSCSI LUN on VNXe, 1tb, 79gb free
> >>>> m: - iSCSI LUN on VNXe, 750gb, 425gb free
> >>>>
> >>>> I tried to capture separate disk queue stats for each LUN, but in
> spite of selecting and adding each drive letter separately in the perfmon
> interface, all I got was _Total.
> >>>>
> >>>> Selected stats are as follows:
> >>>>
> >>>>     PhysicalDisk counters
> >>>> Current disk queue length - average 0.483, maximum 33.000
> >>>> Average disk read queue length - average 0.037, maximum 1.294
> >>>> %disk time - average 34.068, maximum 153.877
> >>>> Average disk write queue length - average 0.645, maximum 2.828
> >>>> Average disk queue length - average 0.681, maximum 3.078
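
A quick consistency check on the _Total figures above, as a hedged Python sketch. It uses only the numbers quoted in this message; the variable names and the `close` helper are made up for illustration, and the interpretation in the note afterwards is a guess rather than anything confirmed in this thread.

```python
# Values copied from the PhysicalDisk _Total figures above.
read_q_avg = 0.037
write_q_avg = 0.645
total_q_avg = 0.681
total_q_max = 3.078
disk_time_avg = 34.068
disk_time_max = 153.877


def close(a: float, b: float, tol: float = 0.005) -> bool:
    """True if a and b agree within a small rounding tolerance."""
    return abs(a - b) <= tol


# Read and write queue averages should add up to the overall average.
queues_add_up = close(read_q_avg + write_q_avg, total_q_avg)

# Ratio of %disk time to average disk queue length, for the average and the maximum.
ratio_avg = disk_time_avg / total_q_avg
ratio_max = disk_time_max / total_q_max

print(queues_add_up)
print(f"{ratio_avg:.1f} {ratio_max:.1f}")
```

The read and write averages sum to the overall average within rounding, and %disk time works out to about 50 times the average queue length in both rows. That suggests the percentage here is derived from queue length rather than being a true busy percentage, which would explain how its maximum can exceed 100.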
> >>>>
> >>>> I have more data on PhysicalDisk, and data on other objects,
> including Memory, NetworkInterface, Paging File, Processor and  Server Work
> Queues.
> >>>>
> >>>> If anyone has thoughts, I'd surely like to hear them.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Kurt
> >>>>
> >>>> ~ Finally, powerful endpoint security that ISN'T a resource hog! ~ ~
> >>>> <http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/>  ~
> >>>>
> >>>> ---
> >>>> To manage subscriptions click here:
> >>>> http://lyris.sunbelt-software.com/read/my_forums/
> >>>> or send an email to [email protected]
> >>>> with the body: unsubscribe ntsysadmin

