A few additional possibilities:

Balloon memory on the ESX host? (Insufficient RAM will artificially nuke a virtual guest's performance.)
These two issues will cause disk I/O performance issues in addition to the spindle count stuff.

SAN issues we've had:
Maxed processors on the SAN
Oversubscribed ports on the switch path to the SAN.

Steven Peck
http://www.blkmtn.org

On Tue, Feb 14, 2012 at 5:31 AM, Kurt Buff <[email protected]> wrote:
> RE: spindles - I don't think that's my problem, as this all started happening when I was still on a 3-node Lefthand cluster, with 12 spindles, and now the LUNs on this server are split between the same cluster and a new EMC VNXe 3100 with 6 spindles. I could be wrong, but it seems unlikely.
>
> I'll take a look at the Toke tag on the report when I get into work this morning.
>
> Thanks,
>
> Kurt
>
> On Mon, Feb 13, 2012 at 22:03, Brian Desmond <[email protected]> wrote:
> > Yes. Security tokens are stored in Paged Pool. When you get the token bloat issue (well, if you start approaching it), you will start seeing issues on x86 application servers where they are running out of paged pool. If you look at a report of paged pool consumers, you'll find the Toke tag at the top.
> >
> > # of spindles is going to directly correlate to disk queue lengths and latency. If you have 2 spindles which can do 100 IOPS each, and you are throwing 225 IOPS at them, you will have a problem. If you add a third spindle, now you have 75 IOPS head room.
> >
> > Thanks,
> > Brian Desmond
> > [email protected]
> >
> > w – 312.625.1438 | c – 312.731.3132
> >
> >
> > -----Original Message-----
> > From: Kurt Buff [mailto:[email protected]]
> > Sent: Monday, February 13, 2012 11:13 PM
> > To: NT System Admin Issues
> > Subject: Re: Picking up file server tuning again
> >
> > PSTs on file shares - it's been a while since I looked at that issue.
> >
> > Crappy drivers are a small possibility - it is a P2V of an old machine.
> >
> > I'm not sure that the number of spindles has anything to do with it, and in any case there isn't anything I can do about that for a while.
> >
> > Can you explain what you mean by "large tokens"? Is that related to token bloat in AD, or is it something else?
> >
> > Thanks,
> >
> > Kurt
> >
> > On Mon, Feb 13, 2012 at 19:25, Brian Desmond <[email protected]> wrote:
> >> Well, the % Interrupts/DPC Time/Kernel Mode CPU time isn't necessarily going to be fixed by x64. It may very well mean you've got some crappy drivers in play.
> >>
> >> The disk stuff indicates the disk is not fast enough to keep up with demand. You can solve that with more spindles or faster spindles.
> >>
> >> Page Pool utilization will be resolved by x64 (or even x86 on 2008). That's indicative of crappy drivers, large tokens, and/or people doing things like using PSTs off file shares.
> >>
> >> Thanks,
> >> Brian Desmond
> >> [email protected]
> >>
> >> w – 312.625.1438 | c – 312.731.3132
> >>
> >>
> >> -----Original Message-----
> >> From: Michael B. Smith [mailto:[email protected]]
> >> Sent: Monday, February 13, 2012 6:18 PM
> >> To: NT System Admin Issues
> >> Subject: RE: Picking up file server tuning again
> >>
> >> Well, the kernel mode, paged pool, and interrupt time are items that will be specifically reduced with an x64 OS.
> >>
> >> The I/O situation is indicative of disk queuing which is "hypervisor related". Dunno how you optimize that in VMware, there are a number of potentials in Hyper-V.
> >>
> >> Regards,
> >>
> >> Michael B. Smith
> >> Consultant and Exchange MVP
> >> http://TheEssentialExchange.com
> >>
> >>
> >> -----Original Message-----
> >> From: Kurt Buff [mailto:[email protected]]
> >> Sent: Monday, February 13, 2012 5:33 PM
> >> To: NT System Admin Issues
> >> Subject: Re: Picking up file server tuning again
> >>
> >> It *is* a busy box, and migrating the iSCSI LUNs to a 64bit server is something I've definitely considered. I have a Dell R310 with 16gb RAM that I could use, but it's already got 9 active VMs, although they're not heavy hitters.
> >> AFAICT, probably the highest-use machines on the ESXi 4.1 box are the secondary DC (no FSMO roles, but does do DNS and WINS) and the issuing CA box.
> >>
> >> It's currently a VM on what I believe to be an underpowered ESX 3.5 box - I think it's possible that it's simply starved for resources on that ESX box.
> >>
> >> I'm sure there's something out there like perfmon for VMware that I can use to capture performance over time - I'd like to measure and analyze the performance of the ESX 3.5 box while the backups are happening against the file server.
> >>
> >> I'm also considering moving the Win2k3 file server VM to the ESX box and seeing if the situation improves.
> >>
> >> Kurt
> >>
> >> On Mon, Feb 13, 2012 at 12:08, Michael B. Smith <[email protected]> wrote:
> >>> That's a busy box. I'd suggest moving to a 64-bit OS.
> >>>
> >>> Regards,
> >>>
> >>> Michael B. Smith
> >>> Consultant and Exchange MVP
> >>> http://TheEssentialExchange.com
> >>>
> >>> -----Original Message-----
> >>> From: Kurt Buff [mailto:[email protected]]
> >>> Sent: Monday, February 13, 2012 3:00 PM
> >>> To: NT System Admin Issues
> >>> Subject: Re: Picking up file server tuning again
> >>>
> >>> Ran PAL against the log.
> >>>
> >>> Um, wow. It's a freaking christmas tree - red and yellow all over the place in CPU and disk.
> >>>
> >>> Who should I be talking with to analyze this?
> >>>
> >>> A sample of the issues shown - all of which show up in more than one time slice - some in every or almost every slice:
> >>> o- More than 50% Processor Utilization
> >>> o- More than 30% privileged (kernel) mode CPU usage
> >>> o- More than 2 packets are waiting in the output queue
> >>> o- Greater than 25ms physical disk READ response times
> >>> o- Greater than 25ms physical disk WRITE response times
> >>> o- More than 80% of Pool Paged Kernel Memory Used
> >>> o- More than 2 I/O's are waiting on the physical disk
> >>> o- 20 (Processor(_Total)\DPC Rate)
> >>> o- More than 30% Interrupt Time
> >>> o- Greater than 1000 page inputs per second (Memory\Pages Input/sec)
> >>>
> >>> Some things that showed no alerts:
> >>> o- Memory\Available MBytes
> >>> o- Memory\Free System Page Table Entries
> >>> o- Memory\Pages/sec
> >>> o- Memory\System Cache Resident Bytes
> >>> o- Memory\Cache Bytes
> >>> o- Memory\% Committed Bytes In Use
> >>> o- Network Interface(*)\% Network Utilization
> >>>      MS TCP Loopback interface
> >>>      VMware Accelerated AMD PCNet Adapter
> >>>      VMware Accelerated AMD PCNet Adapter#1
> >>> o- Network Interface(*)\Packets Outbound Errors
> >>>      MS TCP Loopback interface
> >>>      VMware Accelerated AMD PCNet Adapter
> >>>      VMware Accelerated AMD PCNet Adapter#1
> >>>
> >>>
> >>> Kurt
> >>>
> >>> On Fri, Feb 10, 2012 at 16:04, Brian Desmond <[email protected]> wrote:
> >>>> Rather than trying to do this yourself, check out PAL - http://pal.codeplex.com/. It will set up all the right counters for you and crunch the data.
> >>>>
> >>>> Thanks,
> >>>> Brian Desmond
> >>>> [email protected]
> >>>>
> >>>> w – 312.625.1438 | c – 312.731.3132
> >>>>
> >>>> -----Original Message-----
> >>>> From: Kurt Buff [mailto:[email protected]]
> >>>> Sent: Friday, February 10, 2012 4:43 PM
> >>>> To: NT System Admin Issues
> >>>> Subject: Picking up file server tuning again
> >>>>
> >>>> I'm getting back to monitoring my situation with the file server again, and just finished a perfmon session covering the 3rd through the 7th of this month. Simultaneously, I set up perfmon on the same workstation to monitor the backup server.
> >>>>
> >>>> If anyone cares to help, I'd be deeply appreciative.
> >>>>
> >>>> I set up perfmon on a Win7 VM on an ESXi 4.1 host to take measurements at 60 second intervals of a whole bunch of counters, many of them probably just noise.
> >>>>
> >>>> I'll describe the history of the configuration first, however:
> >>>>
> >>>> The file server is a Win2k3 R2 VM running on an ESX 3.5 host with 16g of RAM - it's one of 10 VMs, and is definitely the heaviest hitter in terms of disk I/O. About 2.5-3 months ago we noticed that the time to completion for the weekly full backups spiked dramatically.
> >>>>
> >>>> Prior to that time, the fulls would start around 7pm on a Friday, and finish by about 7pm on Sunday.
> >>>>
> >>>> Now they take until Thursday or Friday to complete.
> >>>>
> >>>> This coincided with some changes to the environment: I had to move the VM to a new host (it was a manual copy - we don't have vmotion licensed and configured for these hosts) and at about that time I also had to expand 2 of the 4 LUNs. Finally, the OS drive for the VM on the old host was on a LUN on our Lefthand unit - I had to migrate it to the local disk storage on the new home for the VM. The 4 data drives for this VM are attached via the MSFT iSCSI client running on the VM, not through VMWare's iSCSI client.
> >>>> So, at that point, all of the LUNs were on the Lefthand SAN, which is a 3-node cluster, and we use 2-way replication for all LUNs. The 2 LUNs that were expanded went to 2tb or slightly beyond. The Lefthand has two NSM 2060s and a P4300G2, with 6 and 8 disks each, respectively - a total of 20 disks.
> >>>>
> >>>> Since that time, I've also added in our EMC VNXe 3100 with 6 disks in it in a RAID6 array. I mention this because this means that all of the file systems on the VNXe are clean and defragged.
> >>>>
> >>>> Currently, I've migrated 3 of the 4 data LUNs for the VM to the EMC. I made sure to align the partitions on the EMC to a megabyte boundary.
> >>>>
> >>>> So, to make this simpler to visualize, a little table:
> >>>>
> >>>> c: - local disk on ESX 3.5, 40gb, 23.6gb free
> >>>> j: - iSCSI LUN on Lefthand, 2.5tb, 900gb free
> >>>> k: - iSCSI LUN on VNXe, 1.98tb, 336gb free
> >>>> l: - iSCSI LUN on VNXe, 1tb, 79gb free
> >>>> m: - iSCSI LUN on VNXe, 750gb, 425gb free
> >>>>
> >>>> I tried to capture separate disk queue stats for each LUN, but in spite of selecting and adding each drive letter separately in the perfmon interface, all I got was _Total.
> >>>>
> >>>> Selected stats are as follows:
> >>>>
> >>>> PhysicalDisk counters
> >>>> Current disk queue length - average 0.483, maximum 33.000
> >>>> Average disk read queue length - average 0.037, maximum 1.294
> >>>> %disk time - average 34.068, maximum 153.877
> >>>> Average disk write queue length - average 0.645, maximum 2.828
> >>>> Average disk queue length - average 0.681, maximum 3.078
> >>>>
> >>>> I have more data on PhysicalDisk, and data on other objects, including Memory, NetworkInterface, Paging File, Processor and Server Work Queues.
> >>>>
> >>>> If anyone has thoughts, I'd surely like to hear them.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Kurt
> >>>>
> >>>> ~ Finally, powerful endpoint security that ISN'T a resource hog!
~ ~
> >>>> <http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/> ~
> >>>>
> >>>> ---
> >>>> To manage subscriptions click here: http://lyris.sunbelt-software.com/read/my_forums/
> >>>> or send an email to [email protected]
> >>>> with the body: unsubscribe ntsysadmin
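
For what it's worth, the spindle arithmetic Brian describes in the thread (2 spindles at 100 IOPS each against 225 IOPS of demand, then a third spindle added) can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function name iops_headroom is made up for this example, and it assumes IOPS simply add up across spindles, with no RAID write penalty, cache, or replication overhead, which a real Lefthand or VNXe array will not match.

```python
def iops_headroom(spindles, iops_per_spindle, demand_iops):
    """Spare IOPS for an array under a given load.

    Hypothetical helper for illustration only. Assumes capacity scales
    linearly with spindle count (no RAID penalty, no cache effects).
    A negative result means demand exceeds what the spindles can serve,
    so requests will queue and latency will climb.
    """
    capacity = spindles * iops_per_spindle
    return capacity - demand_iops


if __name__ == "__main__":
    # Numbers taken from the example in the thread.
    for count in (2, 3):
        spare = iops_headroom(count, 100, 225)
        state = "oversubscribed" if spare < 0 else "ok"
        print(f"{count} spindles: {spare} IOPS headroom ({state})")
```

With the thread's numbers this gives -25 for two spindles and 75 for three, which matches the "75 IOPS head room" figure. Plugging in measured per-LUN IOPS from perfmon (once per-disk counters are captured rather than just _Total) would be the more useful next step.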
