Re: Picking up file server tuning again

Kurt Buff Mon, 13 Feb 2012 18:07:53 -0800

Thanks. I'll continue to poke around, and ask a few more questions.

Kurt


On Mon, Feb 13, 2012 at 16:18, Michael B. Smith <[email protected]> wrote:
> Well, the kernel mode, paged pool, and interrupt time are items that will be 
> specifically reduced with an x64 OS.
>
> The I/O situation is indicative of disk queuing which is "hypervisor 
> related". Dunno how you optimize that in VMware, there are a number of 
> potentials in Hyper-V.
>
> Regards,
>
> Michael B. Smith
> Consultant and Exchange MVP
> http://TheEssentialExchange.com
>
>
> -----Original Message-----
> From: Kurt Buff [mailto:[email protected]]
> Sent: Monday, February 13, 2012 5:33 PM
> To: NT System Admin Issues
> Subject: Re: Picking up file server tuning again
>
> It *is* a busy box, and migrating the iSCSI LUNs to a 64bit server is
> something I've definitely considered. I have a Dell R310 with 16gb RAM
> that I could use, but it's already got 9 active VMs, although they're
> not heavy hitters. AFAICT, probably the highest-use machines on the
> ESXi 4.1 box are the secondary DC (no FSMO roles, but does do DNS and
> WINS) and the issuing CA box.
>
> It's currently a VM on what I believe to be an underpowered ESX 3.5
> box - I think it's possible that it's simply starved for resources on
> that ESX box.
>
> I'm sure there's something out there like perfmon for VMware that I
> can use to capture performance over time - I'd like to measure and
> analyze the performance of the ESX 3.5 box while the backups are
> happening against the file server.
>
> I'm also considering moving the Win2k3 file server VM to the ESXi 4.1
> box and seeing if the situation improves.
>
> Kurt
>
> On Mon, Feb 13, 2012 at 12:08, Michael B. Smith <[email protected]> wrote:
>> That's a busy box. I'd suggest moving to a 64-bit OS.
>>
>> Regards,
>>
>> Michael B. Smith
>> Consultant and Exchange MVP
>> http://TheEssentialExchange.com
>>
>> -----Original Message-----
>> From: Kurt Buff [mailto:[email protected]]
>> Sent: Monday, February 13, 2012 3:00 PM
>> To: NT System Admin Issues
>> Subject: Re: Picking up file server tuning again
>>
>> Ran PAL against the log.
>>
>> Um, wow. It's a freaking christmas tree - red and yellow all over the
>> place in CPU and disk.
>>
>> Who should I be talking with to analyze this?
>>
>> A sample of the issues shown - all of which show up in more than one
>> time slice - some in every or almost every slice:
>> o- More than 50% Processor Utilization
>> o- More than 30% privileged (kernel) mode CPU usage
>> o- More than 2 packets are waiting in the output queue
>> o- Greater than 25ms physical disk READ response times
>> o- Greater than 25ms physical disk WRITE response times
>> o- More than 80% of Pool Paged Kernel Memory Used
>> o- More than 2 I/O's are waiting on the physical disk
>> o- 20 (Processor(_Total)\DPC Rate)
>> o- More than 30% Interrupt Time
>> o- Greater than 1000 page inputs per second (Memory\Pages Input/sec)
>>
>> Some things that showed no alerts:
>> o- Memory\Available MBytes
>> o- Memory\Free System Page Table Entries
>> o- Memory\Pages/sec
>> o- Memory\System Cache Resident Bytes
>> o- Memory\Cache Bytes
>> o- Memory\% Committed Bytes In Use
>> o- Network Interface(*)\% Network Utilization
>>     MS TCP Loopback interface
>>     VMware Accelerated AMD PCNet Adapter
>>     VMware Accelerated AMD PCNet Adapter#1
>> o- Network Interface(*)\Packets Outbound Errors
>>     MS TCP Loopback interface
>>     VMware Accelerated AMD PCNet Adapter
>>     VMware Accelerated AMD PCNet Adapter#1
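For anyone who wants to re-check a few of these flags by hand against raw counter values, here is a minimal Python sketch. It is not PAL's actual logic. The threshold numbers are taken from the alert descriptions listed above; the mapping to perfmon counter names, the function name, and the sample values are assumptions made up for illustration.

```python
# Thresholds copied from the alert descriptions in the list above.
# The counter-name keys are an assumed mapping, not PAL's internal names.
THRESHOLDS = {
    "% Processor Time": 50.0,          # more than 50% processor utilization
    "% Privileged Time": 30.0,         # more than 30% kernel-mode CPU usage
    "% Interrupt Time": 30.0,          # more than 30% interrupt time
    "Output Queue Length": 2.0,        # more than 2 packets waiting
    "Current Disk Queue Length": 2.0,  # more than 2 I/Os waiting on the disk
    "Avg. Disk sec/Read": 0.025,       # greater than 25 ms read response
    "Avg. Disk sec/Write": 0.025,      # greater than 25 ms write response
    "Pages Input/sec": 1000.0,         # greater than 1000 page inputs per second
}


def flag_sample(sample):
    """Return the sorted counter names in `sample` that exceed their threshold."""
    return sorted(
        name
        for name, value in sample.items()
        if name in THRESHOLDS and value > THRESHOLDS[name]
    )


# Invented values for a single time slice, purely for illustration.
one_slice = {
    "% Processor Time": 72.0,
    "% Privileged Time": 12.0,
    "Avg. Disk sec/Read": 0.040,
    "Avg. Disk sec/Write": 0.010,
}
print(flag_sample(one_slice))
```

Counters that are absent from the thresholds table are simply ignored, so the same function can be fed a wider sample without changes.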
>>
>>
>> Kurt
>>
>> On Fri, Feb 10, 2012 at 16:04, Brian Desmond <[email protected]> wrote:
>>> Rather than trying to do this yourself, check out PAL - 
>>> http://pal.codeplex.com/. It will set up all the right counters for you and 
>>> crunch the data.
>>>
>>> Thanks,
>>> Brian Desmond
>>> [email protected]
>>>
>>> w – 312.625.1438 | c   – 312.731.3132
>>>
>>> -----Original Message-----
>>> From: Kurt Buff [mailto:[email protected]]
>>> Sent: Friday, February 10, 2012 4:43 PM
>>> To: NT System Admin Issues
>>> Subject: Picking up file server tuning again
>>>
>>> I'm getting back to monitoring my situation with the file server again, and 
>>> just finished a perfmon session covering the 3rd through the 7th of this 
>>> month. Simultaneously, I set up perfmon on the same workstation to monitor 
>>> the backup server.
>>>
>>> If anyone cares to help, I'd be deeply appreciative.
>>>
>>> I set up perfmon on a Win7 VM on an ESXi 4.1 host to take measurements at 
>>> 60 second intervals of a whole bunch of counters, many of them probably 
>>> just noise.
>>>
>>> I'll describe the history of the configuration first, however:
>>>
>>> The file server is a Win2k3 R2 VM running on an ESX 3.5 host with 16g of RAM 
>>> - it's one of 10 VMs, and is definitely the heaviest hitter in terms of 
>>> disk I/O. About 2.5-3 months ago we noticed that the time to completion for 
>>> the weekly full backups spiked dramatically.
>>>
>>> Prior to that time, the fulls would start around 7pm on a Friday, and 
>>> finish by about 7pm on Sunday.
>>>
>>> Now they take until Thursday or Friday to complete.
>>>
>>> This coincided with some changes to the environment: I had to move the VM 
>>> to a new host (it was a manual copy - we don't have vmotion licensed and 
>>> configured for these hosts) and at about that time I also had to expand 2 
>>> of the 4 LUNS.  Finally, the OS drive for the VM on the old host was on a 
>>> LUN on our Lefthand unit - I had to migrate it to the local disk storage on 
>>> the new home for the VM. The 4 data drives for this VM are attached via the 
>>> MSFT iSCSI client running on the VM, not through VMWare's iSCSI client. So, 
>>> at that point, all of the LUNS were on the Lefthand SAN, which is a 3-node 
>>> cluster, and we use 2-way replication for all LUNS. The 2 LUNS that were 
>>> expanded went to 2tb or slightly beyond. The Lefthand has two NSM 2060s and 
>>> a P4300G2, with 6 and 8 disks each, respectively - a total of 20 disks.
>>>
>>> Since that time, I've also added in our EMC VNXe 3100 with 6 disks in it in 
>>> a RAID6 array. I mention this because this means that all of the file 
>>> systems on the VNXe are clean and defragged.
>>>
>>> Currently, I've migrated 3 of the 4 data LUNs for the VM to the EMC. I made 
>>> sure to align the partitions on the EMC to a megabyte boundary.
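The megabyte-boundary check mentioned above comes down to one modulo operation on the partition's starting offset. A minimal Python sketch follows, assuming the offset is already known in bytes (on Windows it is reported as StartingOffset by "wmic partition get Name, StartingOffset"). The function name and the two example offsets are illustrative, not taken from this server.

```python
MIB = 1024 * 1024  # one megabyte boundary, in bytes


def is_aligned(offset_bytes, boundary=MIB):
    """True if the partition's starting offset falls exactly on the boundary."""
    return offset_bytes % boundary == 0


# Example offsets (not measured on the server discussed here):
legacy_offset = 63 * 512     # 32256 bytes, the old 63-sector default
aligned_offset = 2048 * 512  # 1048576 bytes, exactly 1 MB

print(is_aligned(legacy_offset))
print(is_aligned(aligned_offset))
```

The 63-sector offset was the usual default for partitions created on Win2k3-era systems, which is why the alignment has to be set explicitly when the partition is created.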
>>>
>>> So, to make this simpler to visualize, a little table:
>>>
>>> c: - local disk on ESX 3.5, 40gb, 23.6gb free
>>> j: - iSCSI LUN on Lefthand, 2.5tb, 900gb free
>>> k: - iSCSI LUN on VNXe, 1.98tb, 336gb free
>>> l: - iSCSI LUN on VNXe, 1tb, 79gb free
>>> m: - iSCSI LUN on VNXe, 750gb, 425gb free
>>>
>>> I tried to capture separate disk queue stats for each LUN, but in spite of 
>>> selecting and adding each drive letter separately in the perfmon interface, 
>>> all I got was _Total.
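One thing that might be worth trying is to spell out per-drive counter paths explicitly instead of picking instances in the GUI. Perfmon counter paths have the form \Object(Instance)\Counter, and LogicalDisk instances are named by drive letter. The Python sketch below only generates such paths for the four data drives in the table above; the function name is hypothetical, and whether this avoids the _Total-only result on this particular setup is not something the thread establishes.

```python
def counter_paths(drives, counters, obj="LogicalDisk"):
    """Build one \\Object(Instance)\\Counter path per drive and counter."""
    return [
        "\\{}({}:)\\{}".format(obj, drive.upper(), counter)
        for drive in drives
        for counter in counters
    ]


# Drive letters from the table above; counter names are standard disk counters.
DRIVES = ["j", "k", "l", "m"]
COUNTERS = [
    "Current Disk Queue Length",
    "Avg. Disk Queue Length",
    "Avg. Disk sec/Read",
    "Avg. Disk sec/Write",
]

for path in counter_paths(DRIVES, COUNTERS):
    print(path)
```

The printed list, one path per line, is in the shape that collectors accepting a counter file expect (for example typeperf with its -cf option).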
>>>
>>> Selected stats are as follows:
>>>
>>>     PhysicalDisk counters
>>> Current disk queue length - average 0.483, maximum 33.000
>>> Average disk read queue length - average 0.037, maximum 1.294
>>> %disk time - average 34.068, maximum 153.877
>>> Average disk write queue length - average 0.645, maximum 2.828
>>> Average disk queue length - average 0.681, maximum 3.078
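Average/maximum figures like the ones above can be recomputed from a perfmon log that has been exported to CSV. The following is a minimal Python sketch under some assumptions: the first column holds the timestamp, every other column is one counter, and missing samples show up as blank cells. The server name FS1 and the three sample rows are invented for illustration and are not data from this log.

```python
import csv
import io


def summarize(csv_text):
    """Return {counter_name: (average, maximum)} for each counter column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    summary = {}
    for col in range(1, len(header)):  # column 0 is the timestamp
        values = []
        for row in data:
            try:
                values.append(float(row[col]))
            except (ValueError, IndexError):
                continue  # blank or missing sample, skip it
        if values:
            summary[header[col]] = (sum(values) / len(values), max(values))
    return summary


# Invented sample in the assumed layout; FS1 is a hypothetical server name.
SAMPLE = r'''"(PDH-CSV 4.0) (Pacific Standard Time)(480)","\\FS1\PhysicalDisk(_Total)\Avg. Disk Queue Length"
"02/03/2012 19:00:00.000","0.5"
"02/03/2012 19:01:00.000","1.5"
"02/03/2012 19:02:00.000"," "
'''

for name, (average, maximum) in summarize(SAMPLE).items():
    print(name, average, maximum)
```

Skipping blank cells rather than treating them as zero keeps gaps in the collection from dragging the averages down.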
>>>
>>> I have more data on PhysicalDisk, and data on other objects, including 
>>> Memory, NetworkInterface, Paging File, Processor and  Server Work Queues.
>>>
>>> If anyone has thoughts, I'd surely like to hear them.
>>>
>>> Thanks,
>>>
>>> Kurt
>>>
>>> ~ Finally, powerful endpoint security that ISN'T a resource hog! ~ ~ 
>>> <http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/>  ~
>>>
>>> ---
>>> To manage subscriptions click here: 
>>> http://lyris.sunbelt-software.com/read/my_forums/
>>> or send an email to [email protected]
>>> with the body: unsubscribe ntsysadmin
