Thanks so much to everyone who replied to my post. Here is what I noticed
while trying the cooltst utility (http://cooltools.sunsource.net/cooltst/):


wstest[root] cat cooltst.err
Minimum observation interval is 10 seconds
cooltst 3.0.1 executed at  on wstest/Solaris/UltraSPARC-T1
runtime=1, interval=10
measure_cpu=1, cooltst=/opt/cooltst_v3.01, examine=MATLAB

Workload spike analysis:
Highest thread was 21455/2 (MATLAB) = 3.0%
2008-05-20 11:05:17: MATLAB 1.5% vs. top thread (MATLAB) 1.5%, total 3.2%
2008-05-20 11:05:28: MATLAB 2.1% vs. top thread (MATLAB) 2.1%, total 3%
2008-05-20 11:05:38: MATLAB 2.5% vs. top thread (MATLAB) 2.5%, total 2.9%
2008-05-20 11:05:49: MATLAB 2.8% vs. top thread (MATLAB) 2.8%, total 2.9%
2008-05-20 11:05:59: MATLAB 2.9% vs. top thread (MATLAB) 2.9%, total 2.7%
2008-05-20 11:06:10: MATLAB 3.0% vs. top thread (MATLAB) 3.0%, total 2.8%
In 6 out of 6 observation intervals, MATLAB was the top thread
Internal system type code: solaris.t1. Ver detail: 4.10
wstest[root]


wstest[root] cat cooltst.out
CoolThreads Selection Tool (cooltst) version 3.0.1
     Copyright  2008 Sun Microsystems, Inc. All rights reserved
     Use is subject to license terms.

Cooltst observes a running workload and applies various heuristics
to assess whether that workload may be suitable for a Sun Fire
T1000/T2000/T5x20 system, to help you judge how much effort to put
into a feasibility study which might include porting, prototyping,
and/or performance measurement of your applications. Cooltst is
NOT a system sizing or capacity planning tool, and the rough
approximations used internally in cooltst should not substitute
for detailed performance analysis.


                       System Configuration

Host name                       wstest
System name                     SUNW,Sun-Fire-T200
Effective UID                   0
Cooltst version                 3.0.1
OS                              Solaris
OS release                      5.10
OS version                      Generic_127111-11
Distro                          Solaris
BIOS/PROM                       OBP 4.25.0 2006/11/07 23:24
Memory                          16376 MB
Chip                            UltraSPARC-T1
MHz                             1200
Architecture                    SPARC
# of Virtual CPUs               32
    P0: 1200 MHz UltraSPARC-T1
    P1: 1200 MHz UltraSPARC-T1
    P2: 1200 MHz UltraSPARC-T1
    P3: 1200 MHz UltraSPARC-T1
    P4: 1200 MHz UltraSPARC-T1
    P5: 1200 MHz UltraSPARC-T1
    P6: 1200 MHz UltraSPARC-T1
    P7: 1200 MHz UltraSPARC-T1
    P8: 1200 MHz UltraSPARC-T1
    P9: 1200 MHz UltraSPARC-T1
    P10: 1200 MHz UltraSPARC-T1
    P11: 1200 MHz UltraSPARC-T1
    P12: 1200 MHz UltraSPARC-T1
    P13: 1200 MHz UltraSPARC-T1
    P14: 1200 MHz UltraSPARC-T1
    P15: 1200 MHz UltraSPARC-T1
    P16: 1200 MHz UltraSPARC-T1
    P17: 1200 MHz UltraSPARC-T1
    P18: 1200 MHz UltraSPARC-T1
    P19: 1200 MHz UltraSPARC-T1
    P20: 1200 MHz UltraSPARC-T1
    P21: 1200 MHz UltraSPARC-T1
    P22: 1200 MHz UltraSPARC-T1
    P23: 1200 MHz UltraSPARC-T1
    P24: 1200 MHz UltraSPARC-T1
    P25: 1200 MHz UltraSPARC-T1
    P26: 1200 MHz UltraSPARC-T1
    P27: 1200 MHz UltraSPARC-T1
    P28: 1200 MHz UltraSPARC-T1
    P29: 1200 MHz UltraSPARC-T1
    P30: 1200 MHz UltraSPARC-T1
    P31: 1200 MHz UltraSPARC-T1
OS release detail:
Solaris 10 6/06 s10s_u2wos_09a SPARC Copyright 2006 Sun Microsystems, Inc.  All 
Rights Reserved. Use is subject to license terms. Assembled 09 June 2006

                       Workload Measurements

Observed system for             1 min
          in intervals of        10 sec
Cycles                          1728826254700
Instructions                    14004912581
CPI                             123.44 **
FP instructions                 144384919
Emulated FP instructions        100621
FP Percentage                    1.0%
The following applies to the measurement interval with the
busiest single thread or process:
Peak thread utilization at      2008-05-20 11:06:10
    Corresponding file name      1211295970
    CPU utilization                3.9%
    Command                      MATLAB
    PID/LWPID                    21455/2
    Thread utilization           3.0%
More detail on processes and threads is in data/process.out

**Cycles per Instruction (CPI) is not comparable between UltraSPARC
T1 and T2 processors and conventional processors. Conventional
processors execute an idle loop when there is no work to do, so
CPI may be artificially low, especially when the system is
somewhat idle. The UltraSPARC T1 and T2 "park" idle threads,
consuming no energy, when there is no work to do, so CPI may
be artificially high, especially when the system is somewhat idle.

                       Advice

During the observation of the highest utilization thread, a fairly
low overall CPU utilization of 3.9375% was seen. Are you sure that
the workload of interest was running on the system during the time
cooltst was running? If not, please run cooltst again while your
workload is active.

If your workload was running during this observation, then take
cooltst's advice with this caveat. If you expect your workload to
increase to higher levels, do you expect it to do so by adding
additional threads, as is common, or do you expect it to add more
work to the existing single thread, as sometimes happens? If your
response time criteria are being met, then even if a single thread
is responsible for most of your CPU consumption, you should still
get acceptable performance from a a system based on the UltraSPARC
T1 processor (i.e. Sun Fire/SPARC Enterprise T1000, T2000, Sun Blade
T6300) or a system based on the UltraSPARC T2 processor (i.e. SPARC
Enterprise T5120,  T5220, Sun Blade T6320), with  excess throughput
capacity for future growth. But if response time  is marginal and
workload growth is expected to be in a single thread, then a CMT
system may not be appropriate.


Floating Point                 YELLOW
     Observed floating point content was marginal for an
     UltraSPARC T1 processor. You may proceed with your
     evaluation of UltraSPARC T1 for your workload, watching
     the floating point usage carefully. Or you may instead
     consider an UltraSPARC T2 processor which can handle
     floating point heavy workloads.

Parallelism                    GREEN
     The observed workload has sufficient threads of execution to
     efficiently utilize the multiple cores and threads of an
     UltraSPARC T1 or UltraSPARC T2 processor.
wstest[root]


Based on that tool, my system is neither heavily utilized nor abused. MATLAB
is certainly using floating-point calculations, but why is it still so bad in
terms of run time compared to a PC, and why is it still slower than the V240?
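
A rough sanity check of the cooltst figures above, as a minimal Python
sketch. The input numbers are copied from cooltst.out. The function names are
made up for illustration, and the last calculation assumes that cooltst
reports thread utilization as a share of all 32 virtual CPUs, which the tool
output does not state explicitly.

```python
# Figures copied from the cooltst.out listing in this message.
CYCLES = 1_728_826_254_700
INSTRUCTIONS = 14_004_912_581
FP_INSTRUCTIONS = 144_384_919
VIRTUAL_CPUS = 32
THREAD_UTIL_PCT = 3.0  # "Thread utilization" for MATLAB, PID/LWPID 21455/2


def cycles_per_instruction(cycles: int, instructions: int) -> float:
    """Return CPI as plain cycles divided by instructions."""
    return cycles / instructions


def fp_percentage(fp_instructions: int, instructions: int) -> float:
    """Return floating-point instructions as a percentage of all instructions."""
    return 100.0 * fp_instructions / instructions


def share_of_one_cpu(system_pct: float, virtual_cpus: int) -> float:
    """Convert a whole-system percentage into a fraction of one virtual CPU.

    ASSUMPTION: system_pct is normalized across all virtual CPUs.
    """
    return system_pct * virtual_cpus / 100.0


if __name__ == "__main__":
    print(f"CPI:            {cycles_per_instruction(CYCLES, INSTRUCTIONS):.2f}")
    print(f"FP percentage:  {fp_percentage(FP_INSTRUCTIONS, INSTRUCTIONS):.1f}%")
    print(f"Share of 1 CPU: {share_of_one_cpu(THREAD_UTIL_PCT, VIRTUAL_CPUS):.2f}")
```

The first two values reproduce the CPI of 123.44 and the FP percentage of
1.0% that cooltst printed. If the normalization assumption holds, 3.0% of the
whole system is about 96% of one virtual CPU, which would mean a single
MATLAB thread is close to saturating the one hardware thread it runs on while
the other 31 are mostly idle.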

I was seriously hoping there is a way to use dtrace to look into this and get
some more ideas about what is going on and what the limitation is.

Thanks so much for everyone's help.

Regards,

Chris


On Tue, 20 May 2008, Vladimir Marek wrote:

> Hi,
>
>> Hello folks, I have an issue where a developer wrote some code and it runs
>> in 10 seconds on his PC, then he moved it to a T2000 (16GB RAM, 1.2GHz,
>> 8 cores, Solaris 10) and the same process took 73 seconds. Then he took
>> his code to a V240 server (8GB RAM, 1GHz, 2 CPUs, Solaris 8) and that
>> process completed in around 50 seconds.
>
> Have you considered using some sort of performance analysis on your
> code? Sun Studio has excellent tools.
>
> http://developers.sun.com/solaris/articles/analyzer_qs.html
>
> GCC uses a utility called 'gprof'.
>
> Those tools are designed specifically to find performance issues in
> the code.
>
>
>
>> We suspect that maybe the single floating-point unit inside that T2000 is
>> causing this problem? How would I troubleshoot this problem?
>
> Is the application using floating point heavily? Is it multi-threaded?
>
>
>> I was thinking of jumping into dtrace, as everyone is saying it's so great
>
> Afternoon nap is also great :)
>
>
>
>> but I am not sure where to start in trying to troubleshoot this. It's
>> too late for me to take a dtrace training class at this moment, as
>> it will take weeks for me to get in. I was looking around on the
>> internet and there are many different examples, but I am not sure which
>> one would be the right one for troubleshooting this issue with the T2000.
>
> Is the developer machine also running Solaris? If not, I would
> recommend finding a tool that is available on both systems, so you can
> compare results easily (gcc will probably be on both systems).
>
>
>> It does not make sense that a two-CPU machine with a slower clock and less
>> RAM outperforms the T2000, which has a single CPU but a faster one. Not
>> only that, it's running Solaris 10, which in theory should perform better.
>> It just does not make any sense...
>
> Fewer gigahertz does not mean a slower machine. The T2000 is good at
> executing multithreaded code that is not heavy on floating-point arithmetic.
>
>
>> Any suggestions or pointers would be greatly appreciated.
>
> Use profiling tools designed for the task.
>
> Hope this helps
>
> --
>       Vlad
>
_______________________________________________
dtrace-discuss mailing list