> On Fri, Nov 16, 2007 at 06:29:45AM -0800, Paul van den Bogaard wrote:
> > Result: had to reboot my T2000 by using the system console and
> > powering it off and on again after waiting for more than 12 hours
> > to see if it came back.
> 
> That's quite surprising -- and distressing. Did you get a dump? Do you
> have any more data from which we might be able to debug the problem?
> 
> > Sure, too many probes were enabled. However, this result is, let's
> > say, "surprising". Indeed it is very dangerous, since the database
> > can be used in production environments. In my view this is not good.
> 
> That doesn't sound like too many probes enabled at all. We've enabled
> millions of probes without incident.
> 
> > dtrace -x dynvarsize=64m -x cleanrate=203 -qs lw.d >LW/lw.out
> > dtrace: failed to compile script lw.d: line 374: failed to grab process 14510
> 
> It may be that you're hitting the file descriptor limit. Try taking a
> look with truss on the dtrace(1M) process.

I think the processes that were to be traced had already exited because they
had finished their work; the compile step (which includes attaching to the
probes) simply took too long.
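To check the file descriptor theory anyway, a small sketch like the one below
could be used. It is only an assumption-laden helper: count_open_fds is a
hypothetical name, and it relies on /proc/<pid>/fd holding one entry per open
descriptor (true on Solaris and on Linux).

```shell
#!/bin/sh
# Hypothetical helper: print the number of open file descriptors of a process.
# Assumes /proc/<pid>/fd has one entry per open descriptor.
count_open_fds() {
    # tr strips the leading blanks some wc implementations print
    ls "/proc/$1/fd" | wc -l | tr -d ' '
}
```

The count for the dtrace process (for example count_open_fds "$(pgrep -x dtrace)",
assuming only one dtrace is running) can then be compared with ulimit -n in the
shell that started it, next to the truss output Adam suggests.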
> 
> > Question: how do I make this workable, i.e. faster to start up,
> > without restricting my attention to just one or two processes?
>
> We'd need to figure out where the system is spending its time --
> DTrace is a great place to start.
> 
> > Question: why is my system so unresponsive (I waited more than 12
> > hours to see if it came back) that I had to power it off and on to
> > get it alive and kicking again?
> 
> Let's work offline to try to sort this out.
> 
> > And please note that we are planning to add many, many more probes
> > to the PG source. However, if the above is the result, I fear the
> > worst ...
> 
> You're not enabling that many probes; I don't think that's the cause
> of either problem.
> 
> Adam
> 
> -- 
> Adam Leventhal, FishWorks
>                        http://blogs.sun.com/ahl
> ________________________
> dtrace-discuss mailing list
> [email protected]

I did some more tests, and it looks like there is a relation between the
number of probes, the size of shared memory and the time to compile.

Using a PostgreSQL 8.3 beta 1 database on a T2000 (32 GB), I varied the size of
shared memory, the number of processes and the number of probes. Shared memory
is sized by changing the number of so-called shared_buffers; each buffer is
8 KB.
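As a rough guide to what these buffer counts mean in bytes, here is a sketch
converting shared_buffers counts to MiB. It assumes each buffer is exactly
8 KiB (8192 bytes); buffers_to_mib is a hypothetical helper name, and the
result covers only the buffer pool, not the rest of PostgreSQL's shared memory.

```shell
#!/bin/sh
# Hypothetical helper: buffer-pool size in MiB for N buffers of 8 KiB each.
buffers_to_mib() {
    echo $(( $1 * 8 / 1024 ))
}

# The buffer counts used in the tests below.
for n in 100000 400000 1200000 1600000 2000000; do
    echo "$n buffers = $(buffers_to_mib "$n") MiB"
done
```

By this estimate the largest setting, 2000000 buffers, is roughly 15 GiB of
buffer pool alone on the 32 GB machine.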

Timing is done in a simple way: when the BEGIN probe fires, its action
printf's a string, and the output is redirected to a file. With a

while ls -l dtrace.outputfile; do sleep 5; done 

loop I can watch the file size. When it changes from zero to non-zero, I know
the BEGIN probe has fired. Sure, some other things influence this too, but
since I expect total times in the order of minutes, this is accurate enough to
get a better understanding of what is happening.
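The same polling idea can be wrapped so that it stops by itself and reports
how long it waited. This is a minimal sketch assuming a POSIX shell;
wait_for_begin and its arguments are hypothetical, and the result is only
accurate to one polling interval.

```shell
#!/bin/sh
# Hypothetical helper: poll FILE every INTERVAL seconds until it is non-empty
# (i.e. the BEGIN action's printf has been written), then print the number of
# seconds waited. An optional MAX (seconds) aborts the wait; 0 means no limit.
wait_for_begin() {
    file=$1
    interval=${2:-5}
    max=${3:-0}
    waited=0
    # [ -s FILE ] is true once FILE exists and has a size greater than zero
    until [ -s "$file" ]; do
        if [ "$max" -gt 0 ] && [ "$waited" -ge "$max" ]; then
            echo "timeout after $waited s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    echo "$waited"
}
```

For example, start dtrace in the background with

dtrace -x dynvarsize=64m -x cleanrate=203 -qs lw.d >LW/lw.out &

and then run wait_for_begin LW/lw.out 5 3600. The MAX argument matters when
the script fails to compile, since the output file then never grows.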

Results (all times in seconds):

50 probes enabled (5 probes for 10 different processes):

<code>
  BUFFERS \ N users    100    400   1200
  --------------------------------------
         100000         25     30     30
         400000         45     50     50
        1200000         95    105    105
</code>

Changing the number of users has very little effect. Changing the size of
shared memory definitely has an effect.
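To put a number on that effect, the 50-probe table can be reduced to extra
seconds per additional 100000 buffers between the smallest and the largest
setting. A sketch using awk; buffer_slope is a hypothetical helper, and it
assumes the growth is roughly linear between those two end points.

```shell
#!/bin/sh
# Hypothetical helper. Input lines: users secs_at_100000 secs_at_1200000
# Prints the extra seconds per 100000 buffers between the two settings.
buffer_slope() {
    awk '{ printf "%s users: %.1f s per 100000 buffers\n", $1, ($3 - $2) * 100000 / (1200000 - 100000) }'
}

# Figures from the 50-probe table.
buffer_slope <<'EOF'
100 25 95
400 30 105
1200 30 105
EOF
```

With the figures above this gives about 6.4 s per extra 100000 buffers at 100
users and about 6.8 s at 400 and 1200 users.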


300 probes enabled (30 probes for 10 different processes):
(Beware: the 400000-buffer and 400-user tests are not present. I only looked
at the "extremes", and extended these a little by increasing the number of
buffers further.)

<code>
  BUFFERS \ N users    100   1200
  -------------------------------
         100000        125    155
        1200000        545    595
        1600000        695    730
        2000000     memory problems; scan rate shows up in vmstat -p
</code>


1200 probes (30 probes for 40 different processes):

<code>
  BUFFERS \ N users    100   1200
  -------------------------------
         100000        475    600
        1600000       2710   3055
</code>
 

These numbers speak for themselves: the long compile time is related to both
the shared memory size and the number of enabled probes. The remaining
question is: why?
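One way to see the combined effect is to divide each time by the number of
enabled probes. A sketch over the 100-user column of the three tables;
per_probe_cost is a hypothetical helper, and the division assumes the time
grows linearly with the probe count.

```shell
#!/bin/sh
# Hypothetical helper. Input lines: probes buffers seconds
# Prints the seconds until BEGIN fired, per enabled probe.
per_probe_cost() {
    awk '{ printf "%d probes, %d buffers: %.2f s/probe\n", $1, $2, $3 / $1 }'
}

# 100-user figures from the three tables above.
per_probe_cost <<'EOF'
50 100000 25
50 400000 45
50 1200000 95
300 100000 125
300 1200000 545
300 1600000 695
1200 100000 475
1200 1600000 2710
EOF
```

For these runs the cost is roughly 0.4 to 0.5 s per probe at 100000 buffers,
1.8 to 1.9 s at 1200000 and about 2.3 s at 1600000, largely independent of
whether 50, 300 or 1200 probes are enabled.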


--
This message posted from opensolaris.org