Hi,

I've found the source of the problem. Sigar isn't zone-aware, so all of its 
memory calculations are way off when it runs inside a zone.
I confirmed this by patching sigar and recompiling.
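
For anyone who wants to sanity-check the numbers quoted further down in this 
thread, here is a small Python sketch (not taken from the sigar source). It 
reinterprets the huge mem_used and mem_actual_used values as signed 64-bit 
integers. The helper names are made up for illustration, and the idea that 
"used" is computed as total minus free is an assumption inferred from the 
numbers themselves, not something confirmed against the sigar code.

```python
# Values copied from the example.escript output quoted later in this thread.
MEM_TOTAL = 4294967296
MEM_USED = 18446744073677623296
MEM_ACTUAL_USED = 18446744017909834968
MEM_ACTUAL_FREE = 60094683944

U64 = 1 << 64  # number of distinct unsigned 64-bit values


def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    return value - U64 if value >= (1 << 63) else value


def wrap_u64(value):
    """Wrap an arbitrary integer into the unsigned 64-bit range."""
    return value % U64


# Both "used" figures turn out to be negative numbers stored as unsigned.
print("mem_used as signed:       ", as_signed64(MEM_USED))
print("mem_actual_used as signed:", as_signed64(MEM_ACTUAL_USED))

# Assumption: actual_used = total - actual_free. With the values above this
# subtraction underflows and reproduces the reported number exactly.
print("total - actual_free wraps to reported value:",
      wrap_u64(MEM_TOTAL - MEM_ACTUAL_FREE) == MEM_ACTUAL_USED)
```

The reported mem_actual_free (about 60 GB) is larger than mem_total (4 GiB), 
which would fit a total taken from the zone's cap combined with a free figure 
taken from the whole host, though that reading is only a guess from the 
output.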

My patch wouldn't have worked on vanilla Solaris, so I did some more digging 
and found that Brendan Gregg actually tried to get this fixed in upstream 
sigar in 2012.
His patch handles vanilla Solaris as well, but I don't have a Solaris 
machine to test on.
His patch also changes something in the disk stats, and I'm not sure how 
that might affect Couchbase.

In any case, I'm rebuilding Couchbase with an adapted version of his patch.
If everything compiles cleanly, I'll let you all know and open a pull request 
against couchbase/sigar.

Here is the patched version in case someone can test on 
Solaris: https://github.com/yruss972/sigar

Thanks,
Yonah

On Thursday, July 3, 2014 2:38:20 AM UTC+3, Aliaksey Kandratsenka wrote:
>
> On Wed, Jul 2, 2014 at 4:02 PM, Yonah Russ <[email protected]> wrote:
>
>> Well I took a look at the scripting option and assuming you meant 
>> escript, it doesn't look like something I'll be picking up from scratch in 
>> an hour or so ;)
>>
>
> No, I was referring to your favorite scripting language, such as Perl, 
> Ruby, Tcl or something else.
>
>> I hacked port_sigar as you suggested to print the reply values to stderr 
>> and ran it using the example.escript which is in the source directory (I 
>> hope that was a reasonable thing to do).
>>
>> Here is the output from one of the couchbase servers:
>>
>> ./portsigar/example.escript
>> cpu_total_ms: 67285756633
>> cpu_idle_ms: 32306446002
>> swap_total: 8589934592
>> swap_used: 1717555200
>> swap_page_in: 5915867
>> swap_page_out: 87010188
>> mem_total: 4294967296
>> mem_used: 18446744073677623296
>> mem_actual_used: 18446744017909834968
>> mem_actual_free: 60094683944
>> escript: exception error: no match of right hand side value
>>   <<2,0,0,0,40,3,0,0,217,42,139,170,15,0,0,0,178,62,157,133,7,0,
>>     0,0,0,0,0,0,2,0,0,0,0,208,95,102,0,0,0,0,219,68,90,0,0,0,0,
>>     0,140,...>>
>>
>> Obviously the mem_used and mem_actual_used numbers are way off.
>> I dug into it deeper and it seems the calculations made by sigar are 
>> wrong when running inside a zone.
>> I'll keep looking for a long term solution to that but the thing is that 
>> these calculations in sigar haven't changed in 3 years so what changed in 
>> 2.5.1 that caused the numbers in the interface to come out screwy?
>>
>
> You can find out by using git log on sigar and sigar_port. Both projects 
> are low "traffic" so you should be able to spot something that looks 
> Solaris-specific. Most likely it's due to some change in sigar, but I could 
> be wrong. For example, it may be due to sigar_port asking for more stats.
>
>

