[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089721#comment-13089721
 ] 

Hudson commented on CASSANDRA-2868:
---

Integrated in Cassandra-0.7 #543 (See 
[https://builds.apache.org/job/Cassandra-0.7/543/])
work around native memory leak in com.sun.management.GarbageCollectorMXBean
patch by brandonwilliams and jbellis for CASSANDRA-2868

brandonwilliams : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1160879
Files : 
* /cassandra/branches/cassandra-0.7/CHANGES.txt
* /cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/GCInspector.java


 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.7.9, 0.8.5

 Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png


 We have memory issues with long-running servers. These have been confirmed by 
 several users on the user list, which is why I am reporting this.
 The memory consumption of the Cassandra java process increases steadily until 
 it is killed by the OS because of OOM (with no swap).
 Our server is started with -Xmx3000M and has been running for around 23 days.
 pmap -x shows
 Total SST: 1961616 (mem mapped data and index files)
 Anon  RSS: 6499640
 Total RSS: 8478376
 This shows that > 3G are 'overallocated'.
 We will use BRAF on one of our less important nodes to check whether it is 
 related to mmap and report back.
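
 The 'overallocated' figure can be reproduced with a little arithmetic. The 
 sketch below is illustrative only: the class and method names are 
 hypothetical, and it assumes the pmap -x figures are in KB and that the whole 
 3000M heap is counted in the anonymous RSS.

```java
// Illustrative arithmetic only; class and method names are hypothetical.
final class PmapOverallocation {
    /** Anonymous RSS beyond the configured max heap, in KB (assumes pmap -x reports KB). */
    static long overallocatedKb(long anonRssKb, long maxHeapMb) {
        return anonRssKb - maxHeapMb * 1024L;
    }

    public static void main(String[] args) {
        long anonRssKb = 6_499_640L; // "Anon RSS" from the report
        long maxHeapMb = 3_000L;     // -Xmx3000M
        long extraKb = overallocatedKb(anonRssKb, maxHeapMb);
        System.out.printf("anon RSS beyond heap: %d KB (%.2f GiB)%n",
                extraKb, extraKb / (1024.0 * 1024.0));
    }
}
```

 Under those assumptions the difference is 3427640 KB, or a little over 3 GiB 
 of anonymous memory that the Java heap does not account for.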

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085850#comment-13085850
 ] 

Jonathan Ellis commented on CASSANDRA-2868:
---

dirty working directory.  GCI is the only relevant file.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.5

 Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-16 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086041#comment-13086041
 ] 

Brandon Williams commented on CASSANDRA-2868:
-

+1 to GCI changes.  Also, it is indeed possible to get > 1 with a tiny heap.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.5

 Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086100#comment-13086100
 ] 

Hudson commented on CASSANDRA-2868:
---

Integrated in Cassandra-0.8 #282 (See 
[https://builds.apache.org/job/Cassandra-0.8/282/])
work around native memory leak in com.sun.management.GarbageCollectorMXBean
patch by brandonwilliams and jbellis for CASSANDRA-2868

jbellis : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1158490
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/GCInspector.java


 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.5

 Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-16 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086102#comment-13086102
 ] 

Jeremiah Jordan commented on CASSANDRA-2868:


Can we get this in 0.7.X as well?

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.5

 Attachments: 2868-v1.txt, 2868-v2.txt, 2868-v3.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-09 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081834#comment-13081834
 ] 

Brandon Williams commented on CASSANDRA-2868:
-

bq. Wouldn't it be worth indicating how many collections have been done 
since last log message if it's > 1, since it can (be > 1).

The only reason I added count tracking was to prevent it from firing when there 
were no GCs (the api is flakey.)  I've never actually been able to get > 1 to 
happen, but we can add it to the logging.

bq. IMO the duration-based thresholds are hard to reason about here, where 
we're dealing w/ summaries and not individual GC results.

We are dealing with individual GCs at least 99% of the time in practice.  The 
worst case is that > 1 GC inflates the gctime enough that we errantly log when 
it's not needed, but I imagine to trigger that you would have to be in a gc 
pressure situation already.

bq. I think I'd rather have something like the dropped messages logger, where 
every N seconds we log the summary we get from the mbean.

That seems like it could be a lot of noise since GC is constantly happening.

bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be 
removed. 

I think the logic there is still sound (Did we just do a CMS? Is the heap 
still 80% full?) and it seems to work as well as it always has.
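
As a rough illustration of the "is the heap still 80% full?" half of that 
check, here is a minimal sketch using the standard java.lang.management API. 
The class name HeapPressureCheck and the method isStillFull are hypothetical 
and not taken from GCInspector; the 0.80 threshold simply mirrors the figure 
mentioned above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical sketch; not the GCInspector implementation.
final class HeapPressureCheck {
    /** True when used/max exceeds the threshold; false when max is undefined. */
    static boolean isStillFull(long usedBytes, long maxBytes, double threshold) {
        if (maxBytes <= 0) {
            return false; // MemoryUsage.getMax() returns -1 when the max is undefined
        }
        return (double) usedBytes / maxBytes > threshold;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        boolean full = isStillFull(heap.getUsed(), heap.getMax(), 0.80);
        System.out.println("heap above 80% of max: " + full);
    }
}
```

The caller would still have to decide separately whether a CMS collection has 
just finished; this only covers the occupancy test.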



 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.4

 Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-01 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073636#comment-13073636
 ] 

Sylvain Lebresne commented on CASSANDRA-2868:
-

Comments on v2:
* Couldn't we estimate the reclaimed size by recording the last memory used 
(that would need to be the first thing we do in logGCResults so that we record 
it each time)?
* Wouldn't it be worth indicating how many collections have been done since 
last log message if it's > 1, since it can (be > 1).
* Nit: especially if we decide to keep the last memory used, it may be more 
efficient (and cleaner imho) to have just one HashMap of String -> GCInfo where 
GCInfo would be a small struct with times, counts and usedMemory. Not that it 
is very performance sensitive... 
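
A minimal sketch of what such a single map could look like, assuming the 
standard java.lang.management beans are polled periodically. The names 
GcSummaryTracker, GCInfo, update and poll are hypothetical and are not taken 
from any of the attached patches. The used-heap change it reports is only the 
difference between two polls, not the amount reclaimed by a specific GC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not from the attached patches.
final class GcSummaryTracker {
    /** Small value holder: collection count, collection time and used heap. */
    static final class GCInfo {
        final long count;
        final long timeMs;
        final long usedBytes;

        GCInfo(long count, long timeMs, long usedBytes) {
            this.count = count;
            this.timeMs = timeMs;
            this.usedBytes = usedBytes;
        }
    }

    // Last cumulative totals seen for each collector name.
    private final Map<String, GCInfo> last = new HashMap<>();

    /**
     * Records the new cumulative totals and returns the change since the
     * previous call. On the first call for a collector the totals themselves
     * are returned, with a used-heap change of 0.
     */
    GCInfo update(String collector, long totalCount, long totalTimeMs, long usedBytes) {
        GCInfo prev = last.put(collector, new GCInfo(totalCount, totalTimeMs, usedBytes));
        if (prev == null) {
            return new GCInfo(totalCount, totalTimeMs, 0L);
        }
        return new GCInfo(totalCount - prev.count,
                          totalTimeMs - prev.timeMs,
                          usedBytes - prev.usedBytes);
    }

    /** Polls the standard MXBeans once and prints a line per collector that ran. */
    void poll() {
        long used = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            long time = gc.getCollectionTime();
            if (count < 0 || time < 0) {
                continue; // -1 means the value is undefined for this collector
            }
            GCInfo delta = update(gc.getName(), count, time, used);
            if (delta.count > 0) {
                System.out.printf("%s: %d collections, %d ms, heap change %d bytes%n",
                        gc.getName(), delta.count, delta.timeMs, delta.usedBytes);
            }
        }
    }

    public static void main(String[] args) {
        new GcSummaryTracker().poll();
    }
}
```

Keeping the count delta in the same struct also makes it easy to skip logging 
when no collection happened between two polls.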

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.3

 Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073661#comment-13073661
 ] 

Jonathan Ellis commented on CASSANDRA-2868:
---

bq. Couldn't we estimate the reclaimed size

Well, not really; what we'd have is the difference in size between the last 
time it was called and now, which isn't all that close to the amount reclaimed 
by a specific GC.

bq. Wouldn't it be worth indicating that how many collection have been done 
since last log message

IMO the duration-based thresholds are hard to reason about here, where we're 
dealing w/ summaries and not individual GC results.  I think I'd rather have 
something like the dropped messages logger, where every N seconds we log the 
summary we get from the mbean.

The flushLargestMemtables/reduceCacheSizes stuff should probably be removed. :(

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.3

 Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-31 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073409#comment-13073409
 ] 

Brandon Williams commented on CASSANDRA-2868:
-

I created three isolated nodes, all with a hack of setting the inspector 
interval to 1ms applied (not the tightest loop, but good enough and easy.)  One 
of the nodes had the inspector disabled entirely (the control), one was 
vanilla, and one had v2 applied.  After starting them up with a 128M heap and 
letting them run for a few minutes, here are the results:

||version||resident||
|control|72M|
|patched|72M|
|vanilla|540M|

I think it's safe to say java.lang.management doesn't share the leak.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.3

 Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-29 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073006#comment-13073006
 ] 

Jeremiah Jordan commented on CASSANDRA-2868:


Depending how long the rewrite is going to take, can we get the config file 
option to disable gc inspector into a new 0.7.X and 0.8.X release?

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.3

 Attachments: 2868-v1.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-27 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071895#comment-13071895
 ] 

Jonathan Ellis commented on CASSANDRA-2868:
---

Thanks, Chris.  We'll work on rewriting GCInspector to use the 
java.lang.management api instead, unless you have time to take a stab at that.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.3

 Attachments: 2868-v1.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-18 Thread Daniel Doubleday (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066835#comment-13066835
 ] 

Daniel Doubleday commented on CASSANDRA-2868:
-

Looks good to me. Guess cassandra should just disable the inspector for now 
(probably make it jmx'able to start it manually)

Thu Jul 14 09:39:26 CEST 2011: [anon]: 3234068
Thu Jul 14 17:22:45 CEST 2011: [anon]: 3266888
Fri Jul 15 09:33:53 CEST 2011: [anon]: 3269160
Mon Jul 18 09:54:29 CEST 2011: [anon]: 3270188
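
To put the [anon] samples above into a per-day rate, a small calculation like 
the following can be used. This is only a sketch: the class name AnonGrowth 
and the method kbPerDay are hypothetical, the values are assumed to be in KB 
(as pmap -x reports them), and the time zone is ignored because every 
timestamp is in CEST.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical helper for the samples listed above; values assumed to be KB.
final class AnonGrowth {
    /** Average growth in KB per 24 hours between two samples. */
    static double kbPerDay(LocalDateTime t0, long kb0, LocalDateTime t1, long kb1) {
        double days = Duration.between(t0, t1).getSeconds() / 86_400.0;
        return (kb1 - kb0) / days;
    }

    public static void main(String[] args) {
        LocalDateTime[] when = {
            LocalDateTime.of(2011, 7, 14, 9, 39, 26),
            LocalDateTime.of(2011, 7, 14, 17, 22, 45),
            LocalDateTime.of(2011, 7, 15, 9, 33, 53),
            LocalDateTime.of(2011, 7, 18, 9, 54, 29),
        };
        long[] anonKb = {3_234_068L, 3_266_888L, 3_269_160L, 3_270_188L};
        // Print the growth rate for each consecutive pair of samples.
        for (int i = 1; i < when.length; i++) {
            System.out.printf("%s -> %s: %.0f KB/day%n", when[i - 1], when[i],
                    kbPerDay(when[i - 1], anonKb[i - 1], when[i], anonKb[i]));
        }
    }
}
```

Over the last interval (Jul 15 to Jul 18) the anonymous RSS grew by 1028 KB in 
roughly three days, so most of the increase in this series falls in the first 
hours after the first sample.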

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Daniel Doubleday
Priority: Minor
 Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-16 Thread Daniel Doubleday (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066372#comment-13066372
 ] 

Daniel Doubleday commented on CASSANDRA-2868:
-

Yes - we did disable the GCInspector.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Daniel Doubleday
Priority: Minor
 Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-16 Thread Zhu Han (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066376#comment-13066376
 ] 

Zhu Han commented on CASSANDRA-2868:


Got it!

Do you have any idea why only some of us report the problem?

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Daniel Doubleday
Priority: Minor
 Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-16 Thread Daniel Doubleday (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066392#comment-13066392
 ] 

Daniel Doubleday commented on CASSANDRA-2868:
-

Well either it's environment specific or (more likely) others didn't notice / 
care because they have enough memory and/or restart the nodes often enough.

We have 16GB of RAM and run Cassandra with 3GB. Within one month we lose ~3GB 
(13GB - 10GB) of file system cache because of the mem leak. Looking at our 
graphs I can't really tell a difference performance-wise. So I guess only 
people with weaker servers (less memory headroom) will really notice. We 
noticed only because we got the system OOM on a cluster that's not critical and 
which we didn't really monitor.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Daniel Doubleday
Priority: Minor
 Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-15 Thread Daniel Doubleday (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065867#comment-13065867
 ] 

Daniel Doubleday commented on CASSANDRA-2868:
-

It's indeed promising. We have been running this in production for 3 days now 
and rss increased only insignificantly by ~5MB a day. 

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Daniel Doubleday
Priority: Minor
 Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-15 Thread Zhu Han (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066009#comment-13066009
 ] 

Zhu Han commented on CASSANDRA-2868:


{We have been running this in production for 3 days now and rss increased only 
insignificantly by ~5MB a day}

Do you mean that -XX:MaxDirectMemorySize helps to control the RSS growth? 

I have no idea why only some of us hit the problem. I suppose it is a kernel 
bug.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Daniel Doubleday
Priority: Minor
 Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-15 Thread Chris Burroughs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066057#comment-13066057
 ] 

Chris Burroughs commented on CASSANDRA-2868:


I interpreted Daniel's "this" to be the 2868-v1.txt patch (or something 
equivalent) with cassandra.enable_gc_inspector=false.  I did not find 
-XX:MaxDirectMemorySize to be helpful.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Daniel Doubleday
Priority: Minor
 Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-14 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065247#comment-13065247
 ] 

Jonathan Ellis commented on CASSANDRA-2868:
---

Promising!

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Daniel Doubleday
Priority: Minor
 Attachments: 2868-v1.txt, low-load-36-hours-initial-results.png






[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-12 Thread Daniel Doubleday (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063770#comment-13063770
 ] 

Daniel Doubleday commented on CASSANDRA-2868:
-

Next: [anon]: 3675224 (+47616KB in 1 day)

bq. Is your data size constant? If not you are probably seeing growth in the 
index samples and bloom filters.

Well, no: the data size is increasing. But I thought the index samples and 
bloom filters live on the plain old Java heap, no? JVM heap stats are really 
relaxed. Still, I don't think that matters, because what we are seeing is 
ever-increasing RSS memory consumption even though we have -Xmx3G, -Xms3G and 
mlockall (pmap shows these 3G as one block). So something seems to be 
constantly allocating native memory that has nothing to do with the Java heap.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Daniel Doubleday
Priority: Minor





[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-12 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064128#comment-13064128
 ] 

Jonathan Ellis commented on CASSANDRA-2868:
---

We call getLastGcInfo several times a second.  
http://twitter.com/#!/kimchy/status/90861039930970113

You could try turning GCInspector methods into a no-op and see if that makes it 
go away.
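
One way to keep logging GC activity without calling getLastGcInfo is to poll only the cumulative counters on the standard java.lang.management.GarbageCollectorMXBean and report the difference between polls. The sketch below illustrates that idea under the assumption that the leak is tied to getLastGcInfo, as suggested above. It is not the code in GCInspector or in any attached patch, and the class name GcDeltaPoller is made up.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

public class GcDeltaPoller {
    // Last seen cumulative values, keyed by collector name.
    private final Map<String, Long> lastCounts = new HashMap<>();
    private final Map<String, Long> lastTimes = new HashMap<>();

    /** Logs per-collector deltas and returns the number of collections since the previous poll. */
    long poll() {
        long newCollections = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            long timeMs = gc.getCollectionTime(); // cumulative, approximate, in milliseconds
            if (count < 0) {
                continue;
            }
            long prevCount = lastCounts.getOrDefault(gc.getName(), 0L);
            long prevTime = lastTimes.getOrDefault(gc.getName(), 0L);
            if (count > prevCount) {
                System.out.printf("%s: %d collections, %d ms%n",
                        gc.getName(), count - prevCount, timeMs - prevTime);
                newCollections += count - prevCount;
            }
            lastCounts.put(gc.getName(), count);
            lastTimes.put(gc.getName(), timeMs);
        }
        return newCollections;
    }

    public static void main(String[] args) throws InterruptedException {
        GcDeltaPoller poller = new GcDeltaPoller();
        for (int i = 0; i < 3; i++) {
            poller.poll(); // the first call reports everything since JVM start
            Thread.sleep(1000);
        }
    }
}
```

The trade-off is that the counters give only totals per interval, not the per-collection memory usage details that getLastGcInfo exposes.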

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Daniel Doubleday
Priority: Minor





[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-12 Thread Chris Burroughs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064247#comment-13064247
 ] 

Chris Burroughs commented on CASSANDRA-2868:
---

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129 will be the id when 
bugs.sun.com gets around to doing its thing.

I confirmed that -XX:MaxDirectMemorySize does not protect you from this (i.e. 
it's a native native leak, not some DirectByteBuffer thing).  I'll be able to 
test this but not until the end of this week at the earliest (and it will then 
take at least another week to be sure).
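
A quick way to tell DirectByteBuffer usage apart from other native allocations is to read the JVM's own accounting of direct buffers. The sketch below assumes Java 7 or later (BufferPoolMXBean does not exist before that) and a HotSpot-style pool named "direct"; the class name is made up. If this number stays small while RSS keeps growing, the growth is not coming from direct buffers, which matches the observation that -XX:MaxDirectMemorySize does not help.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectBufferUsage {
    /** Bytes the JVM attributes to direct buffers, or -1 if no "direct" pool is exposed. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) { // assumption: HotSpot pool naming
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB, counted by the pool
        System.out.println("direct buffer bytes: " + directBytesUsed());
        System.out.println("held buffer capacity: " + buf.capacity()); // keeps buf reachable
    }
}
```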

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Daniel Doubleday
Priority: Minor





[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-11 Thread Chris Burroughs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063463#comment-13063463
 ] 

Chris Burroughs commented on CASSANDRA-2868:
---

At one point I was convinced this was a JVM bug and opened 
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7037080. After seeing how 
totally broken NIO is after CASSANDRA-2654, I'm no longer sure of anything.

I was going to start a survey on the user list after the summit to see if any 
OS/jvm level pattern could be found, since clearly it doesn't happen to 
everyone in all cases.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Daniel Doubleday
Priority: Minor





[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-07-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063523#comment-13063523
 ] 

Jonathan Ellis commented on CASSANDRA-2868:
---

Is your data size constant?  If not you are probably seeing growth in the index 
samples and bloom filters.

 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Daniel Doubleday
Priority: Minor
