from:"Peter Schuller"

Re: Node forgets about most of its column families

2012-08-28 Thread Peter Schuller

to disablegossip and make other nodes not send requests to it. Disabling thrift would also be advised, or even firewalling it prior to restart. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: JMX(RMI) dynamic port allocation problem still exists?

2012-08-29 Thread Peter Schuller

I can recommend Jolokia highly for providing an HTTP/JSON interface to JMX (it can be trivially run in agent mode by just altering JVM args): http://www.jolokia.org/ -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Memory Usage of a connection

2012-08-30 Thread Peter Schuller

amounts of data? Large or many columns (or both), etc. Essentially all working data that your request touches is allocated on the heap and contributes to allocation rate and ParNew frequency. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: force gc?

2012-09-02 Thread Peter Schuller

is not as compact as PostgreSQL. For example column names are duplicated in each row, and the row key is duplicated twice (once in index, once in data). -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: force gc?

2012-09-02 Thread Peter Schuller

I think that was clear from your post. I don't see a problem with your process. Setting gc grace to 0 and forcing compaction should indeed return you to the smallest possible on-disk size. (But may be unsafe as documented; can cause deleted data to pop back up, etc.) -- / Peter Schuller

Re: force gc?

2012-09-02 Thread Peter Schuller

over all rows: for row_id, row in your_column_family.get_range(): https://github.com/pycassa/pycassa -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
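A minimal sketch of the iteration quoted above, assuming pycassa's `ColumnFamily.get_range()`, which yields `(row_key, columns)` pairs. The helper name `count_rows` and the connection details in the commented usage (keyspace, host, column family name) are hypothetical placeholders, not taken from the thread.

```python
def count_rows(column_family):
    """Walk every row via get_range() and return how many were seen."""
    total = 0
    for row_id, row in column_family.get_range():
        # row is a dict-like mapping of column name -> value
        total += 1
    return total


# Hypothetical usage against a live cluster (all names are placeholders):
# import pycassa
# pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
# cf = pycassa.ColumnFamily(pool, 'MyColumnFamily')
# print(count_rows(cf))
```

Because the helper only relies on `get_range()`, it can be exercised with a stand-in object before pointing it at a real column family.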

Re: Invalid Counter Shard errors?

2012-09-06 Thread Peter Schuller

This problem is not new to 1.1. On Sep 6, 2012 5:51 AM, Radim Kolar h...@filez.com wrote: i would migrate to 1.0 because 1.1 is highly unstable.

Re: Cassandra 1.1.1 on Java 7

2012-09-08 Thread Peter Schuller

Has anyone tried running 1.1.1 on Java 7? Have been running jdk 1.7 on several clusters on 1.1 for a while now. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-10 Thread Peter Schuller

by inter-region pointers). If you can avoid that, one might hope to avoid full gc:s altogether. The jury is still out on my side; but like I said, I've seen promising indications. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-12 Thread Peter Schuller

question is how often. But given the lack of handling of such failure modes, the effect on clients is huge. Recommend data reads by default to mitigate this and a slew of other sources of problems (and for counter increments, we're rolling out least-active-request routing). -- / Peter Schuller

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-12 Thread Peter Schuller

it is in action. FWIW, J9's balanced collector is very similar to G1 in its design. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-12 Thread Peter Schuller

Our full gc:s are typically not very frequent. Few days or even weeks in between, depending on cluster. *PER NODE* that is. On a cluster of hundreds of nodes, that's pretty often (and all it takes is a single node). -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
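The per-node versus per-cluster point can be made concrete with some rough arithmetic. This is only a sketch: it assumes nodes hit full GCs independently and at a steady average rate, and the numbers (300 nodes, one full GC per node per week) are illustrative, not figures from the thread.

```python
def cluster_full_gc_interval_hours(nodes, days_between_per_node):
    """Average hours between full GCs somewhere in the cluster.

    Assumes each node pauses independently at a steady average rate.
    """
    per_node_rate_per_hour = 1.0 / (days_between_per_node * 24.0)
    cluster_rate_per_hour = nodes * per_node_rate_per_hour
    return 1.0 / cluster_rate_per_hour


# Illustrative numbers only: 300 nodes, one full GC per node every 7 days.
print(cluster_full_gc_interval_hours(300, 7))
```

With those assumed inputs the cluster as a whole sees a full GC roughly every half hour (168 node-hours divided by 300 nodes), even though each individual node goes a week between them.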

Re: Changing bloom filter false positive ratio

2012-09-14 Thread Peter Schuller

sstable will effectively cover almost the entire range (since you're effectively spraying random tokens at it, unless clients are writing data in md5 order). (Maybe it's different for ordered partitioning though.) -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-15 Thread Peter Schuller

the causes of unpredictable behavior w.r.t. GC by being careful about its memory allocation and *retention* profile. For the specific case of avoiding *ever* seeing a full gc, it gets even more complex. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Invalid Counter Shard errors?

2012-09-20 Thread Peter Schuller

the top of my head) the resulting value being correct is if the later increment (N2 in this case) is somehow including N1 as well (e.g., because it was generated by first reading the current counter value). -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Invalid Counter Shard errors?

2012-09-20 Thread Peter Schuller

be safely retried. Cassandra counters are generally not useful if *strict* correctness is desired, for this reason. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-24 Thread Peter Schuller

with slightly changed workloads? It's very hard to blackbox-test GC settings, which is probably why GC tuning can be perceived as a useless game of whack-a-mole. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Why data tripled in size after repair?

2012-09-26 Thread Peter Schuller

a single sstable bigger than what would normally happen, and it takes more total disk space before it will be part of a compaction again. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Why data tripled in size after repair?

2012-10-01 Thread Peter Schuller

of cassandra? It's in the 1.1 branch; I don't remember if it went into a release yet. If not, it'll be in the next 1.1.x release. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: nodetool cleanup

2012-10-22 Thread Peter Schuller

On Oct 22, 2012 11:54 AM, B. Todd Burruss bto...@gmail.com wrote: does nodetool cleanup perform a major compaction in the process of removing unwanted data? No.

Re: Java 7 support?

2012-10-24 Thread Peter Schuller

FWIW, we're using openjdk7 on most of our clusters. For those where we are still on openjdk6, it's not because of an issue - just haven't gotten to rolling out the upgrade yet. We haven't had any issues that I recall with upgrading the JDK. -- / Peter Schuller (@scode, http

Re: Simulating a failed node

2012-10-28 Thread Peter Schuller

of a different story and if you want to test behavior when nodes go down I suggest including that. See CASSANDRA-2540 and CASSANDRA-3927.) -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Read IO

2013-02-20 Thread Peter Schuller

settings (typically trading pollution of page cache vs. number of I/O:s). -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Multi-DC Deployment

2011-04-21 Thread Peter Schuller

to skip rows that are obviously bad, but true integrity checking is not supported at this time. -- / Peter Schuller

Re: decommissioning a wrong node

2011-04-24 Thread Peter Schuller

. the common case of wanting to listen on 127.0.0.1 but no public interfaces... -- / Peter Schuller

Re: Performance tests using stress testing tool

2011-04-28 Thread Peter Schuller

haywire as you don't service as many I/O requests as are coming in. There is a grey area in between where latency will be very sensitive to smallish changes in I/O load but aggregate throughput remaining below what can be sustained. -- / Peter Schuller

Re: Performance tests using stress testing tool

2011-04-28 Thread Peter Schuller

. -- / Peter Schuller

Re: OOM on heavy write load

2011-04-28 Thread Peter Schuller

usage during periods of timeouts. If the huge allocations fail due to fragmentation and fallback to Full GC that might be an expected result. Else -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps. -- / Peter Schuller

Re: OOM on heavy write load

2011-04-28 Thread Peter Schuller

issues. *Maybe* after two full gc:s tops if the first happens while there's a mix still active in memtables. -- / Peter Schuller

Re: Performance tests using stress testing tool

2011-04-29 Thread Peter Schuller

Thanks Peter. I am using java version of the stress testing tool from the contrib folder. Is there any issue that I should be aware of? Do you recommend using pystress? I just saw Brandon file this: https://issues.apache.org/jira/browse/CASSANDRA-2578 Maybe that's it. -- / Peter Schuller

Re: Backup full cluster

2011-05-04 Thread Peter Schuller

the former row key gets restored to a point in time prior to that of the latter row key may cause the latter write to become visible even though the former write is lost. -- / Peter Schuller

Re: Backup full cluster

2011-05-04 Thread Peter Schuller

: Think before typing. :) -- / Peter Schuller

Re: Decommissioning node is causing broken pipe error

2011-05-05 Thread Peter Schuller

that the need for major compactions is significantly lessened or even eliminated. However, running major compactions won't cause tombstones *not* to be removed; it's just not required *in order* for them to be removed. -- / Peter Schuller

Re: compaction strategy

2011-05-07 Thread Peter Schuller

. -- / Peter Schuller

Re: compaction strategy

2011-05-07 Thread Peter Schuller

to the effects of the temporary spike in data size and cache coldness. Sounds like it makes good sense in your situation though. -- / Peter Schuller

Re: strange behaviour in cassandra

2011-05-08 Thread Peter Schuller

lead to a delay of sstable removal which will vary with whatever else is happening (the more busy the node, the more often a concurrent mark/sweep gc phase is triggered, and the more frequently obsolete sstables are deleted). -- / Peter Schuller

Re: Index interval tuning

2011-05-09 Thread Peter Schuller

to judge and likely depends a lot on i/o scheduling and other details. -- / Peter Schuller

Re: Index interval tuning

2011-05-10 Thread Peter Schuller

store? So the only thing I can do is test it and see how it goes. To make the change effective, should I do anything beyond changing the value in cassandra.yaml and restart the node? I'll try first with 256 and see what happens. That should be it. -- / Peter Schuller

Re: Finding big rows

2011-05-11 Thread Peter Schuller

What is the best way to find keys of such big rows? One, if not necessarily the best, way is to check system.log for large row warnings that trigger for rows large enough to be compacted lazily. Grep for 'azy' (or for 'lazy', case-insensitively) and you should find it. -- / Peter Schuller

Re: Commitlog Disk Full

2011-05-12 Thread Peter Schuller

setting in question is the memtable_flush_after setting. Do you have that set to something very high on one of your column families? You can use describe keyspace name_of_keyspace in cassandra-cli to check current settings. -- / Peter Schuller

Re: Commitlog Disk Full

2011-05-13 Thread Peter Schuller

? Including overwrites. If not, I'm not sure what's going on. Since you said it took about a day of traffic it feels fishy. -- / Peter Schuller

Re: Monitoring bytes read per cf

2011-05-13 Thread Peter Schuller

to be, due to the LRU:ishness of caches, the less frequently accessed data that tends to make it difficult to judge by numbers that include all I/O. -- / Peter Schuller

Re: Native heap leaks?

2011-05-15 Thread Peter Schuller

Ok, so I think I found one major cause contributing to the increasing Nice job tracking this down! That is useful to know, even outside of Cassandra use cases. Frankly it's disappointing to learn what nio is doing. -- / Peter Schuller

Re: Inconsistent data issues when running nodetool move.

2011-05-15 Thread Peter Schuller

is useful since it allows consistency semantics similar to ALL but allows you to survive nodes being down, at the cost of a higher RF (3 at least)) -- / Peter Schuller

Re: Cassandra and concurrent programming

2011-05-16 Thread Peter Schuller

/) for that. -- / Peter Schuller

Re: Gossiper question

2011-05-18 Thread Peter Schuller

/browse/CASSANDRA-2554 which may be relevant, but simple overload is also a possible reason. -- / Peter Schuller

Re: repair question

2011-05-23 Thread Peter Schuller

. Particularly in a situation with lots of dropped messages. I'm getting the 2^15 from AntiEntropyService.Validator.Validator() which passes a maxsize of 2^15 to the MerkelTree constructor. -- / Peter Schuller

Re: repair question

2011-05-24 Thread Peter Schuller

the more I think of it ;) -- / Peter Schuller

Re: repair question

2011-05-24 Thread Peter Schuller

Hmmm, I'm starting to like this idea more and more the more I think of it ;) Filed: https://issues.apache.org/jira/browse/CASSANDRA-2699 -- / Peter Schuller

Re: sync commitlog in batch mode lose data

2011-05-31 Thread Peter Schuller

and cassandra service 5). read the key list generated in step 2) with consistency level ONE How sure are you that the system is honoring fsync() properly, including flushing any caches on underlying drives? Or is this with battery backed caching RAID controllers? -- / Peter Schuller

Re: sync commitlog in batch mode lose data

2011-06-03 Thread Peter Schuller

will legitimately get very low values without it indicating anything is wrong.) -- / Peter Schuller

Re: sync commitlog in batch mode lose data

2011-06-07 Thread Peter Schuller

kernels) barriers at the OS level - and the list goes on. -- / Peter Schuller

Re: Retrieving a column from a fat row vs retrieving a single row

2011-06-08 Thread Peter Schuller

adding a lot of overhead in terms of disk I/O unless your data set fits comfortably in memory. -- / Peter Schuller

Re: repair and amount of transfers

2011-06-14 Thread Peter Schuller

at the same time. -- / Peter Schuller

Re: Forcing Cassandra to free up some space

2011-06-15 Thread Peter Schuller

changes so I'm not sure off hand what'll happen to the auto-system.gc code in cassandra that attempts to free space. CASSANDRA-2521 is IMO the real solution. -- / Peter Schuller

Re: Force a node to form part of quorum

2011-06-16 Thread Peter Schuller

It would be great if Cassandra puts this on their roadmap. There are a lot of durability benefits from incorporating dc awareness into the write consistency equation. You may be interested in the discussion here: https://issues.apache.org/jira/browse/CASSANDRA-2338 -- / Peter Schuller

Re: Pruning commit logs manually

2011-06-17 Thread Peter Schuller

a memtable is flushed can the commit log which contains the data being flushed, be removed by Cassandra. -- / Peter Schuller

Re: CommitLog replay

2011-06-21 Thread Peter Schuller

when it starts up the thrift interface - check system.log. -- / Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-23 Thread Peter Schuller

benefits from this. The only hard requirement is the repair schedule relative to GC grace time, and that requirement does not change - just be mindful of the timing of the EBS snapshots and what that means to your repair schedule. -- / Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-23 Thread Peter Schuller

EBS volume atomicity is good. We've had tons of experience since EBS came out almost 4 years ago, to back all kinds of things, including large DBs. And thanks a lot for coming forward with production experience. That is always useful with these things. -- / Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-23 Thread Peter Schuller

about freeze maybe being probabilistically useful anyway. -- / Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-23 Thread Peter Schuller

this in detail. I should really start working off the backlog of those blog entries... -- / Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-23 Thread Peter Schuller

certain goals, including crash consistency. It still relies on the same fundamental properties of the underlying storage device. -- / Peter Schuller

Re: Cassandra ACID

2011-06-24 Thread Peter Schuller

choosing batch commit log sync instead of periodic if single-node durability or post-quorum-write durability is a concern. -- / Peter Schuller

Re: AntiEntropy?

2011-07-11 Thread Peter Schuller

-- / Peter Schuller

Re: Node repair questions

2011-07-11 Thread Peter Schuller

/networking load and is not yet rate limited like compaction. In addition be aware that repair can cause disk space usage to temporarily increase if there are significant differences to be repaired. -- / Peter Schuller

Re: Node repair questions

2011-07-11 Thread Peter Schuller

to be. -- / Peter Schuller

Re: Re: AntiEntropy?

2011-07-12 Thread Peter Schuller

must be scheduled by the operator to run regularly. The name repair is a bit unfortunate; it is not meant to imply that it only needs to run when something is wrong. -- / Peter Schuller

Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-07-12 Thread Peter Schuller

new sstables) which is expensive for the usual reasons with disk I/O; it's major since it covers all data. The data read is in fact used to calculate a merkle tree for comparison with neighbors, as claimed. -- / Peter Schuller

Re: Anyone using Facebook's flashcache?

2011-07-12 Thread Peter Schuller

and your memory is enough to keep the hot set, and your disk I/O is coming from the long tail, increasing the amount of cache to 200 gig may not necessarily give you a huge improvement in terms of percentages. -- / Peter Schuller

Re: Anyone using Facebook's flashcache?

2011-07-12 Thread Peter Schuller

be 10 times that of the original cache. I did a quick Google but didn't find a good piece describing it more properly, but hopefully the above is helpful. Some related reading might be http://en.wikipedia.org/wiki/Long_Tail -- / Peter Schuller

Re: Re: Re: AntiEntropy?

2011-07-12 Thread Peter Schuller

documented in the link I sent before, unless you have specific reasons not to and know what you're doing. -- / Peter Schuller

Re: Re: Re: Re: AntiEntropy?

2011-07-13 Thread Peter Schuller

repair take? So basically, leave significant margin. -- / Peter Schuller

Re: commitlog replay missing data

2011-07-13 Thread Peter Schuller

dependent on some marker that isn't written until commit log synch.) -- / Peter Schuller (@scode on twitter)

Re: commitlog replay missing data

2011-07-13 Thread Peter Schuller

# wait for a bit until no one is sending it writes anymore More accurately, until all other nodes have realized it's down (nodetool ring on each respective host). -- / Peter Schuller (@scode on twitter)

Re: Replicating to all nodes

2011-07-13 Thread Peter Schuller

? -- / Peter Schuller (@scode on twitter)

Re: Replicating to all nodes

2011-07-15 Thread Peter Schuller

to increase RF. If you *really* know what you're doing and why you want RF to track total node count, I'm sure there are *some* cases where this makes sense. But nothing you've said so far really indicates you're in such a position. -- / Peter Schuller (@scode on twitter)

Re: JNA to avoid swap but physical memory increase

2011-07-15 Thread Peter Schuller

really have permission to do the mlockall()? (Not that I disagree in any way that swap should be disabled, +1 on that.) -- / Peter Schuller (@scode on twitter)

Re: Cache layer in front of cassandra... any help / suggestions?

2011-07-15 Thread Peter Schuller

. -- / Peter Schuller (@scode on twitter)

Re: Cache layer in front of cassandra... any help / suggestions?

2011-07-15 Thread Peter Schuller

checking out: https://issues.apache.org/jira/browse/CASSANDRA-1283 https://issues.apache.org/jira/browse/CASSANDRA-1969 If it can be done via that the nice thing is that you don't lose consistency. -- / Peter Schuller (@scode on twitter)

Re: Replicating to all nodes

2011-07-15 Thread Peter Schuller

of consistency level *never* affects *which* nodes are responsible for a given row key, nor does it affect which rows will eventually receive writes. It *only* affects how many nodes must respond before the operation (read or write) is considered successful. Does that make it clearer? -- / Peter
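To illustrate the distinction, here is a minimal sketch of how many replica responses a coordinator waits for at a few common consistency levels. The function name is hypothetical and only ONE, QUORUM and ALL are covered. The replication factor alone decides how many replicas own a key; the consistency level only changes the number returned here.

```python
def required_responses(consistency_level, replication_factor):
    """Number of replicas that must respond before the operation succeeds.

    Only ONE, QUORUM and ALL are modelled in this sketch.
    """
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # strict majority of replicas
        "ALL": replication_factor,
    }
    return levels[consistency_level]


# With RF=3, every write is still sent to all 3 replicas; only the number
# of acknowledgements waited for differs per consistency level.
for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_responses(level, 3))
```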

Re: Replicating to all nodes

2011-07-15 Thread Peter Schuller

will be serving the requests. -- / Peter Schuller (@scode on twitter)

Re: Replicating to all nodes

2011-07-15 Thread Peter Schuller

. It is not the case that having RF be equal to the cluster size is in and of itself a useful property. -- / Peter Schuller (@scode on twitter)

Re: How are column sort handled?

2011-07-18 Thread Peter Schuller

decrease read performance, as the average row can become more spread out over multiple sstables. This is one potential driver for compaction. -- / Peter Schuller (@scode on twitter)

Re: How are column sort handled?

2011-07-18 Thread Peter Schuller

Thanks! Then does it mean that before compaction if read call comes for that key sort is done at the read time since column b, c and a are in different ssTables. Essentially yes; a merge-sort happens (since they are sorted locally in each sstable). -- / Peter Schuller (@scode on twitter)
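The read-time merge can be sketched with Python's `heapq.merge`, which combines inputs that are each already sorted. This is an illustration only: it merges bare column names and ignores timestamp reconciliation and tombstones, which a real read also has to handle. The function name and sample data are hypothetical.

```python
import heapq


def merge_columns(*sstables):
    """Merge per-sstable column lists, each already sorted, into one sorted list."""
    return list(heapq.merge(*sstables))


# Columns for one row key spread across three sstables, each sorted locally.
sstable_1 = ["b", "d"]
sstable_2 = ["c"]
sstable_3 = ["a", "e"]
print(merge_columns(sstable_1, sstable_2, sstable_3))
```

Because every input is already ordered, the merge only compares the current head of each list, so no full re-sort of the row is needed at read time.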

Re: host clocks

2011-07-25 Thread Peter Schuller

with respect to.) Clocks should be synchronized, yes. But either your data model is such that conflicting writes are okay, or you need external co-ordination. There's no hoping for the best by keeping clocks better in sync. -- / Peter Schuller (@scode on twitter)

Re: Predictable low RW latency, SLABS and STW GC

2011-07-26 Thread Peter Schuller

effects on caches like streaming through all the bf and index files, restarts are certainly detrimental to the page cache. Also you may still see some eviction (even if it doesn't *necessarily* happen) depending (particularly if not running with numactl set to interleave). -- / Peter Schuller

Re: Too many open files

2011-07-27 Thread Peter Schuller

nodes that run into this. -- / Peter Schuller (@scode on twitter)

Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-28 Thread Peter Schuller

construct a benchmark where there's no difference, yet see a significant difference in a real-world scenario when your benchmarked I/O is intermixed with other I/O. Not to mention subtle differences in behaviors of kernels, RAID controllers, disk drive controllers, etc... -- / Peter Schuller (@scode

Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-29 Thread Peter Schuller

that cost is not specific to mmap():ed files (once a given page is in core that is). (But again, I'm not arguing the point in Cassandra's case; just generally.) -- / Peter Schuller (@scode on twitter)

Re: Question about eventually consistent

2011-07-31 Thread Peter Schuller

-ordination outside of Cassandra, there will be the potential for multiple clients reading and writing without awareness of each other. Whatever the behavior is, your data model must be such that this is acceptable. -- / Peter Schuller (@scode on twitter)

Re: Unable to repair a node

2011-08-14 Thread Peter Schuller

that ReadStage is usually full (@ your limit)? -- / Peter Schuller (@scode on twitter)

Re: Unable to repair a node

2011-08-14 Thread Peter Schuller

with RF=3, then for whatever ranges of the ring that node was partially responsible for, only 2 of the 3 copies will be up/available. They do not automatically migrate somewhere else. -- / Peter Schuller (@scode on twitter)

Re: Nodetool repair takes 4+ hours for about 10G data

2011-08-19 Thread Peter Schuller

The compaction settings do not affect repair. (Thinking out loud, or does it? Validation compactions and table builds.) It does. -- / Peter Schuller (@scode on twitter)

Re: Nodetool repair takes 4+ hours for about 10G data

2011-08-19 Thread Peter Schuller

filters and indexes. Can be CPU or I/O bound (or throttled) - nodetool compactionstats, htop, iostat -x -k 1 -- / Peter Schuller (@scode on twitter)

Re: Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process

2011-08-19 Thread Peter Schuller

pasted; periodic repairs are part of regular cluster maintenance. See: http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair -- / Peter Schuller (@scode on twitter)

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Peter Schuller

netstats. If there are stuck streams, they might be causing sstable retention beyond what you'd expect. -- / Peter Schuller (@scode on twitter)

Re: How can I patch a single issue

2011-08-19 Thread Peter Schuller

rebasing is necessary. You might try a trunk from further back in time (around the time Stu submitted the patch). I'm not quite sure what your actual problem is though, if it's source code access then the easiest route is probably to check it out from https://github.com/apache/cassandra -- / Peter

Re: Occasionally getting old data back with ConsistencyLevel.ALL

2011-08-19 Thread Peter Schuller

) that indicates the items you specifically say are processed twice, were in fact written twice to Cassandra? -- / Peter Schuller (@scode on twitter)

Re: Unable to repair a node

2011-08-19 Thread Peter Schuller

is if there is a very small amount of data (yet non-zero), or significant amounts. -- / Peter Schuller (@scode on twitter)
