Re: OOM while reading key cache
Yes, as I wrote in my first e-mail: when I removed the key cache file, Cassandra started without further problems.

regards
Olek

2013/11/13 Robert Coli rc...@eventbrite.com:
> On Wed, Nov 13, 2013 at 12:35 AM, Tom van den Berge t...@drillster.com wrote:
>> I'm having the same problem, after upgrading from 1.2.3 to 1.2.10. I can remember this was a bug that was solved in the 1.0 or 1.1 version some time ago, but apparently it has come back. A workaround is to delete the contents of the saved_caches directory before starting up.
>
> Yours is not the first report of this I've heard resulting from a 1.2.x to 1.2.x upgrade. Reports are of the form "I had to nuke my saved_caches or I couldn't start my node, it OOMed", etc.
>
> https://issues.apache.org/jira/browse/CASSANDRA-6325 Exists, but doesn't seem to be the same issue.
> https://issues.apache.org/jira/browse/CASSANDRA-5986 Similar, but doesn't seem to be an issue triggered by upgrade.
>
> If I were one of the posters on this thread, I would strongly consider filing a JIRA on point.
>
> @OP (olek): did removing the saved_caches also fix your problem?
>
> =Rob
Cassandra is holding too many deleted file descriptors
I see lots of these deleted file descriptors that Cassandra is holding; in my case, out of 90K file descriptors, 80.5K are these deleted descriptors. Because of this, Cassandra is not performing well. Can someone please tell me what I am doing wrong?

lr-x-- 1 root root 64 Nov 14 08:25 10875 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-119-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10876 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-110-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10877 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-133-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10878 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-124-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10879 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-110-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:11 1088 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-110-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10880 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-133-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10881 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-119-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10882 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-124-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10883 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-119-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10884 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-133-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10885 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-110-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10886 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-124-Data.db (deleted)
lr-x-- 1 root root 64 Nov 14 08:25 10887 -> /var/lib/cassandra/data/tests/sample_data/sample_data_points-jb-119-Data.db (deleted)
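To quantify how many deleted file descriptors a process is holding on Linux, a small script like the following can help. This is a sketch: it assumes a Linux /proc filesystem and permission to read the target process's fd directory.

```python
import os

def count_deleted_fds(pid="self"):
    """Count open file descriptors whose target file has been unlinked.

    On Linux, /proc/<pid>/fd/<n> is a symlink whose target ends in
    " (deleted)" when the underlying file has been removed.
    """
    fd_dir = "/proc/%s/fd" % pid
    deleted = 0
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed between listdir and readlink
        if target.endswith(" (deleted)"):
            deleted += 1
    return deleted

if __name__ == "__main__":
    # For another process, pass its numeric pid, e.g. count_deleted_fds(1893)
    print(count_deleted_fds())
```

Run against the Cassandra pid, this gives the same count as grepping `(deleted)` out of `ls -l /proc/<pid>/fd`.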
Re: OOM while reading key cache
A few months ago, we got a similar issue on 1.2.6: https://issues.apache.org/jira/browse/CASSANDRA-5706 But it has been fixed, and we have not encountered this issue anymore (we're also on 1.2.10).

2013/11/14 olek.stas...@gmail.com:
> Yes, as I wrote in my first e-mail: when I removed the key cache file, Cassandra started without further problems.
> [earlier quoted thread snipped]

--
Fabien Rousseau
aur...@yakaz.com
www.yakaz.com
Re: Cassandra is holding too many deleted file descriptors
Yeah, this is known, and we are looking for a fix: https://issues.apache.org/jira/browse/CASSANDRA-6275 If you have a simple way of reproducing it, please add a comment.

On Thu, Nov 14, 2013 at 10:53 AM, Murthy Chelankuri kmurt...@gmail.com wrote:
> I see lots of these deleted file descriptors that Cassandra is holding; in my case, out of 90K file descriptors, 80.5K are these deleted descriptors. Because of this, Cassandra is not performing well. Can someone please tell me what I am doing wrong?
> [deleted file descriptor listing snipped]
Re: Modeling multi-tenanted Cassandra schema
OK, so in the end I elected to go for option (c), which makes my table definition look like this:

create table tenanted_foo_table (
    tenant ascii,
    application_key bigint,
    timestamp timestamp,
    -- other non-key columns
    PRIMARY KEY ((tenant, application_key), timestamp)
)

such that on disk the row keys are effectively tenant:application_key concatenations.

Thanks for your input,
Ben

On Wed, Nov 13, 2013 at 2:43 PM, Nate McCall n...@thelastpickle.com wrote:
> Astyanax and/or the DS Java client, depending on your use case. (Emphasis on the "and" - really no reason you can't use both - even on the same schema - depending on what you are doing, as they both have their strengths and weaknesses.) To be clear, Hector is not going away. We are still accepting patches and updates, but there is no active feature development. Any other Hector-specific questions, please start a thread over on hector-us...@googlegroups.com

On Wed, Nov 13, 2013 at 8:35 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
> Nate, (slightly OT), what client API/library is recommended now that Hector is sunsetting? Thanks. Regards, Shahab

On Wed, Nov 13, 2013 at 9:28 AM, Nate McCall n...@thelastpickle.com wrote:
> You basically want option (c). Option (d) might work, but you would be bending the paradigm a bit, IMO. Certainly do not use dedicated column families or keyspaces per tenant. That never works. The list history will show that with a few google searches, and we've seen it fail badly with several clients. Overall, option (c) would be difficult to do in CQL without some very well thought out abstractions and/or a deep hack on the Java driver (not inelegant or impossible, just lots of moving parts to get your head around if you are new to such). That said, depending on the size of your project and the skill of your team, this direction might be worth considering.
Usergrid (just accepted for incubation at Apache) functions this way via the Thrift API: https://github.com/apigee/usergrid-stack The commercial version of Usergrid has tens of thousands of active tenants on a single cluster (same code base at the service layer as the open source version). It uses Hector's built-in virtual keyspaces: https://github.com/hector-client/hector/wiki/Virtual-Keyspaces (NOTE: though Hector is sunsetting/in patch maintenance, the approach is certainly legitimate - but I'd recommend you *not* start a new project on Hector). In short, Usergrid is the only project I know of that has a well-proven tenant model that functions at scale, though I'm sure there are others around, just not open sourced or actually running large deployments. Astyanax can do this as well, albeit with a little more work required: https://github.com/Netflix/astyanax/wiki/Composite-columns#how-to-use-the-prefixedserializer-but-you-really-should-use-composite-columns Happy to clarify any of the above.

On Tue, Nov 12, 2013 at 3:19 AM, Ben Hood 0x6e6...@gmail.com wrote:
> Hi, I've just received a requirement to make a Cassandra app multi-tenanted, where we'll have up to 100 tenants. Most of the tables are timestamped wide-row tables with a natural application key for the partitioning key and a timestamp as a clustering key. So I was considering the options:
> (a) Add a tenant column to each table and stick a secondary index on that column;
> (b) Add a tenant column to each table and maintain index tables that use the tenant id as a partitioning key;
> (c) Decompose the partitioning key of each table and add the tenant as the leading component of the key;
> (d) Add the tenant as a separate clustering key;
> (e) Replicate the schema in separate tenant-specific keyspaces;
> (f) Something I may have missed.
> Option (a) seems the easiest, but I'm wary of just adding secondary indexes without thinking about it.
> Option (b) seems to have the least impact on the layout of the storage, but at the cost of maintaining each index table, both code-wise and in terms of performance. Option (c) seems quite straightforward, but I feel it might have a significant effect on the distribution of the rows if the cardinality of the tenants is low. Option (d) seems simple enough, but it would mean that you couldn't query for a range of tenants without supplying a range of natural application keys, through which you would need to iterate (under the assumption that you don't use an ordered partitioner). Option (e) appears relatively straightforward, but it does mean that the application CQL client needs to maintain separate cluster connections for each tenant. Also, I'm not sure to what extent keyspaces were designed to partition identically structured data.
> Does anybody have any experience with running a multi-tenanted Cassandra app, or does this just depend too much on the specifics of the application?
> Cheers, Ben

--
Nate McCall
Austin, TX
@zznate
Co-Founder Sr. Technical
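To see why folding the tenant into a composite partition key (option c) preserves row distribution even with low tenant cardinality, here is an illustrative sketch. This is not Cassandra's actual Murmur3 partitioner; it uses MD5 as a stand-in hash purely to show how composite keys spread rows across distinct tokens, and the tenant/key values are made up.

```python
import hashlib

def token(*key_parts):
    """Stand-in for a partitioner hash over a (possibly composite) partition key."""
    raw = ":".join(str(p) for p in key_parts).encode()
    return int(hashlib.md5(raw).hexdigest(), 16)

tenants = ["acme", "globex"]   # low cardinality (hypothetical tenants)
app_keys = range(1000)         # high cardinality natural application keys

# Tenant-only partition key: every row for a tenant lands on a single token
tenant_only_tokens = {token(t) for t in tenants}

# Option (c): composite (tenant, application_key) partition key
composite_tokens = {token(t, k) for t in tenants for k in app_keys}

print(len(tenant_only_tokens))   # 2 distinct tokens -> two giant hot partitions
print(len(composite_tokens))     # 2000 distinct tokens -> rows spread over the ring
```

The tenant-only scheme concentrates all of a tenant's rows behind two tokens, while the composite key yields as many tokens as there are (tenant, application_key) pairs, which is what keeps the load spread across nodes.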
Hints still exist for a removed node
Hi, I saw on http://www.datastax.com/dev/blog/modern-hinted-handoff (written in December 2012) that hints targeting a removed node (our case) are automatically removed. However, a compaction has been done on our cf and hints for the removed node are still stored. We're using version 1.2.2 (February 2013). Do you mean by "automatically" that they will be removed after a period of time, but not after a compaction? I see a TTL of 10 days added to each row in the hints data file.

Another question is about "Finished hinted handoff of 0 rows to endpoint" info messages. The CASSANDRA-5068 patch included in our version is supposed to fix a bad behaviour which was the cause of similar messages. We don't have hints stored for the endpoints concerned by these messages, but they appear in our log files. I don't know if it's related, but I have a compaction of hints at the same time: http://pastebin.com/71nw2Uqh Can anyone explain to us what's happening, or whether it's expected behaviour?

thanks
--
Cyril SCETBON
Risk of not doing repair
Hello, I'm facing bug https://issues.apache.org/jira/browse/CASSANDRA-6277. After migration to 2.0.2 I can't perform repair on my cluster (six nodes). Repair on the biggest CF breaks with the error described in the JIRA. I know that there is probably a solution in the repository, but it's not included in any release. I estimate that 2.0.3 with this fix will be released in December. If it's not really necessary, I would avoid building an unstable version of Cassandra from sources and installing it in the production environment; I would rather use an rpm-based distribution to keep the system in a consistent state. So this is my question: what is the risk of not doing repair for a month, assuming that gc_grace is 10 days? Should I really worry? Or maybe I should use the repository version of Cassandra?

best regards
Olek
Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled
First off, I'm curious what hardware (system specs) you're running this on. Secondly, here are some observations:

* You're not running the newest JDK7 - I can tell by your stack size. Consider getting the newest.
* Cassandra 2.0.2 has a lot of improvements; consider upgrading. We noticed improved heap usage compared to 2.0.0.
* Have you simply tried decreasing the size of your row cache? Tried 256MB?
* Do you have JNA installed? Otherwise you're not getting off-heap usage for these caches, which seems likely here. Check your cassandra.log to verify JNA operation.
* Your NewGen is too small. See your heap peaks? This is because short-lived memory is being put into OldGen, which only gets cleaned up during full GC. You should set your NewGen to about 25-30% of your total heap size. Many objects are short-lived, and CMS GC is significantly more efficient if the shorter-lived objects never get promoted to OldGen; you'll get more concurrent, non-blocking GC. If you're not using JNA (per above), the row cache and key cache are still on-heap, so you want your NewGen to be at least twice as large as the size of these combined caches. With JNA you should never see those crazy heap spikes; without it, your caches are essentially overflowing into OldGen.

On Tue, Nov 5, 2013 at 3:04 AM, Jiri Horky ho...@avast.com wrote:
> Hi there, we are seeing extensive memory allocation leading to quite long and frequent GC pauses when using the row cache.
> This is on a Cassandra 2.0.0 cluster with the JNA 4.0 library and the following settings:
>
> key_cache_size_in_mb: 300
> key_cache_save_period: 14400
> row_cache_size_in_mb: 1024
> row_cache_save_period: 14400
> commitlog_sync: periodic
> commitlog_sync_period_in_ms: 1
> commitlog_segment_size_in_mb: 32
>
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G -Xmn1024M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data2/cassandra-work/instance-1/cassandra-1383566283-pid1893.hprof -Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark
>
> We have disabled the row cache on one node to see the difference. Please see the attached plots from VisualVM; I think the effect is quite visible. I have also taken jmap -histo 10 times, 5s apart, on an affected server and plotted the result, attached as well. I took a dump of the application when the heap size was 10GB; most of the memory was unreachable, which was expected. The majority was used by 55-59M objects of the HeapByteBuffer, byte[] and org.apache.cassandra.db.Column classes. I also include a list of inbound references to the HeapByteBuffer objects, from which it should be visible where they are being allocated. This was acquired using Eclipse MAT.
>
> Here is a comparison of GC times with the row cache enabled and disabled:
>
> prg01 - row cache enabled - uptime 20h45m - ConcurrentMarkSweep - 11494686 ms - ParNew - 14690885 ms - time spent in GC: 35%
> prg02 - row cache disabled - uptime 23h45m - ConcurrentMarkSweep - 251 ms - ParNew - 230791 ms - time spent in GC: 0.27%
>
> I would be grateful for any hints. Please let me know if you need any further information. For now, we are going to disable the row cache.
>
> Regards
> Jiri Horky
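The NewGen sizing advice above can be sanity-checked with a little arithmetic. A sketch, where the 25-30% figure is the poster's rule of thumb rather than an official recommendation:

```python
def suggested_newgen_mb(heap_mb, fraction=0.25):
    """Suggest an -Xmn value as a fraction of -Xmx, per the rule of thumb above."""
    return int(heap_mb * fraction)

heap_mb = 10 * 1024  # -Xmx10G from the JVM settings quoted above
print(suggested_newgen_mb(heap_mb))  # 2560 MB, vs the configured -Xmn1024M
```

So with a 10G heap, the rule of thumb suggests an -Xmn of roughly 2.5-3G, more than double the 1024M in the quoted settings.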
db file missing error
Hi all, when I run nodetool repair, I'm getting an error that indicates that several of the Data.db files are missing. Is there a way to correct this error? The files that the error message is referencing are indeed missing; I'm not sure why it is looking for them to begin with. AFAIK nothing has been deleted, but there are several apps that run against Cassandra.

Caused by: java.io.FileNotFoundException: /raid0/cassandra/data/OTester/OTester_one/OTester-OTester_one-ic-46-Data.db (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
        at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:75)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:42)
        ... 20 more

Thanks,
Jim
Re: db file missing error
Found it - I had a second repair running, which was generating the error.

Jim

From: Langston, Jim jim.langs...@compuware.com
Reply-To: user@cassandra.apache.org
Date: Thu, 14 Nov 2013 18:34:19 +0000
To: user@cassandra.apache.org
Subject: db file missing error

> Hi all, when I run nodetool repair, I'm getting an error that indicates that several of the Data.db files are missing. Is there a way to correct this error?
> [rest of original message and stack trace snipped]
Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled
On Thu, Nov 14, 2013 at 10:05 AM, J. Ryan Earl o...@jryanearl.us wrote:
> Cassandra 2.0.2 has a lot of improvements; consider upgrading.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

And especially if you're using Level Compaction / LCS: https://issues.apache.org/jira/browse/CASSANDRA-6284 ("Wrong tracking of minLevel in Leveled Compaction Strategy causing serious performance problems")

tl;dr - don't upgrade to 2.0.2 in production.

=Rob
Re: Risk of not doing repair
On Thu, Nov 14, 2013 at 6:25 AM, olek.stas...@gmail.com wrote:
> After migration to 2.0.2 I can't perform repair on my cluster (six nodes). ... If it's not really necessary, I would avoid building an unstable version of Cassandra from sources and installing it in the production environment.

You've already installed an unstable version of Cassandra in prod; moving up to an unreleased version is unlikely to make things that much less stable.

> So this is my question: what is the risk of not doing repair for a month, assuming that gc_grace is 10 days? Should I really worry? Maybe I should use the repository version of Cassandra?

Do you do DELETE or CQL3 delete-like operations? If so, you have a risk of exposure to zombie data. You should probably increase your gc_grace_seconds to 34 days anyway, so why not use this experience as an opportunity to do so? https://issues.apache.org/jira/browse/CASSANDRA-5850

=Rob
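For reference, the 34-day suggestion translates into a concrete gc_grace_seconds value with simple arithmetic. A quick sketch (864000, i.e. 10 days, is Cassandra's default):

```python
def days_to_seconds(days):
    """Convert a day count to the seconds value used by gc_grace_seconds."""
    return days * 24 * 60 * 60

print(days_to_seconds(10))  # 864000  - the default gc_grace_seconds
print(days_to_seconds(34))  # 2937600 - the suggested value
```

The larger window means tombstones survive long enough for a monthly repair cycle to propagate them before they become eligible for collection.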
Re: Hints still exist for a removed node
On Thu, Nov 14, 2013 at 6:08 AM, Cyril Scetbon cyril.scet...@free.fr wrote:
> I saw on http://www.datastax.com/dev/blog/modern-hinted-handoff (written in December 2012) that hints targeting a removed node (our case) are automatically removed. However, a compaction has been done on our cf and hints for the removed node are still stored. Do you mean by "automatically" that they will be removed after a period of time but not after a compaction? I see a TTL of 10 days added to each row in the hints data file.

gc_grace_seconds

> We're using version 1.2.2 (February 2013).

1.2.2 contains serious bugs; upgrade ASAP.

> Finished hinted handoff of 0 rows to endpoint

Doesn't have any meaningful impact, and is probably fixed upstream.

=Rob
Read inconsistency after backup and restore to different cluster
Hi All, after running through our backup and restore process FROM our test production TO our staging environment, we are seeing inconsistent reads from the cluster we restored to. We have the same number of nodes in both clusters. For example, we will select data from a column family on the newly restored cluster, but sometimes the expected data is returned and other times it is not. These selects are carried out one after another with very little delay. It is almost as if the data only exists on some of the nodes, or perhaps the token ranges are dramatically different --again, we are using vnodes, so I am not exactly sure how this plays into the equation. We are running Cassandra 2.0.2 with vnodes and deploying via chef. The backup and restore process is currently orchestrated using bash scripts and chef's distributed SSH. I have outlined the process below for review.

(I) Backup cluster-A (with existing prod data):
1. Run "nodetool flush" on each of the nodes in a 5 node ring.
2. Run "nodetool snapshot keyspace_name" on each of the nodes in a 5 node ring.
3. Archive the snapshot data from the snapshots directory on each node, creating a single archive of the snapshot.
4. Copy the snapshot data archive for each of the nodes to S3.

(II) Restore backup FROM cluster-A TO cluster-B:
*NOTE: cluster-B is a freshly deployed ring with no data, but a different cluster name, used for staging.
1. Deploy 5 nodes as part of the cluster-B ring.
2. Create the keyspace_name keyspace and column families on cluster-B.
3. Stop Cassandra on all 5 nodes in the cluster-B ring.
4. Clear commit logs on cluster-B with: rm -f /var/lib/cassandra/commitlog/*
5. Copy 1 of the 5 snapshot archives from cluster-A to each of the five nodes in the new cluster-B ring.
6. Extract the archives to /var/lib/cassandra/data/keyspace_name, ensuring that the column family directories and associated .db files are in place under /var/lib/cassandra/data/keyspace_name/columnfamily1/ etc.
7. Start Cassandra on each of the nodes in cluster-B.
8. Run "nodetool repair" on each of the nodes in cluster-B.

Please let me know if you see any major errors or deviations from best practices which could be contributing to our read inconsistencies. I'll be happy to answer any specific questions you may have regarding our configuration. Thank you in advance!

Best regards,
-David Laube
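The per-node backup sequence in step (I) can be sketched as a command plan like the one below. This is an illustration only: the archive path and S3 bucket name are hypothetical placeholders, and the real process runs these over chef's distributed SSH.

```python
def backup_commands(keyspace, node, bucket="my-backup-bucket"):
    """Build the per-node backup command sequence from step (I).

    The /tmp archive path and the S3 bucket are hypothetical placeholders.
    """
    archive = "/tmp/%s-%s-snapshot.tar.gz" % (node, keyspace)
    return [
        "nodetool flush",                                  # step 1
        "nodetool snapshot %s" % keyspace,                 # step 2
        "tar -czf %s /var/lib/cassandra/data/%s/*/snapshots"
            % (archive, keyspace),                         # step 3
        "aws s3 cp %s s3://%s/" % (archive, bucket),       # step 4
    ]

for cmd in backup_commands("keyspace_name", "node1"):
    print(cmd)
```

Keeping the node name in the archive filename matters for the restore side, since each cluster-B node must receive the archive from its corresponding cluster-A node.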
Re: Read inconsistency after backup and restore to different cluster
On Thu, Nov 14, 2013 at 12:37 PM, David Laube d...@stormpath.com wrote:
> It is almost as if the data only exists on some of the nodes, or perhaps the token ranges are dramatically different --again, we are using vnodes so I am not exactly sure how this plays into the equation.

The token ranges are dramatically different, due to vnode random token selection from not setting initial_token and setting num_tokens. You can verify this by listing the tokens per physical node in nodetool gossipinfo or (iirc) nodetool status.

> 5. Copy 1 of the 5 snapshot archives from cluster-A to each of the five nodes in the new cluster-B ring.

I don't understand this at all. Do you mean that you are using one source node's data to load each of the target nodes? Or are you just saying there's a 1:1 relationship between source snapshots and target nodes to load into? Unless you have RF=N, using one source for 5 target nodes won't work.

To do what I think you're attempting to do, you have basically two options.

1) Don't use vnodes and do a 1:1 copy of snapshots.

2) Use vnodes and:
a) get a list of tokens per node from the source cluster
b) put a comma-delimited list of these in initial_token in cassandra.yaml on the target nodes
c) probably have to un-set num_tokens (this part is unclear to me, you will have to test)
d) set auto_bootstrap: false in cassandra.yaml
e) start the target nodes; they will not-bootstrap into the same ranges as the source cluster
f) load the schema / copy data into the data dir (being careful of https://issues.apache.org/jira/browse/CASSANDRA-6245)
g) restart the node or use nodetool refresh (I'd probably restart the node to avoid the bulk rename that refresh does) to pick up the sstables
h) remove auto_bootstrap: false from cassandra.yaml

I *believe* this *should* work, but have never tried it, as I do not currently run with vnodes. It should work because it basically makes the implicit vnode tokens explicit in the conf file.
If it *does* work, I'd greatly appreciate you sharing the details of your experience with the list.

General reference on tasks of this nature (it does not consider vnodes, but treat vnodes as just a lot of physical nodes and it is mostly relevant): http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

=Rob
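Steps (a)/(b) of the vnode option - turning a source node's token list into a comma-delimited initial_token line - can be sketched like this. It assumes you have already extracted the raw token list (e.g. from nodetool gossipinfo); parsing nodetool's exact output format is left out since it varies by version, and the sample tokens are hypothetical.

```python
def initial_token_line(tokens):
    """Render a list of vnode tokens as a cassandra.yaml initial_token entry."""
    return "initial_token: " + ",".join(str(t) for t in tokens)

# Hypothetical tokens extracted from one source node:
tokens = [-9182346572845, -123456789, 42, 771234567890]
print(initial_token_line(tokens))
# initial_token: -9182346572845,-123456789,42,771234567890
```

Each target node gets the line built from its corresponding source node's tokens, so the target ring reproduces the source ring's ownership exactly.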
Re: Read inconsistency after backup and restore to different cluster
Thank you for the detailed reply, Rob! I have replied to your comments in-line below.

On Nov 14, 2013, at 1:15 PM, Robert Coli rc...@eventbrite.com wrote:
> The token ranges are dramatically different, due to vnode random token selection from not setting initial_token and setting num_tokens. You can verify this by listing the tokens per physical node in nodetool gossipinfo or (iirc) nodetool status.
>
> I don't understand this at all. Do you mean that you are using one source node's data to load each of the target nodes? Or are you just saying there's a 1:1 relationship between source snapshots and target nodes to load into? Unless you have RF=N, using one source for 5 target nodes won't work.

We have configured RF=3 for the keyspace in question. Also, from a client perspective, we read with CL=1 and write with CL=QUORUM. Since we have 5 nodes total in cluster-A, we snapshot keyspace_name on each of the five nodes, which results in a snapshot directory on each of the five nodes that we archive and ship off to S3. We then take the snapshot archive generated FROM cluster-A_node1 and copy/extract/restore TO cluster-B_node1, then we take the snapshot archive FROM cluster-A_node2 and copy/extract/restore TO cluster-B_node2, and so on and so forth.

> To do what I think you're attempting to do, you have basically two options. [options snipped]
> I *believe* this *should* work, but have never tried it, as I do not currently run with vnodes. It should work because it basically makes the implicit vnode tokens explicit in the conf file. If it *does* work, I'd greatly appreciate you sharing the details of your experience with the list.

I'll start with parsing out the token ranges that our vnode config ends up assigning in cluster-A, and doing some creative config work on the target cluster-B we are trying to restore to, as you have suggested. Depending on what additional comments/recommendations you or other members of the list may have (if any) based on the clarification I've made above, I will absolutely report back my findings here.

> General reference on tasks of this nature (does not consider vnodes, but treat vnodes as just a lot of physical nodes and it is mostly relevant): http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra
> =Rob
making sense of output from Eclipse Memory Analyzer tool taken from .hprof file
I am investigating Java out-of-memory heap errors. So I created an .hprof file and loaded it into the Eclipse Memory Analyzer Tool, which gave some Problem Suspects. The first one looks like:

"One instance of org.apache.cassandra.db.ColumnFamilyStore loaded by sun.misc.Launcher$AppClassLoader @ 0x613e1bdc8 occupies 984,094,664 (11.64%) bytes. The memory is accumulated in one instance of org.apache.cassandra.db.DataTracker$View loaded by sun.misc.Launcher$AppClassLoader @ 0x613e1bdc8."

If I click around into the verbiage, I believe I can pick out the name of a column family, but that is about it. Can someone explain what the above means in more detail, and whether it is indicative of a problem?

The next one looks like:

* java.lang.Thread @ 0x73e1f74c8 CompactionExecutor:158 - 839,225,000 (9.92%) bytes
* java.lang.Thread @ 0x717f08178 MutationStage:31 - 809,909,192 (9.58%) bytes
* java.lang.Thread @ 0x717f082c8 MutationStage:5 - 649,667,472 (7.68%) bytes
* java.lang.Thread @ 0x717f083a8 MutationStage:21 - 498,081,544 (5.89%) bytes
* java.lang.Thread @ 0x71b357e70 MutationStage:11 - 444,931,288 (5.26%) bytes

If I click into the verbiage, the above compaction and mutation threads all seem to be referencing the same column family. Are the above related? Is there a way I can tell more exactly what is being compacted and/or mutated, more specifically than which column family?
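To see whether the MutationStage threads collectively dominate the retained heap, one could total MAT's suspect lines by thread-pool name. A sketch: the sample lines are copied from the suspects above, but the parsing regex is my own assumption about the line format.

```python
import re
from collections import defaultdict

# Matches MAT suspect lines of the form seen above, e.g.:
#   java.lang.Thread @ 0x717f08178 MutationStage:31 - 809,909,192 (9.58%) bytes
SUSPECT = re.compile(r"@ 0x[0-9a-f]+ (\w+):\d+ - ([\d,]+)")

def retained_by_pool(lines):
    """Total retained bytes per thread-pool name from MAT problem-suspect lines."""
    totals = defaultdict(int)
    for line in lines:
        m = SUSPECT.search(line)
        if m:
            pool, nbytes = m.group(1), int(m.group(2).replace(",", ""))
            totals[pool] += nbytes
    return dict(totals)

suspects = [
    "java.lang.Thread @ 0x73e1f74c8 CompactionExecutor:158 - 839,225,000 (9.92%) bytes",
    "java.lang.Thread @ 0x717f08178 MutationStage:31 - 809,909,192 (9.58%) bytes",
    "java.lang.Thread @ 0x717f082c8 MutationStage:5 - 649,667,472 (7.68%) bytes",
    "java.lang.Thread @ 0x717f083a8 MutationStage:21 - 498,081,544 (5.89%) bytes",
    "java.lang.Thread @ 0x71b357e70 MutationStage:11 - 444,931,288 (5.26%) bytes",
]
print(retained_by_pool(suspects))
```

Summed this way, the four MutationStage threads retain roughly 2.4 GB between them, which is a stronger signal that in-flight writes (rather than compaction) are the main heap pressure.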