Fwd: Cassandra 1.0 hangs during GC
Sun Java 6 didn't help at all. Sar shows no special activity during the long GC times. And I have remembered that I do a lot of get_range_slices requests, each time on a range of 100, in case that could be important. Still, my GC times are growing incredibly:

INFO [ScheduledTasks:1] 2012-07-25 12:17:04,029 GCInspector.java (line 122) GC for ParNew: 497253 ms for 1 collections, 114036128 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-25 13:26:21,282 GCInspector.java (line 122) GC for ParNew: 149508 ms for 1 collections, 87015440 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-25 18:39:44,654 GCInspector.java (line 122) GC for ParNew: 83584 ms for 1 collections, 54368032 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-25 22:45:42,448 GCInspector.java (line 122) GC for ParNew: 48454 ms for 1 collections, 44072488 used; max is 8464105472

24.07.2012, 13:07, Joost van de Wijgerd jwijg...@gmail.com:
You are better off using Sun Java 6 to run Cassandra. In the past there were issues reported on 7. Can you try running it on Sun Java 6?
kind regards
Joost

On Tue, Jul 24, 2012 at 10:04 AM, Nikolay Kоvshov nkovs...@yandex.ru wrote:
48 GB of RAM on that machine; swap is not used. I will disable swap altogether just in case. I have 4 Cassandra processes (parts of 4 different clusters), each allocated 8 GB and using about 4 GB of it.

java -version
java version 1.7.0
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)

23.07.2012, 20:12, Joost van de Wijgerd jwijg...@gmail.com:
How much memory do you have on the machine? It seems like you have 8 GB reserved for the Cassandra java process; if this is all the memory on the machine, you might be swapping. Also, which JVM do you use?
kind regards
Joost

On Mon, Jul 23, 2012 at 10:07 AM, Nikolay Kоvshov nkovs...@yandex.ru wrote:
On the 21st I migrated to Cassandra 1.1.2 but see no improvement.

cat /var/log/cassandra/Earth1.log | grep "GC for"
INFO [ScheduledTasks:1] 2012-05-22 17:42:48,445 GCInspector.java (line 123) GC for ParNew: 345 ms for 1 collections, 82451888 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-05-23 02:47:13,911 GCInspector.java (line 123) GC for ParNew: 312 ms for 1 collections, 110617416 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-05-23 11:57:54,317 GCInspector.java (line 123) GC for ParNew: 298 ms for 1 collections, 98161920 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-02 08:52:37,019 GCInspector.java (line 123) GC for ParNew: 196886 ms for 1 collections, 2310058496 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-16 17:41:25,940 GCInspector.java (line 123) GC for ParNew: 200146 ms for 1 collections, 2345987088 used; max is 8464105472
=== Migrated from 1.0.0 to 1.1.2
INFO [ScheduledTasks:1] 2012-07-21 09:05:08,280 GCInspector.java (line 122) GC for ParNew: 282 ms for 1 collections, 466406864 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-21 12:38:43,132 GCInspector.java (line 122) GC for ParNew: 233 ms for 1 collections, 405269504 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-22 02:29:09,596 GCInspector.java (line 122) GC for ParNew: 253 ms for 1 collections, 389700768 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-22 17:45:46,357 GCInspector.java (line 122) GC for ParNew: 57391 ms for 1 collections, 400083984 used; max is 8464105472

Memory and the memory-related yaml settings are default. I do not do deletes. I have 2 CFs and no secondary indexes. LiveRatios:

INFO [pool-1-thread-1] 2012-06-09 02:36:07,759 Memtable.java (line 177) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was 1.0). calculation took 85ms for 6236 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:47,614 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 (just-counted was 1.0). calculation took 8ms for 1 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:51,012 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was 1.0). calculation took 99ms for 1094 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:51,331 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 (just-counted was 1.0). calculation took 80ms for 242 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:51,856 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was 1.0). calculation took 505ms for 2678 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:52,881 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was 1.0). calculation took 776ms for 5236 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:52,945 Memtable.java (line 213)
Re: Creating counter columns in cassandra
Hi,
To create:

ColumnFamilyDefinition counters = createBasicCfDef(KEYSPACE, Consts.COUNTERS,
        ComparatorType.UTF8TYPE, null, CounterColumnType,
        CompositeType(UTF8Type,UUIDType));
counters.setReplicateOnWrite(true);
cluster.addColumnFamily(counters, true);

To increment (add to) a counter:

public static void incrementCounter(Composite key, String columnName, long inc) {
    Mutator<Composite> mutator = HFactory.createMutator(keyspace, CompositeSerializer.get());
    mutator.incrementCounter(key, Consts.COUNTERS, columnName, inc);
    mutator.execute();
}

Regards,
Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956

On Wed, Jul 25, 2012 at 8:24 PM, Amila Paranawithana amila1...@gmail.com wrote:
Hi all, I want to create counter columns in a column family via a Java module. These column families and counter columns need to be created dynamically. Please send me some example code to refer to (with Hector or any other method). Thanks
-- Amila Iroshani Paranawithana CSE-University of Moratuwa. B-http://amilaparanawithana.blogspot.com T-https://twitter.com/#!/AmilaPara
Re: Creating counter columns in cassandra
You can check the Astyanax API: https://github.com/Netflix/astyanax/blob/5c05d118e22eef541a7a201adf7c1c610da13f5b/src/test/java/com/netflix/astyanax/thrift/ThrifeKeyspaceImplTest.java There are some counter column examples there which will surely help you.
How schema disagreement can be fixed faster on 1.0.10 cluster ?
Hi!
We got into a schema disagreement situation on 1.0.10, with 250 GB of compressed data per node. Following http://wiki.apache.org/cassandra/FAQ#schema_disagreement, after the node restart it looks like it is replaying all schema changes one by one, right? As we made a lot of them during the cluster's lifetime, the node is now busy re-creating secondary indexes that were dropped long ago, which looks like it is going to take hours. Can it be done faster?
1. Can we move all data SSTables out of the data/*/ directories,
2. follow FAQ#schema_disagreement (it should be faster on a node with no data) until we reach schema agreement,
3. then stop Cassandra,
4. copy the files back,
5. and start Cassandra.
Will it work? An extra option is to disable thrift during the above process (can it be done in config? In cassandra.yaml, rpc_port: 0?).
Thanks in advance for any hints, regards,
-- Mateusz Korniak
Questions regarding DataStax AMI
Hi! Is there a way to launch an EC2 cluster from the latest DataStax community AMI that will run Cassandra 1.0.8 and not 1.1.2? Thanks
Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956
Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0
I saw this. All works fine up to version 1.1.0: the 0.8.x takes 5 GB of memory on an 8 GB machine, the 1.0.x takes between 6 and 7 GB on an 8 GB machine, and the 1.1.0 takes all of it. And it is a problem for me; it is no solution to wait for the OOM killer from the Linux kernel and restart the Cassandra process. When my machine has less than 100 MB of RAM available, then I have a problem.

On 07/25/2012 07:06 PM, Tyler Hobbs wrote:
Are you actually seeing any problems from this? High virtual memory usage on its own really doesn't mean anything. See http://wiki.apache.org/cassandra/FAQ#mmap

On Wed, Jul 25, 2012 at 1:21 AM, Thomas Spengler thomas.speng...@toptarif.de wrote:
No one has any idea? We tried updating to 1.1.2 with DiskAccessMode standard, indexAccessMode standard, row_cache_size_in_mb: 0, key_cache_size_in_mb: 0. Our next try will be to change SerializingCacheProvider to ConcurrentLinkedHashCacheProvider. Any other proposals are welcome.

On 07/04/2012 02:13 PM, Thomas Spengler wrote:
Hi @all, since our upgrade from cassandra 1.0.3 to 1.1.0 the virtual memory usage of the cassandra nodes explodes. Our setup is:
* 5 centos 5.8 nodes
* each with 4 CPUs and 8 GB RAM
* each node holds about 100 GB of data
* each JVM uses 2 GB of RAM
* DiskAccessMode is standard, indexAccessMode is standard
The memory usage grows until the whole memory is used. Just for information: when we had cassandra 1.0.3, we used DiskAccessMode standard with indexAccessMode mmap, and the RAM usage was ~4 GB. Can anyone help?

With Regards
--
Thomas Spengler
Chief Technology Officer
TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
thomas.speng...@toptarif.de | www.toptarif.de
Amtsgericht Charlottenburg, HRB 113287 B
Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
Re: Creating counter columns in cassandra
Check out Kundera for counter column support. Here is the link to the counter column tests: https://github.com/impetus-opensource/Kundera/tree/kundera-2.0.7/kundera-cassandra/src/test/java/com/impetus/client/crud/countercolumns
-Vivek

On Thu, Jul 26, 2012 at 12:27 PM, Abhijit Chanda abhijit.chan...@gmail.com wrote:
You can check the Astyanax API: https://github.com/Netflix/astyanax/blob/5c05d118e22eef541a7a201adf7c1c610da13f5b/src/test/java/com/netflix/astyanax/thrift/ThrifeKeyspaceImplTest.java There are some counter column examples there which will surely help you.
restoring a counter
According to this post [1], one's supposed to start C* with -Dcassandra.renew_counter_id=true as one of the steps of restoring a counter column family. I have two questions related to this:
a) how does that setting affect C* in a non-restoring start?
b) if it's bad (for some value of that), should I stop C* + remove the setting + start C* after the value has been repaired?
c) bonus question: wouldn't it be nice to change the init.d script to be able to add this kind of one-time setting?
--
[1] http://www.datastax.com/dev/blog/whats-new-in-cassandra-0-8-part-2-counters
--
Marcos Dione
SysAdmin - Astek Sud-Est
pour FT/TGPF/OPF/PORTAIL/DOP/HEBEX @ Marco Polo
04 97 12 62 45 - mdione@orange.com
_
Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, France Telecom - Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.
This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, France Telecom - Orange is not liable for messages that have been modified, changed or falsified. Thank you.
RE: restoring a counter
De : mdione@orange.com [mailto:mdione@orange.com]
restoring a counter column family. I have two questions related to this:
a) how does that setting affect C* in a non-restoring start?
b) if it's bad (for some value of that), should I stop C* + remove the setting + start C* after the value has been repaired?
c) bonus question: wouldn't it be nice to change the init.d script to be able to add this kind of one-time settings?

And a fourth one:
d) how would you restore from a full cluster crash? Assuming that I have snapshots done at ~the same time.
is upgradesstables required (or recommended) upon update column family ?
Hello! Is upgradesstables required after updating a column family with compression_options (or compaction_strategy)? Cheers, Ilya Shipitsin
Re: Questions regarding DataStax AMI
Yes, you can easily do this by using the --release version switch, as found here: http://www.datastax.com/docs/1.0/install/install_ami
Thanks,
Joaquin Casares
DataStax Software Engineer/Support

On Thu, Jul 26, 2012 at 12:44 AM, Tamar Fraenkel ta...@tok-media.com wrote:
Hi! Is there a way to launch an EC2 cluster from the latest DataStax community AMI that will run Cassandra 1.0.8 and not 1.1.2? Thanks
Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736 | Mob: +972 54 8356490 | Fax: +972 2 5612956
Re: Adding new node to clusters with PropertyFileSnitch
On Thu, Jul 26, 2012 at 7:35 AM, Michael Cherkasov michael.cherka...@gmail.com wrote:
Hi all, if my clusters have PropertyFileSnitch and I'm going to add a new node, does that mean that the new node must be added to ALL cassandra-topology.properties files?

Yes, unless that node's dc and rack happen to match the default dc and rack.

...and all nodes (in all clusters) must be restarted?

No, you don't have to restart any nodes. They re-read cassandra-topology.properties periodically (every five minutes, I think).
-- Tyler Hobbs DataStax http://datastax.com/
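For reference, the file maps each node's IP to datacenter:rack, one entry per line, with a default entry covering any node not explicitly listed (the addresses and names below are made up for illustration):

```properties
# cassandra-topology.properties -- the same file is deployed on every node
# <node IP>=<datacenter>:<rack>
10.0.0.10=DC1:RAC1
10.0.0.11=DC1:RAC2
10.0.1.10=DC2:RAC1

# applied to any node not listed above; a new node whose dc/rack
# matches this default needs no edit to the file at all
default=DC1:RAC1
```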
Re: Cassandra 1.0 hangs during GC
On Thu, Jul 26, 2012 at 1:25 AM, Nikolay Kоvshov nkovs...@yandex.ru wrote:
And I have remembered that I do a lot of get_range_slices requests, each time on a range of 100, if this could be important.

If you have large rows, try lowering the number of rows you fetch at once from 100 to 25. Pulling in a large amount of short-lived data could explain the long ParNews.
-- Tyler Hobbs DataStax http://datastax.com/
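For anyone implementing the suggestion: fetching a range in smaller batches is plain client-side paging. You re-issue the range query starting from the last key seen and drop the repeated boundary row. A minimal sketch in Python, where `fetch_range` is a hypothetical stand-in for the real get_range_slices call:

```python
def fetch_range(store, start_key, count):
    """Hypothetical stand-in for get_range_slices: return up to `count`
    (key, value) pairs with key >= start_key, in key order."""
    keys = sorted(k for k in store if k >= start_key)[:count]
    return [(k, store[k]) for k in keys]

def page_all_rows(store, batch_size=25):
    """Page through the whole range batch_size rows at a time.
    Each new request starts at the last key already seen (which is
    therefore returned again and must be skipped), mirroring how
    range paging works against the Thrift API."""
    results = []
    start = ""
    while True:
        batch = fetch_range(store, start, batch_size)
        if start:  # drop the repeated boundary row
            batch = batch[1:]
        if not batch:
            break
        results.extend(batch)
        start = batch[-1][0]
    return results

store = {"row%03d" % i: i for i in range(103)}
rows = page_all_rows(store, batch_size=25)
```

The smaller the batch, the less short-lived garbage each response creates on the server, at the cost of more round trips.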
Re: How schema disagreement can be fixed faster on 1.0.10 cluster ?
I know you specified 1.0.10, but C* 1.1 solves this problem: http://www.datastax.com/dev/blog/the-schema-management-renaissance

On Thu, Jul 26, 2012 at 7:29 AM, Mateusz Korniak mateusz-li...@ant.gliwice.pl wrote:
Hi! We got into a schema disagreement situation on 1.0.10, with 250 GB of compressed data per node. Following http://wiki.apache.org/cassandra/FAQ#schema_disagreement, after the node restart it looks like it is replaying all schema changes one by one, right? As we made a lot of them during the cluster's lifetime, the node is now busy re-creating secondary indexes that were dropped long ago, which looks like it is going to take hours. Can it be done faster?
1. Can we move all data SSTables out of the data/*/ directories,
2. follow FAQ#schema_disagreement (it should be faster on a node with no data) until we reach schema agreement,
3. then stop Cassandra,
4. copy the files back,
5. and start Cassandra.
Will it work? An extra option is to disable thrift during the above process (can it be done in config? In cassandra.yaml, rpc_port: 0?).
Thanks in advance for any hints, regards,
-- Mateusz Korniak

-- Tyler Hobbs DataStax http://datastax.com/
Re: restoring a counter
mdione@orange.com writes:
restoring a counter column family. I have two questions related to this: a) how does that setting affect C* in a non-restoring start?

renew_counter_id regenerates a new NodeId for the Cassandra VM, which is used to keep track of the counter shards the node holds. If you regenerate node ids on each restart, you will most likely corrupt your counter data.

b) if it's bad (for some value of that), should I stop C* + remove the setting + start C* after the value has been repaired?

This is not necessary, provided you run repair once Cassandra has restarted.

c) bonus question: wouldn't it be nice to change the init.d script to be able to add this kind of one-time settings?

This is possible through environment variables:
env JVM_OPTS=-Dcassandra.renew_counter_id=true service cassandra start
(at least with the init script which comes with the Cassandra Debian package).

And a fourth one: d) how would you restore from a full cluster crash? Assuming that I have snapshots done at ~the same time.

If you want point-in-time restore, reprovisioning snapshots should suffice. The node id regeneration is supposed to be used in scenarios where data corruption occurred.
How to manually build and maintain secondary indexes
Hello,
My company is working on transitioning our relational data model to Cassandra. Naturally, one of the basic demands is to have secondary indexes to answer queries quickly according to the application's needs. After looking at Cassandra's native support for secondary indexes, we decided not to use them due to their poor performance for high-cardinality values. Instead, we decided to implement secondary indexes manually. Some searching led us to http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html which details a schema for such indexes. However, the method employed there specifically adds an index-entries column family, whereas it seems like only 2 CFs are needed - one for the items and one for the indexes (assuming one has access to both old and new values when updating an item). The article actually mentions that this is indeed not the obvious solution: "for a number of reasons related to Cassandra's model of eventual consistency", the obvious approach "will not reliably work" and "it's a really good idea to make sure you understand why this CF is necessary". However, no additional information is provided on what might be a critical issue, as dealing with corrupt indexes in a large production environment is sure to be a nightmare. What are the community's thoughts on this matter? Given the writer's credentials in the Cassandra realm, specifically regarding indexes, I'm inclined not to ignore his remarks. References to a document / system that implements similar indexes would be greatly appreciated as well.
- alon
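One common mitigation with a two-CF scheme is to treat index entries as hints only and validate each hit against the item row at read time, so a stale entry left behind by a failed or racing update is filtered out rather than served. A toy in-memory sketch of that idea (names and structure are illustrative, not taken from the article):

```python
# Minimal in-memory model of a manual secondary index kept in a second
# "column family". Index entries are treated as hints: reads verify each
# hit against the item table and silently drop stale entries, which is
# one way to tolerate the update races the article warns about.
items = {}        # item_id -> {"city": ...}         (the items CF)
city_index = {}   # city -> set of item_ids          (the index CF)

def update_item(item_id, new_city):
    # remove the old index entry (requires knowing the old value),
    # then write the item and the new index entry
    old = items.get(item_id, {}).get("city")
    if old is not None:
        city_index.setdefault(old, set()).discard(item_id)
    items[item_id] = {"city": new_city}
    city_index.setdefault(new_city, set()).add(item_id)

def query_by_city(city):
    hits = city_index.get(city, set())
    # verify every index hit against the item row; a concurrent or
    # partially-failed update may have left a stale entry behind
    return [i for i in hits if items.get(i, {}).get("city") == city]
```

The price of skipping the third CF is that every index read must pay for this verification step, and truly orphaned entries are never cleaned up unless you also repair the index on read.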
Re: How to manually build and maintain secondary indexes
Alon,
We came to the same conclusion regarding secondary indexes, and instead of using them we implemented our own wide-row indexing capability and open-sourced it. It's available here: https://github.com/hmsonline/cassandra-indexing We still have challenges rebuilding indexes, etc. It doesn't address all of your concerns, but I tried to capture the motivation behind our implementation here: http://brianoneill.blogspot.com/2012/03/cassandra-indexing-good-bad-and-ugly.html
-brian
--
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024 www.healthmarketscience.com

On 7/26/12 2:05 PM, Alon Pilberg alo...@taboola.com wrote:
Hello, My company is working on transition of our relational data model to Cassandra. Naturally, one of the basic demands is to have secondary indexes to answer queries quickly according to the application's needs. After looking at Cassandra's native support for secondary indexes, we decided not to use them due to the poor performance for high-cardinality values. Instead, we decide to implement secondary indexes manually. Some search led us to http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html which details a schema for such indexes. However, the method employed there specifically adds an index entries column family, whereas it seems like only 2 CFs are needed - one for the items and one for the indexes (assuming one has access to both old and new values when updating an item). The article actually mentioned that this is indeed not the obvious solution, for a number of reasons related to Cassandra's model of eventual consistency ... will not reliably work and it's a really good idea to make sure you understand why this CF is necessary. However, no additional information is provided on what might be a critical issue, as dealing with corrupt indexes in a large production environment is surely to be a nightmare.
What are the community's thoughts on this matter? Given the writer's credentials in the Cassandra realm, specifically regarding indexes, I'm inclined not to ignore his remarks. References to a document / system that implement similar indexes would be greatly appreciated as well. - alon
Re: How to manually build and maintain secondary indexes
http://www.anuff.com/2011/02/indexing-in-cassandra.html

On Thu, Jul 26, 2012 at 11:43 PM, Brian O'Neill boneil...@gmail.com wrote:
Alon, We came to the same conclusion regarding secondary indexes, and instead of using them we implemented our own wide-row indexing capability and open-sourced it. It's available here: https://github.com/hmsonline/cassandra-indexing We still have challenges rebuilding indexes, etc. It doesn't address all of your concerns, but I tried to capture the motivation behind our implementation here: http://brianoneill.blogspot.com/2012/03/cassandra-indexing-good-bad-and-ugly.html -brian -- Brian O'Neill Lead Architect, Software Development Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406 p: 215.588.6024 www.healthmarketscience.com On 7/26/12 2:05 PM, Alon Pilberg alo...@taboola.com wrote: Hello, My company is working on transition of our relational data model to Cassandra. Naturally, one of the basic demands is to have secondary indexes to answer queries quickly according to the application's needs. After looking at Cassandra's native support for secondary indexes, we decided not to use them due to the poor performance for high-cardinality values. Instead, we decide to implement secondary indexes manually. Some search led us to http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html which details a schema for such indexes. However, the method employed there specifically adds an index entries column family, whereas it seems like only 2 CFs are needed - one for the items and one for the indexes (assuming one has access to both old and new values when updating an item). The article actually mentioned that this is indeed not the obvious solution, for a number of reasons related to Cassandra's model of eventual consistency ... will not reliably work and it's a really good idea to make sure you understand why this CF is necessary. 
However, no additional information is provided on what might be a critical issue, as dealing with corrupt indexes in a large production environment is surely to be a nightmare. What are the community's thoughts on this matter? Given the writer's credentials in the Cassandra realm, specifically regarding indexes, I'm inclined not to ignore his remarks. References to a document / system that implement similar indexes would be greatly appreciated as well. - alon -- Rajat Mathur Software Developer I Flipkart Online Services Pvt Ltd rajat.mat...@flipkart.com +91-9916081216
Re: Questions regarding DataStax AMI
What should the value be to create it with Cassandra 1.0.8?
Tamar
Sent from my iPod

On Jul 26, 2012, at 7:06 PM, Joaquin Casares joaq...@datastax.com wrote:
Yes, you can easily do this by using the --release version switch, as found here: http://www.datastax.com/docs/1.0/install/install_ami
Thanks,
Joaquin Casares
DataStax Software Engineer/Support

On Thu, Jul 26, 2012 at 12:44 AM, Tamar Fraenkel ta...@tok-media.com wrote:
Hi! Is there a way to launch an EC2 cluster from the latest DataStax community AMI that will run Cassandra 1.0.8 and not 1.1.2? Thanks
Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736 | Mob: +972 54 8356490 | Fax: +972 2 5612956
Re: Bringing a dead node back up after fixing hardware issues
On Wed, Jul 25, 2012 at 6:16 PM, Eran Chinthaka Withana eran.chinth...@gmail.com wrote:
Alright, let's assume I want to go this route. I have RF=2 in the data center and I believe I need at least RF=3 to set the replication to LOCAL_QUORUM and hide the node failures. But if I increase the RF to 3 now, then won't it trigger more read misses until repair completes? Given this is a production cluster which cannot afford downtime, how can we do this?

Switch to LOCAL_QUORUM and increase the RF to 3, then repair to actually have the RF bumped up. As long as nothing fails during the first step (which should take perhaps minutes) you'll be okay.
-Brandon
Re: Counters values are less than expected [1.0.6 - Java/Pelops]
On 19.7.2012 15:07, cbert...@libero.it wrote:
Hi all, I have a problem with counters I'd like to solve before going into production.

I also have a similar problem with counters, but I do not think that anything can be done about it. The developers are not interested in discovering what is wrong, and I do not have exact steps to reproduce the problem.
increased RF and repair, not working?
I am using Cassandra 1.0.2 and have a 3-node cluster. The consistency level of both reads and writes is QUORUM. At first RF=1, and I figured that one node down would make the cluster unusable, so I changed RF to 2 and ran nodetool repair on every node (actually I did it twice). After that operation I think my data should be on at least two nodes, and it should be okay if one of them is down. But when I tried to simulate the failure by disabling gossip on one node (and the cluster knows this node is down) and then accessed data from the cluster, it returned MaximumRetryException (pycassa). In my experience this is caused by UnavailableException, which means the data being requested is on a node which is down. So I wonder whether my data was replicated correctly. What should I do? Thanks for the help!

Here is the keyspace info:
Keyspace: comments:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
  Options: [replication_factor:2]

The schema version is okay:
[default@unknown] describe cluster;
Cluster Information:
  Snitch: org.apache.cassandra.locator.SimpleSnitch
  Partitioner: org.apache.cassandra.dht.RandomPartitioner
  Schema versions:
  f67d0d50-b923-11e1--4f7cf9240aef: [192.168.1.129, 192.168.1.40, 192.168.1.50]

The loads are as below:
nodetool -h localhost ring
Address        DC           Rack   Status  State   Load      Owns    Token
                                                                     113427455640312821154458202477256070484
192.168.1.50   datacenter1  rack1  Up      Normal  28.77 GB  33.33%  0
192.168.1.40   datacenter1  rack1  Up      Normal  26.67 GB  33.33%  56713727820156410577229101238628035242
192.168.1.129  datacenter1  rack1  Up      Normal  33.25 GB  33.33%  113427455640312821154458202477256070484
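The behavior above follows from quorum arithmetic: QUORUM needs floor(RF/2)+1 live replicas, so with RF=2 a quorum is 2 of 2 and a single down node makes the rows it replicates unavailable; RF=3 is the smallest replication factor at which QUORUM survives one node failure. A quick check:

```python
def quorum(rf):
    # replicas that must respond for a QUORUM read or write to succeed
    return rf // 2 + 1

def tolerated_failures(rf):
    # replicas that can be down while QUORUM still succeeds
    return rf - quorum(rf)

# RF=2: quorum is 2, so zero failures are tolerated (the situation above)
assert quorum(2) == 2 and tolerated_failures(2) == 0
# RF=3: quorum is still 2, so one replica may be down
assert quorum(3) == 2 and tolerated_failures(3) == 1
```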
Re: Connection issue in Cassandra
I used Cassandra 0.8.1 and pycassa 0.2. If I upgrade pycassa, then I have a compatibility issue. Please suggest. Thanks, Regards, Adeel Akbar

On 7/25/2012 10:13 PM, Tyler Hobbs wrote: That's a pretty old version of pycassa; it was released before 0.7.0 came out. I suggest upgrading. It's possible this was caused by an old bug, but in general, this indicates that you have more threads trying to use the ConnectionPool concurrently than there are connections.

On Wed, Jul 25, 2012 at 3:30 AM, Adeel Akbar adeel.ak...@panasiangroup.com wrote: Hi, I have created a 2-node cluster and use it with my application. My application is unable to connect to the database. Please find the logs below:

NoConnectionAvailable at /
ConnectionPool limit of size 2 overflow 2 reached, unable to obtain connection after 30 seconds
Request Method: GET
Request URL: http://172.16.100.131/
Django Version: 1.4
Exception Type: NoConnectionAvailable
Exception Value: ConnectionPool limit of size 2 overflow 2 reached, unable to obtain connection after 30 seconds
Exception Location: /usr/local/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg/pycassa/pool.py in get, line 738
Python Executable: /usr/local/bin/python
Python Version: 2.6.4
Python Path: ['/var/www/bs_ping', '/usr/local/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg', '/usr/local/lib/python2.6/site-packages/amqplib-0.6.1-py2.6.egg', '/usr/local/lib/python2.6/site-packages/BeautifulSoup-3.1.0.1-py2.6.egg', '/usr/local/lib/python2.6/site-packages/python_dateutil-1.4.1-py2.6.egg', '/usr/local/lib/python2.6/site-packages/feedparser-4.1-py2.6.egg', '/usr/local/lib/python2.6/site-packages/python_twitter-0.6-py2.6.egg', '/usr/local/lib/python2.6/site-packages/simplejson-2.0.9-py2.6-linux-i686.egg', '/usr/local/lib/python2.6/site-packages/txAMQP-0.3-py2.6.egg', '/usr/local/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg', '/usr/local/lib/python2.6/site-packages/zope.interface-3.5.2-py2.6-linux-i686.egg', '/usr/local/lib/python2.6/site-packages/UnicodeUtils-0.3.2-py2.6.egg', '/usr/local/lib/python2.6/site-packages/pytz-2009p-py2.6.egg', '/usr/local/lib/python2.6/site-packages/ScriptUtils-0.5.5-py2.6.egg', '/usr/local/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-linux-i686.egg', '/usr/local/lib/python2.6/site-packages/python_memcached-1.44-py2.6.egg', '/usr/local/lib/python2.6/site-packages/coverage-3.2b1-py2.6-linux-i686.egg', '/usr/local/lib/python2.6/site-packages/flup-1.0.3.dev_20091027-py2.6.egg', '/usr/local/lib/python2.6/site-packages/oauth-1.0.1-py2.6.egg', '/usr/local/lib/python2.6/site-packages/pyOpenSSL-0.10-py2.6-linux-i686.egg', '/usr/local/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg', '/usr/local/lib/python2.6/site-packages/wadofstuff_django_serializers-1.1.0-py2.6.egg', '/usr/local/lib/python2.6/site-packages/jsonpickle-0.4.0-py2.6.egg', '/usr/local/lib/python2.6/site-packages/django_compressor-1.1.2-py2.6.egg', '/usr/local/lib/python2.6/site-packages/django_appconf-0.5-py2.6.egg', '/usr/local/lib/python26.zip', '/usr/local/lib/python2.6', '/usr/local/lib/python2.6/plat-linux2', '/usr/local/lib/python2.6/lib-tk', '/usr/local/lib/python2.6/lib-old', '/usr/local/lib/python2.6/lib-dynload', '/usr/local/lib/python2.6/site-packages', '/usr/local/lib/python2.6/site-packages/PIL', '/var/www/bs_ping/', '/var/www']
Server time: Wed, 25 Jul 2012 13:17:33 +0500

-- Thanks, Regards, Adeel Akbar
-- Tyler Hobbs, DataStax http://datastax.com/
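The error message above encodes a simple capacity limit: a pool allows at most pool_size + max_overflow connections checked out at once, and a further concurrent request waits up to the timeout before raising NoConnectionAvailable. A minimal sketch of that arithmetic (the parameter names match pycassa's ConnectionPool constructor; the chosen values are illustrative, not from the thread):

```python
# "ConnectionPool limit of size 2 overflow 2 reached" means every
# one of the pool_size + max_overflow = 2 + 2 = 4 connections was
# already checked out by another thread; a 5th concurrent request
# waits `timeout` seconds (here 30) and then raises
# NoConnectionAvailable.
def pool_capacity(pool_size, max_overflow):
    return pool_size + max_overflow

print(pool_capacity(2, 2))  # 4 concurrent checkouts at most
```

Under the diagnosis in Tyler's reply, the fix is to size the pool for the application's concurrency, e.g. raising the `pool_size` and `max_overflow` arguments when constructing `pycassa.pool.ConnectionPool` so capacity meets or exceeds the number of threads using the pool.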
Re: Schema question : Query to support Find which all of these 500 email ids have been registered
In general I believe wide rows (many columns) are preferable to skinny rows (many rows), so that you can get all the information in one go; a single row can store up to 2 billion columns. However, on what basis would you store the 500 email ids in one row? What would the row key be? For example, if the query you want to answer with this column family is 'how many email addresses are registered in this application?', then the application id can be the row key and the 500 email ids can be stored as columns. Each other application would be another row. Since you want to search by application, this may be the best approach. If your information doesn't fit neatly into the model above, you can use an email id as the row key and the list of applications as columns. Reading 500 rows does not seem like a big task; I doubt it would be a performance issue given Cassandra's capabilities.

On 27/07/12 11:12 AM, Aklin_81 asdk...@gmail.com wrote: I need to find out which of a list of 500 email ids, passed in a single query, have been registered on my app (total registered email ids may be in the millions). What is the best way to store this kind of data? Should I store each email id in a separate row? But then I would have to read 500 rows at a single time! Or if I use a single row, or a small number of rows, the rows would get too heavy. By the way, would it be really bad to read 500 rows at a single time? They'll be single-column rows, never modified once written.
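The "one row per email id" option above can be sketched with a toy model: a plain Python dict stands in for the column family, and a hypothetical multiget helper mimics how pycassa's ColumnFamily.multiget answers a batch lookup, since rows that don't exist are simply absent from the result. The names and data here are illustrative, not from the thread:

```python
# Toy model: the column family maps email id -> one marker column.
# Membership of 500 ids can be answered with a single multiget.
registered = {
    'a@example.com': {'registered': '1'},
    'b@example.com': {'registered': '1'},
}

def multiget(cf, keys):
    # Mimics pycassa's ColumnFamily.multiget: keys with no row are
    # omitted from the result rather than raising an error.
    return {k: cf[k] for k in keys if k in cf}

batch = ['a@example.com', 'c@example.com', 'b@example.com']
found = multiget(registered, batch)
print(sorted(found))  # ['a@example.com', 'b@example.com']
```

The ids present in the result are exactly the registered ones, which is the membership answer the question asks for, without any wide row growing into the millions of columns.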