Re: Cassandra 1.0 hangs during GC

2012-07-26 Thread Nikolay Kоvshov
Sun Java 6 didn't help at all.

sar shows no special activity around the long GC times.

I have also remembered that I do a lot of get_range_slices requests, each time 
on a range of 100, in case this could be important.

Still my GC times are growing incredibly:

 INFO [ScheduledTasks:1] 2012-07-25 12:17:04,029 GCInspector.java (line 122) GC 
for ParNew: 497253 ms for 1 collections, 114036128 used; max is 8464105472
 INFO [ScheduledTasks:1] 2012-07-25 13:26:21,282 GCInspector.java (line 122) GC 
for ParNew: 149508 ms for 1 collections, 87015440 used; max is 8464105472
 INFO [ScheduledTasks:1] 2012-07-25 18:39:44,654 GCInspector.java (line 122) GC 
for ParNew: 83584 ms for 1 collections, 54368032 used; max is 8464105472
 INFO [ScheduledTasks:1] 2012-07-25 22:45:42,448 GCInspector.java (line 122) GC 
for ParNew: 48454 ms for 1 collections, 44072488 used; max is 8464105472


24.07.2012, 13:07, Joost van de Wijgerd jwijg...@gmail.com:
 You are better off using Sun Java 6 to run Cassandra. In the past
 there were issues reported on 7. Can you try running it on Sun Java 6?

 kind regards

 Joost

 On Tue, Jul 24, 2012 at 10:04 AM, Nikolay Kоvshov nkovs...@yandex.ru wrote:

  48 GB of RAM on that machine; swap is not used. I will disable swap entirely 
 just in case.
  I have 4 Cassandra processes (parts of 4 different clusters), each 
 allocated 8 GB and using 4 of them.
 java -version
  java version "1.7.0"
  Java(TM) SE Runtime Environment (build 1.7.0-b147)
  Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)

  23.07.2012, 20:12, Joost van de Wijgerd jwijg...@gmail.com:
  How much memory do you have on the machine? It seems like you have 8 GB
  reserved for the Cassandra Java process; if this is all the memory on
  the machine you might be swapping. Also, which JVM do you use?

  kind regards

  Joost

  On Mon, Jul 23, 2012 at 10:07 AM, Nikolay Kоvshov nkovs...@yandex.ru 
 wrote:
    On the 21st I migrated to Cassandra 1.1.2 but see no improvement.

   cat /var/log/cassandra/Earth1.log | grep "GC for"
   INFO [ScheduledTasks:1] 2012-05-22 17:42:48,445 GCInspector.java (line 
 123) GC for ParNew: 345 ms for 1 collections, 82451888 used; max is 
 8464105472
   INFO [ScheduledTasks:1] 2012-05-23 02:47:13,911 GCInspector.java (line 
 123) GC for ParNew: 312 ms for 1 collections, 110617416 used; max is 
 8464105472
   INFO [ScheduledTasks:1] 2012-05-23 11:57:54,317 GCInspector.java (line 
 123) GC for ParNew: 298 ms for 1 collections, 98161920 used; max is 
 8464105472
   INFO [ScheduledTasks:1] 2012-07-02 08:52:37,019 GCInspector.java (line 
 123) GC for ParNew: 196886 ms for 1 collections, 2310058496 used; max is 
 8464105472
   INFO [ScheduledTasks:1] 2012-07-16 17:41:25,940 GCInspector.java (line 
 123) GC for ParNew: 200146 ms for 1 collections, 2345987088 used; max is 
 8464105472
   === Migrated from 1.0.0 to 1.1.2
   INFO [ScheduledTasks:1] 2012-07-21 09:05:08,280 GCInspector.java (line 
 122) GC for ParNew: 282 ms for 1 collections, 466406864 used; max is 
 8464105472
   INFO [ScheduledTasks:1] 2012-07-21 12:38:43,132 GCInspector.java (line 
 122) GC for ParNew: 233 ms for 1 collections, 405269504 used; max is 
 8464105472
   INFO [ScheduledTasks:1] 2012-07-22 02:29:09,596 GCInspector.java (line 
 122) GC for ParNew: 253 ms for 1 collections, 389700768 used; max is 
 8464105472
   INFO [ScheduledTasks:1] 2012-07-22 17:45:46,357 GCInspector.java (line 
 122) GC for ParNew: 57391 ms for 1 collections, 400083984 used; max is 
 8464105472

   Memory and yaml memory-related settings are at their defaults.
   I do not do deletes.
   I have 2 CFs and no secondary indexes.

   LiveRatios:
    INFO [pool-1-thread-1] 2012-06-09 02:36:07,759 Memtable.java (line 177) 
 CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 
 (just-counted was 1.0).  calculation took 85ms for 6236 columns
    INFO [MemoryMeter:1] 2012-07-21 09:04:47,614 Memtable.java (line 213) 
 CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 
 (just-counted was 1.0).  calculation took 8ms for 1 columns
    INFO [MemoryMeter:1] 2012-07-21 09:04:51,012 Memtable.java (line 213) 
 CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 
 (just-counted was 1.0).  calculation took 99ms for 1094 columns
    INFO [MemoryMeter:1] 2012-07-21 09:04:51,331 Memtable.java (line 213) 
 CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 
 (just-counted was 1.0).  calculation took 80ms for 242 columns
    INFO [MemoryMeter:1] 2012-07-21 09:04:51,856 Memtable.java (line 213) 
 CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 
 (just-counted was 1.0).  calculation took 505ms for 2678 columns
    INFO [MemoryMeter:1] 2012-07-21 09:04:52,881 Memtable.java (line 213) 
 CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 
 (just-counted was 1.0).  calculation took 776ms for 5236 columns
    INFO [MemoryMeter:1] 2012-07-21 09:04:52,945 Memtable.java (line 213) 
 CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') 

Re: Creating counter columns in cassandra

2012-07-26 Thread Tamar Fraenkel
Hi
To create:

// createBasicCfDef: presumably a local helper wrapping
// HFactory.createColumnFamilyDefinition
ColumnFamilyDefinition counters = createBasicCfDef(
    KEYSPACE, Consts.COUNTERS, ComparatorType.UTF8TYPE,
    null, "CounterColumnType", "CompositeType(UTF8Type,UUIDType)");
counters.setReplicateOnWrite(true);
cluster.addColumnFamily(counters, true);

To increment (add) a counter:

  // needs me.prettyprint.hector.api.mutation.Mutator,
  // me.prettyprint.hector.api.factory.HFactory and
  // me.prettyprint.cassandra.serializers.CompositeSerializer
  public static void incrementCounter(Composite key,
      String columnName, long inc) {
    Mutator<Composite> mutator =
        HFactory.createMutator(keyspace,
            CompositeSerializer.get());
    mutator.incrementCounter(key,
        Consts.COUNTERS, columnName, inc);
    mutator.execute();
  }
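
A counter written this way can be read back with Hector's CounterQuery. This
is a sketch only (from memory, unverified), reusing the keyspace and
Consts.COUNTERS from the snippets above:

  import me.prettyprint.cassandra.serializers.CompositeSerializer;
  import me.prettyprint.cassandra.serializers.StringSerializer;
  import me.prettyprint.hector.api.beans.Composite;
  import me.prettyprint.hector.api.beans.HCounterColumn;
  import me.prettyprint.hector.api.factory.HFactory;
  import me.prettyprint.hector.api.query.CounterQuery;

  public static long readCounter(Composite key, String columnName) {
    CounterQuery<Composite, String> query =
        HFactory.createCounterColumnQuery(keyspace,
            CompositeSerializer.get(), StringSerializer.get());
    query.setColumnFamily(Consts.COUNTERS);
    query.setKey(key);
    query.setName(columnName);
    // a missing column just means the counter was never incremented
    HCounterColumn<String> column = query.execute().get();
    return column == null ? 0L : column.getValue();
  }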

Regards,

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Wed, Jul 25, 2012 at 8:24 PM, Amila Paranawithana amila1...@gmail.com wrote:


 Hi all,

 I want to create counter columns in a column family via a Java module.
 These column families and counter columns need to be created dynamically.
 Please send me some example code to refer to (with Hector or any other
 method).

 Thanks
 --
 Amila Iroshani Paranawithana
 CSE-University of Moratuwa.
 B-http://amilaparanawithana.blogspot.com
 T-https://twitter.com/#!/AmilaPara




Re: Creating counter columns in cassandra

2012-07-26 Thread Abhijit Chanda
You can check Astyanax API
https://github.com/Netflix/astyanax/blob/5c05d118e22eef541a7a201adf7c1c610da13f5b/src/test/java/com/netflix/astyanax/thrift/ThrifeKeyspaceImplTest.java
There are some counter column examples there which will surely help you.
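
From memory (so treat it as a sketch, unverified against the revision linked
above), the counter increment in those tests boils down to a one-liner; the
CF_COUNTERS column family, rowKey and column name here are illustrative
assumptions:

  // increment the "hits" counter column of rowKey by 1 in a single mutation
  keyspace.prepareColumnMutation(CF_COUNTERS, rowKey, "hits")
          .incrementCounterColumn(1)
          .execute();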


How schema disagreement can be fixed faster on a 1.0.10 cluster?

2012-07-26 Thread Mateusz Korniak
Hi !
We got into a schema disagreement situation on 1.0.10, having 250 GB of 
compressed data per node.

Following
http://wiki.apache.org/cassandra/FAQ#schema_disagreement
after a node restart it looks like it is replaying all schema changes one by 
one, right?
As we did a lot of them during the cluster's lifetime, the node is now busy 
recreating secondary indexes that were dropped long ago, which looks like it 
is going to take hours.
Can it be done faster?

1. Can we move all data SSTables out of the data/*/ directories,
2. follow FAQ#schema_disagreement (it should be faster on a node with no 
data) until we reach schema agreement,
3. then stop cassandra,
4. copy the files back,
5. and start cassandra again?


Will it work ?

An extra option is to disable Thrift during the above process (can it be done 
in config? In cassandra.yaml, rpc_port: 0?)



Thanks in advance for any hints, regards,

-- 
Mateusz Korniak


Questions regarding DataStax AMI

2012-07-26 Thread Tamar Fraenkel
Hi!
Is there a way to launch EC2 cluster from DataStax latest community AMI
that will run Cassandra 1.0.8 and not 1.1.2?
Thanks
*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

Re: virtual memory of all cassandra-nodes is growing extremely since Cassandra 1.1.0

2012-07-26 Thread Thomas Spengler
I saw this.

All works fine up to version 1.1.0:
0.8.x takes 5 GB of memory on an 8 GB machine,
1.0.x takes between 6 and 7 GB on an 8 GB machine,
and 1.1.0 takes it all.

And it is a problem.
For me it is no solution to wait for the OOM killer of the Linux kernel
and restart the Cassandra process.

When my machine has less than 100 MB of RAM available, I have a problem.



On 07/25/2012 07:06 PM, Tyler Hobbs wrote:
 Are you actually seeing any problems from this? High virtual memory usage
 on its own really doesn't mean anything. See
 http://wiki.apache.org/cassandra/FAQ#mmap
 
 On Wed, Jul 25, 2012 at 1:21 AM, Thomas Spengler 
 thomas.speng...@toptarif.de wrote:
 
 No one has any idea?

 we tried:

 update to 1.1.2
 DiskAccessMode standard, indexAccessMode standard
 row_cache_size_in_mb: 0
 key_cache_size_in_mb: 0


 Our next try will be to change

 SerializingCacheProvider to ConcurrentLinkedHashCacheProvider

 any other proposals are welcome.

 On 07/04/2012 02:13 PM, Thomas Spengler wrote:
 Hi @all,

 since our upgrade from Cassandra 1.0.3 to 1.1.0, the virtual memory usage
 of the Cassandra nodes has exploded

 our setup is:
 * 5 CentOS 5.8 nodes
 * 4 CPUs and 8 GB RAM each
 * each node holds about 100 GB of data
 * each JVM uses 2 GB RAM
 * DiskAccessMode is standard, indexAccessMode is standard

 The memory usage grows until the whole memory is used.

 Just for information: with Cassandra 1.0.3 we used
 * DiskAccessMode standard, indexAccessMode mmap
 * and the RAM usage was ~4 GB


 can anyone help?


 With Regards



 --
 Thomas Spengler
 Chief Technology Officer
 

 TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
 Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
 thomas.speng...@toptarif.de | www.toptarif.de

 Amtsgericht Charlottenburg, HRB 113287 B
 Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
 -



 
 


-- 
Thomas Spengler
Chief Technology Officer


TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
thomas.speng...@toptarif.de | www.toptarif.de

Amtsgericht Charlottenburg, HRB 113287 B
Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
-




Re: Creating counter columns in cassandra

2012-07-26 Thread Vivek Mishra
Check out Kundera for Counter column support. Here is the link for Counter
column tests:

https://github.com/impetus-opensource/Kundera/tree/kundera-2.0.7/kundera-cassandra/src/test/java/com/impetus/client/crud/countercolumns


-Vivek

On Thu, Jul 26, 2012 at 12:27 PM, Abhijit Chanda
abhijit.chan...@gmail.com wrote:

 You can check Astyanax API

 https://github.com/Netflix/astyanax/blob/5c05d118e22eef541a7a201adf7c1c610da13f5b/src/test/java/com/netflix/astyanax/thrift/ThrifeKeyspaceImplTest.java
 There are some counter column examples there which will surely help you.



restoring a counter

2012-07-26 Thread mdione.ext
  According to this post[1], one's supposed to start C* with 
-Dcassandra.renew_counter_id=true as one of the steps of
restoring a counter column family. I have two questions related to this:

  a) how does that setting affect C* in a non-restoring start?

  b) if it's "bad" (for some value of that), should I stop C* + remove the 
setting + start C* after the value has been repaired?

  c) bonus question: wouldn't it be nice to change the init.d script to be able 
to add this kind of one-time settings?

--
[1] http://www.datastax.com/dev/blog/whats-new-in-cassandra-0-8-part-2-counters
--
Marcos Dione
SysAdmin
Astek Sud-Est
pour FT/TGPF/OPF/PORTAIL/DOP/HEBEX @ Marco Polo
04 97 12 62 45 - mdione@orange.com






RE: restoring a counter

2012-07-26 Thread mdione.ext
From: mdione@orange.com
 restoring a counter column family. I have two questions related to
 this:
 
   a) how does that setting affect C* in a non-restoring start?
 
   b) if it's "bad" (for some value of that), should I stop C* + remove
  the setting + start C* after the value has been repaired?
 
   c) bonus question: wouldn't it be nice to change the init.d script to
 be able to add this kind of one-time settings?

  And a fourth one:

  d) how would you restore from a full cluster crash? Assuming that I have 
snapshots done at ~the same time.






is upgradesstables required (or recommended) upon updating a column family?

2012-07-26 Thread Илья Шипицин
Hello!


is upgradesstables required after updating a column family with
compression_options (or compaction_strategy)?

Cheers,
Ilya Shipitsin


Re: Questions regarding DataStax AMI

2012-07-26 Thread Joaquin Casares
Yes, you can easily do this by using the --release version switch as
found here:
http://www.datastax.com/docs/1.0/install/install_ami
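
For example, the instance user data might look like the line below; the
companion switches are the commonly shown ones and --release is assumed to
take a plain version string, so verify against the docs page above:

  --clustername MyCluster --totalnodes 3 --version community --release 1.0.8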

Thanks,

Joaquin Casares
DataStax
Software Engineer/Support



On Thu, Jul 26, 2012 at 12:44 AM, Tamar Fraenkel ta...@tok-media.com wrote:

 Hi!
 Is there a way to launch EC2 cluster from DataStax latest community AMI
 that will run Cassandra 1.0.8 and not 1.1.2?
 Thanks
 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956






Re: Adding new node to clusters with PropertyFileSnitch

2012-07-26 Thread Tyler Hobbs
On Thu, Jul 26, 2012 at 7:35 AM, Michael Cherkasov 
michael.cherka...@gmail.com wrote:

 Hi all, if my clusters have PropertyFileSnitch and I'm going to add a new
 node, does that mean that the new node must be added to ALL
 cassandra-topology.properties files


Yes, unless that node's dc and rack happen to match the default dc and rack.


  , and all nodes (in all clusters) must be restarted?


No, you don't have to restart any nodes.  They re-read
cassandra-topology.properties periodically (every five minutes, I think).

-- 
Tyler Hobbs
DataStax http://datastax.com/
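
As an illustration (the IPs, datacenter and rack names here are hypothetical),
adding a node is then just one more line in each machine's
cassandra-topology.properties:

  # existing nodes
  192.168.1.1=DC1:RAC1
  192.168.1.2=DC1:RAC2
  # the node being added
  192.168.1.3=DC1:RAC1
  # nodes not listed anywhere fall back to this
  default=DC1:RAC1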


Re: Cassandra 1.0 hangs during GC

2012-07-26 Thread Tyler Hobbs
On Thu, Jul 26, 2012 at 1:25 AM, Nikolay Kоvshov nkovs...@yandex.ru wrote:


 And I have remembered that I do a lot of get_range_slices requests, each
 time on a range of 100, if this could be important.


If you have large rows, try lowering the number of rows you fetch at once
from 100 to 25.  Pulling in a large amount of short-lived data could
explain the long parnews.

-- 
Tyler Hobbs
DataStax http://datastax.com/
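
For what it's worth, a sketch of capping the page size with Hector follows;
the keyspace, column family and serializer choices are illustrative
assumptions, and with raw Thrift the equivalent knob is the count on the
KeyRange passed to get_range_slices:

  import me.prettyprint.cassandra.serializers.StringSerializer;
  import me.prettyprint.hector.api.factory.HFactory;
  import me.prettyprint.hector.api.query.RangeSlicesQuery;

  RangeSlicesQuery<String, String, String> query =
      HFactory.createRangeSlicesQuery(keyspace,
          StringSerializer.get(), StringSerializer.get(),
          StringSerializer.get());
  query.setColumnFamily("Standard1");
  // fetch 25 rows per call instead of 100: less short-lived garbage per pause
  query.setRowCount(25);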


Re: How schema disagreement can be fixed faster on a 1.0.10 cluster?

2012-07-26 Thread Tyler Hobbs
I know you specified 1.0.10, but C* 1.1 solves this problem:
http://www.datastax.com/dev/blog/the-schema-management-renaissance

On Thu, Jul 26, 2012 at 7:29 AM, Mateusz Korniak 
mateusz-li...@ant.gliwice.pl wrote:

 Hi !
 We got into a schema disagreement situation on 1.0.10, having 250 GB of
 compressed data per node.

 Following
 http://wiki.apache.org/cassandra/FAQ#schema_disagreement
 after a node restart it looks like it is replaying all schema changes one
 by one, right?
 As we did a lot of them during the cluster's lifetime, the node is now busy
 recreating secondary indexes that were dropped long ago, which looks like
 it is going to take hours.
 Can it be done faster?

 1. Can we move all data SSTables out of the data/*/ directories,
 2. follow FAQ#schema_disagreement (it should be faster on a node with no
 data) until we reach schema agreement,
 3. then stop cassandra,
 4. copy the files back,
 5. and start cassandra again?


 Will it work ?

 An extra option is to disable Thrift during the above process (can it be
 done in config? In cassandra.yaml, rpc_port: 0?)



 Thanks in advance for any hints, regards,

 --
 Mateusz Korniak




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: restoring a counter

2012-07-26 Thread Pierre-Yves Ritschard
mdione@orange.com writes:


 restoring a counter column family. I have two questions related to
 this:
 
   a) how does that setting affect C* in a non-restoring start?

renew_counter_id generates a new NodeId for the Cassandra VM, which
is used to keep track of the counter shards the node holds. If you
regenerate node ids on each restart, you will most likely corrupt your
counter data.

 
   b) if it's "bad" (for some value of that), should I stop C* + remove
  the setting + start C* after the value has been repaired?

This is not necessary, provided you run repair once Cassandra has restarted.

 
   c) bonus question: wouldn't it be nice to change the init.d script to
 be able to add this kind of one-time settings?


This is possible through environment variables:
env JVM_OPTS="-Dcassandra.renew_counter_id=true" service cassandra start
(at least with the init script which comes with the Cassandra Debian package).

   And a fourth one:

   d) how would you restore from a full cluster crash? Assuming that I have 
 snapshots done at ~the same time.


If you want a point-in-time restore, reprovisioning snapshots should
suffice. The node id regeneration is supposed to be used in scenarios
where data corruption occurred.



How to manually build and maintain secondary indexes

2012-07-26 Thread Alon Pilberg
Hello,
My company is working on transitioning our relational data model to
Cassandra. Naturally, one of the basic demands is to have secondary
indexes to answer queries quickly according to the application's
needs.
After looking at Cassandra's native support for secondary indexes, we
decided not to use them due to their poor performance for
high-cardinality values. Instead, we decided to implement secondary
indexes manually.
Some search led us to
http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html which
details a schema for such indexes. However, the method employed there
specifically adds an "index entries" column family, whereas it seems
like only 2 CFs are needed - one for the items and one for the indexes
(assuming one has access to both old and new values when updating an
item). The article actually mentions that this obvious solution, "for a
number of reasons related to Cassandra's model of eventual consistency
... will not reliably work", and that "it's a really good idea to make
sure you understand why this CF is necessary." However, no additional
information is provided on what might be a critical issue, as dealing
with corrupt indexes in a large production environment is surely going
to be a nightmare.
What are the community's thoughts on this matter? Given the writer's
credentials in the Cassandra realm, specifically regarding indexes,
I'm inclined not to ignore his remarks.
References to a document / system that implements similar indexes would
be greatly appreciated as well.

- alon
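
For concreteness, this is roughly what the naive two-CF variant under
discussion looks like with Hector (all names here - keyspace, Users,
UsersByCity - are hypothetical, and this is precisely the pattern whose
failure modes the article warns about, not a recommended implementation):

  import me.prettyprint.cassandra.serializers.StringSerializer;
  import me.prettyprint.hector.api.factory.HFactory;
  import me.prettyprint.hector.api.mutation.Mutator;

  // Item CF "Users" plus index CF "UsersByCity" (index row key = indexed
  // value, column name = user id). Read-old-then-mutate is not atomic, so
  // concurrent updates can strand stale index entries.
  public void updateUserCity(String userId, String oldCity, String newCity) {
    Mutator<String> m =
        HFactory.createMutator(keyspace, StringSerializer.get());
    // 1. update the item itself
    m.addInsertion(userId, "Users",
        HFactory.createStringColumn("city", newCity));
    // 2. move the index entry from the old value's row to the new one
    if (oldCity != null) {
      m.addDeletion(oldCity, "UsersByCity", userId, StringSerializer.get());
    }
    m.addInsertion(newCity, "UsersByCity",
        HFactory.createStringColumn(userId, ""));
    m.execute();
  }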


Re: How to manually build and maintain secondary indexes

2012-07-26 Thread Brian O'Neill
Alon,

We came to the same conclusion regarding secondary indexes, and instead of
using them we implemented our own wide-row indexing capability and
open-sourced it.  

It's available here:
https://github.com/hmsonline/cassandra-indexing

We still have challenges rebuilding indexes, etc.  It doesn't address all
of your concerns, but I tried to capture the motivation behind our
implementation here:
http://brianoneill.blogspot.com/2012/03/cassandra-indexing-good-bad-and-ugly.html

-brian

-- 
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
www.healthmarketscience.com





On 7/26/12 2:05 PM, Alon Pilberg alo...@taboola.com wrote:

Hello,
My company is working on transitioning our relational data model to
Cassandra. Naturally, one of the basic demands is to have secondary
indexes to answer queries quickly according to the application's
needs.
After looking at Cassandra's native support for secondary indexes, we
decided not to use them due to their poor performance for
high-cardinality values. Instead, we decided to implement secondary
indexes manually.
Some search led us to
http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html which
details a schema for such indexes. However, the method employed there
specifically adds an "index entries" column family, whereas it seems
like only 2 CFs are needed - one for the items and one for the indexes
(assuming one has access to both old and new values when updating an
item). The article actually mentions that this obvious solution, "for a
number of reasons related to Cassandra's model of eventual consistency
... will not reliably work", and that "it's a really good idea to make
sure you understand why this CF is necessary." However, no additional
information is provided on what might be a critical issue, as dealing
with corrupt indexes in a large production environment is surely going
to be a nightmare.
What are the community's thoughts on this matter? Given the writer's
credentials in the Cassandra realm, specifically regarding indexes,
I'm inclined not to ignore his remarks.
References to a document / system that implements similar indexes would
be greatly appreciated as well.

- alon




Re: How to manually build and maintain secondary indexes

2012-07-26 Thread Rajat Mathur
http://www.anuff.com/2011/02/indexing-in-cassandra.html

On Thu, Jul 26, 2012 at 11:43 PM, Brian O'Neill boneil...@gmail.com wrote:

 Alon,

 We came to the same conclusion regarding secondary indexes, and instead of
 using them we implemented our own wide-row indexing capability and
 open-sourced it.

 It's available here:
 https://github.com/hmsonline/cassandra-indexing

 We still have challenges rebuilding indexes, etc.  It doesn't address all
 of your concerns, but I tried to capture the motivation behind our
 implementation here:
 http://brianoneill.blogspot.com/2012/03/cassandra-indexing-good-bad-and-ugly.html

 -brian

 --
 Brian O'Neill
 Lead Architect, Software Development
 Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
 p: 215.588.6024
 www.healthmarketscience.com





 On 7/26/12 2:05 PM, Alon Pilberg alo...@taboola.com wrote:

 Hello,
 My company is working on transitioning our relational data model to
 Cassandra. Naturally, one of the basic demands is to have secondary
 indexes to answer queries quickly according to the application's
 needs.
 After looking at Cassandra's native support for secondary indexes, we
 decided not to use them due to their poor performance for
 high-cardinality values. Instead, we decided to implement secondary
 indexes manually.
 Some search led us to
 http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html which
 details a schema for such indexes. However, the method employed there
 specifically adds an "index entries" column family, whereas it seems
 like only 2 CFs are needed - one for the items and one for the indexes
 (assuming one has access to both old and new values when updating an
 item). The article actually mentions that this obvious solution, "for a
 number of reasons related to Cassandra's model of eventual consistency
 ... will not reliably work", and that "it's a really good idea to make
 sure you understand why this CF is necessary." However, no additional
 information is provided on what might be a critical issue, as dealing
 with corrupt indexes in a large production environment is surely going
 to be a nightmare.
 What are the community's thoughts on this matter? Given the writer's
 credentials in the Cassandra realm, specifically regarding indexes,
 I'm inclined not to ignore his remarks.
 References to a document / system that implements similar indexes would
 be greatly appreciated as well.
 
 - alon





-- 
Rajat Mathur
Software Developer I
Flipkart Online Services Pvt Ltd
rajat.mat...@flipkart.com
+91-9916081216


Re: Questions regarding DataStax AMI

2012-07-26 Thread Tamar Fraenkel
What should the value be to create it with Cassandra 1.0.8?
Tamar

Sent from my iPod

On Jul 26, 2012, at 7:06 PM, Joaquin Casares joaq...@datastax.com wrote:

 Yes, you can easily do this by using the --release version switch as found 
 here:
 http://www.datastax.com/docs/1.0/install/install_ami
 
 Thanks,
 
 Joaquin Casares
 DataStax
 Software Engineer/Support
 
 
 
 On Thu, Jul 26, 2012 at 12:44 AM, Tamar Fraenkel ta...@tok-media.com wrote:
 Hi!
 Is there a way to launch EC2 cluster from DataStax latest community AMI that 
 will run Cassandra 1.0.8 and not 1.1.2?
 Thanks
 Tamar Fraenkel 
 Senior Software Engineer, TOK Media 
 
 
 ta...@tok-media.com
 Tel:   +972 2 6409736 
 Mob:  +972 54 8356490 
 Fax:   +972 2 5612956 
 
 
 
 
 


Re: Bringing a dead node back up after fixing hardware issues

2012-07-26 Thread Brandon Williams
On Wed, Jul 25, 2012 at 6:16 PM, Eran Chinthaka Withana
eran.chinth...@gmail.com wrote:

 Alright, let's assume I want to go this route. I have RF=2 in the data
 center and I believe I need at least RF=3 to set the replication to
 LOCAL_QUORUM and hide the node failures. But if I increase the RF to 3 now,
 won't it trigger more read misses until repair completes? Given this is
 a production cluster which cannot afford downtime, how can we do this?

Switch to LQ and increase the RF to 3, then repair to actually have
the RF bumped up.

As long as nothing fails during the first step (which should take
perhaps minutes) you'll be ok.

-Brandon
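
For the record, with 1.0-era tooling that sequence is roughly the following;
the keyspace name is hypothetical and the cassandra-cli syntax is from
memory, so verify it against your version first:

  # 1. point clients at LOCAL_QUORUM, then bump the RF via cassandra-cli:
  update keyspace MyKeyspace with strategy_options = {replication_factor:3};
  # 2. repair every node so the third replica actually gets built:
  nodetool -h <node> repair MyKeyspace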


Re: Counters values are less than expected [1.0.6 - Java/Pelops]

2012-07-26 Thread Radim Kolar

On 19.7.2012 15:07, cbert...@libero.it wrote:

Hi all, I have a problem with counters I'd like to solve before going into
production.
I also have a similar problem with counters, but I do not think that 
anything can be done about it. Developers are not interested in 
discovering what is wrong, and I do not have exact steps for re-creating 
the problem.


increased RF and repair, not working?

2012-07-26 Thread Yan Chunlu
I am using Cassandra 1.0.2 and have a 3-node cluster; the consistency level
of both reads and writes is QUORUM.

At first RF=1, and I figured that one node down would make the cluster
unusable. So I changed RF to 2 and ran nodetool repair on every
node (actually I did it twice).

After the operation I think my data should be on at least two nodes, and it
would be okay if one of them went down.

But when I tried to simulate the failure by running disablegossip on one
node, the cluster saw the node as down. Accessing data from the cluster then
returned MaximumRetryException (pycassa). In my experience this is
caused by UnavailableException, which means the data being requested
is on a node which is down.

So I wonder whether my data was not replicated correctly. What should I do?
Thanks for the help!

here is the keyspace info:

Keyspace: comments:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:2]



the schema versions are okay:

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
f67d0d50-b923-11e1--4f7cf9240aef: [192.168.1.129, 192.168.1.40,
192.168.1.50]



the loads are as below:

nodetool -h localhost ring
Address         DC          Rack    Status  State   Load      Owns    Token
                                                                      113427455640312821154458202477256070484
192.168.1.50    datacenter1 rack1   Up      Normal  28.77 GB  33.33%  0
192.168.1.40    datacenter1 rack1   Up      Normal  26.67 GB  33.33%  56713727820156410577229101238628035242
192.168.1.129   datacenter1 rack1   Up      Normal  33.25 GB  33.33%  113427455640312821154458202477256070484
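
Worth noting when reading the above: the quorum size is derived from RF alone,

  quorum = floor(RF / 2) + 1
  RF = 2  ->  quorum = 2   (no replica of a key may be down)
  RF = 3  ->  quorum = 2   (one replica of a key may be down)

so QUORUM reads and writes at RF=2 fail as soon as either replica of a key is
unreachable, even after a successful repair; RF=3 is the usual minimum for
tolerating a single node failure under QUORUM.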


Re: Connection issue in Cassandra

2012-07-26 Thread Adeel Akbar
I used Cassandra 0.8.1 and pycassa 0.2. If I upgrade pycassa, then it has 
compatibility issues. Please suggest.



Thanks & Regards

Adeel Akbar

On 7/25/2012 10:13 PM, Tyler Hobbs wrote:
That's a pretty old version of pycassa; it was released before 0.7.0 
came out.  I suggest upgrading.


It's possible this was caused by an old bug, but in general, this 
indicates that you have more threads trying to use the ConnectionPool 
concurrently than there are connections.


On Wed, Jul 25, 2012 at 3:30 AM, Adeel Akbar 
adeel.ak...@panasiangroup.com wrote:


Hi,

I have created a 2 node cluster and use it with my application. My
application is unable to connect to the database. Please find the logs below:


  NoConnectionAvailable at /

ConnectionPool limit of size 2 overflow 2 reached, unable to obtain 
connection after 30 seconds

Request Method: GET
Request URL: http://172.16.100.131/
Django Version: 1.4
Exception Type: NoConnectionAvailable
Exception Value:

ConnectionPool limit of size 2 overflow 2 reached, unable to obtain 
connection after 30 seconds

Exception Location:

/usr/local/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg/pycassa/pool.py
in get, line 738
Python Executable:  /usr/local/bin/python
Python Version: 2.6.4
Python Path:

['/var/www/bs_ping',
  '/usr/local/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/amqplib-0.6.1-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/BeautifulSoup-3.1.0.1-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/python_dateutil-1.4.1-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/feedparser-4.1-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/python_twitter-0.6-py2.6.egg',
  
'/usr/local/lib/python2.6/site-packages/simplejson-2.0.9-py2.6-linux-i686.egg',
  '/usr/local/lib/python2.6/site-packages/txAMQP-0.3-py2.6.egg',
  
'/usr/local/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-linux-i686.egg',
  
'/usr/local/lib/python2.6/site-packages/zope.interface-3.5.2-py2.6-linux-i686.egg',
  '/usr/local/lib/python2.6/site-packages/UnicodeUtils-0.3.2-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/pytz-2009p-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/ScriptUtils-0.5.5-py2.6.egg',
  
'/usr/local/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-linux-i686.egg',
  '/usr/local/lib/python2.6/site-packages/python_memcached-1.44-py2.6.egg',
  
'/usr/local/lib/python2.6/site-packages/coverage-3.2b1-py2.6-linux-i686.egg',
  
'/usr/local/lib/python2.6/site-packages/flup-1.0.3.dev_20091027-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/oauth-1.0.1-py2.6.egg',
  
'/usr/local/lib/python2.6/site-packages/pyOpenSSL-0.10-py2.6-linux-i686.egg',
  '/usr/local/lib/python2.6/site-packages/pycassa-1.0.8-py2.6.egg',
  
'/usr/local/lib/python2.6/site-packages/wadofstuff_django_serializers-1.1.0-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/jsonpickle-0.4.0-py2.6.egg',
  
'/usr/local/lib/python2.6/site-packages/django_compressor-1.1.2-py2.6.egg',
  '/usr/local/lib/python2.6/site-packages/django_appconf-0.5-py2.6.egg',
  '/usr/local/lib/python26.zip',
  '/usr/local/lib/python2.6',
  '/usr/local/lib/python2.6/plat-linux2',
  '/usr/local/lib/python2.6/lib-tk',
  '/usr/local/lib/python2.6/lib-old',
  '/usr/local/lib/python2.6/lib-dynload',
  '/usr/local/lib/python2.6/site-packages',
  '/usr/local/lib/python2.6/site-packages/PIL',
  '/var/www/bs_ping/',
  '/var/www']

Server time:Wed, 25 Jul 2012 13:17:33 +0500


-- 



Thanks & Regards

Adeel Akbar




--
Tyler Hobbs
DataStax http://datastax.com/





Re: Schema question: Query to support "Find which all of these 500 email ids have been registered"

2012-07-26 Thread Roshni Rajagopal
In general I believe wide rows (many columns) are preferable to skinny rows
(many rows), so that you can get all the information in one go.
One can store 2 billion columns in a row.

However, on what basis would you store the 500 email ids in 1 row? What
can be the row key?
E.g., if the query you want to answer with this column family is "how
many email addresses are registered in this application?", then the
application id can be a row key, and 500 email ids can be stored as
columns. Each other application would be another row. Since you want to
search by application this may be the best approach.

If your information doesn't fit neatly into the model above, you can go
for an email id as a row key, and a list of applications as columns.



Reading 500 rows does not seem a big task - I doubt it would be a
performance issue given Cassandra's powers.
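
If it helps, the "application id as row key, email ids as columns" model
makes the 500-id check a single slice-by-names read. A rough Hector sketch,
where the CF name, keyspace and variables are illustrative assumptions:

  import java.util.List;
  import me.prettyprint.cassandra.serializers.StringSerializer;
  import me.prettyprint.hector.api.beans.HColumn;
  import me.prettyprint.hector.api.factory.HFactory;
  import me.prettyprint.hector.api.query.SliceQuery;

  // one row per application; column name = registered email id
  SliceQuery<String, String, String> q = HFactory.createSliceQuery(keyspace,
      StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
  q.setColumnFamily("RegisteredEmails");
  q.setKey(appId);
  q.setColumnNames(candidateEmails);  // the 500 ids to test, as a String[]
  List<HColumn<String, String>> found = q.execute().get().getColumns();
  // every column that comes back is registered; the absent ones are not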

On 27/07/12 11:12 AM, Aklin_81 asdk...@gmail.com wrote:

I need to find out which of a list of 500 email ids, passed
in a single query, have been registered on my app. (Total registered
email ids may be in the millions.) What is the best way to store this kind
of data?

Should I store each email id in a separate row? But then I would have
to read 500 rows at a single time! Or if I use a single row or fewer
rows, they would get too heavy.

Btw, would it be really bad if I read 500 rows at a single time? They'll
be just 1-column rows, and never modified once written.
