Issue with leveled compaction and data migration

2013-09-13 Thread Michael Theroux
Hello,

We've been undergoing a migration on Cassandra 1.1.9 where we are combining two 
column families.  We are incrementally moving data from one column family into 
another, where the columns in a row in the source column family are being 
appended to columns in a row in the target column family.  Both column families 
are using leveled compaction, and both column families have over 100 million 
rows.  

However, the bloom filters on the target column family have grown dramatically (nearly doubling) after converting less than 1/4 of the data.  I assume this is because new writes are not yet being compacted together with older data, although I thought leveled compaction would mitigate this for me. Any advice on what we can do to control bloom filter growth during this migration?

Appreciate the help,
Thanks,
-Mike

Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-13 Thread Keith Freeman
Paul-  Sorry to go off-list but I'm diving pretty far into details 
here.  Ignore if you wish.


Thanks a lot for the example, definitely very helpful.  I'm surprised 
that the Cassandra experts aren't more interested in (or alarmed by) our 
results; it seems like we've proved that insert performance for wide 
rows in CQL is enormously worse than it was before CQL.  And I have a 
feeling 2.0 won't help much -- I'm already using entirely prepared batches.


To reproduce your example, I switched to cassandra 1.2.6  and astyanax 
1.56.42.  But anything I try to do with that version combination gives 
me an exception on the client side (e.g. execute() on a query):
13-09-13 15:42:42.511 [pool-6-thread-1] ERROR c.n.a.t.ThriftSyncConnectionFactoryImpl - Error creating connection
java.lang.NoSuchMethodError: org.apache.cassandra.thrift.TBinaryProtocol: method <init>(Lorg/apache/thrift/transport/TTransport;)V not found
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.open(ThriftSyncConnectionFactoryImpl.java:195) ~[astyanax-thrift-1.56.37.jar:na]
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection$1.run(ThriftSyncConnectionFactoryImpl.java:232) [astyanax-thrift-1.56.37.jar:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_07]
From my googling this is due to a Cassandra API change in 
TBinaryProtocol, which is why I had to use the Cassandra 1.2.5 jars to get 
my Astyanax client to work at all in my earlier experiments. Did you 
encounter this?  Also, you had 1.2.8 in the Stack Overflow post but 
1.2.6 in this email -- did you have to roll back?


Thanks for any help you can offer, hope I can return the favor at some 
point.



On 09/12/2013 02:26 PM, Paul Cichonski wrote:

I'm running Cassandra 1.2.6 without compact storage on my tables. The trick is 
making your Astyanax (I'm running 1.56.42) mutation work with the CQL table 
definition (this is definitely a bit of a hack, since most of the advice says 
don't mix the CQL and Thrift APIs, so it is your call how far you want to 
go). If you still want to try it out, you need to leverage the Astyanax 
CompositeColumn construct to make it work 
(https://github.com/Netflix/astyanax/wiki/Composite-columns).

I've provided a slightly modified version of what I am doing below:

CQL table def:

CREATE TABLE standard_subscription_index (
    subscription_type text,
    subscription_target_id text,
    entitytype text,
    entityid int,
    creationtimestamp timestamp,
    indexed_tenant_id uuid,
    deleted boolean,
    PRIMARY KEY ((subscription_type, subscription_target_id), entitytype, entityid)
);

ColumnFamily definition:

private static final ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn> COMPOSITE_ROW_COLUMN =
        new ColumnFamily<SubscriptionIndexCompositeKey, SubscribingEntityCompositeColumn>(
                SUBSCRIPTION_CF_NAME,
                new AnnotatedCompositeSerializer<SubscriptionIndexCompositeKey>(SubscriptionIndexCompositeKey.class),
                new AnnotatedCompositeSerializer<SubscribingEntityCompositeColumn>(SubscribingEntityCompositeColumn.class));


SubscriptionIndexCompositeKey is a class that contains the fields from the row key (e.g., 
subscription_type, subscription_target_id), and SubscribingEntityCompositeColumn contains 
the fields from the composite column (as it would look if you viewed your data using 
cassandra-cli): entityType, entityId, columnName. The columnName field is the tricky 
part, as it defines how to interpret the column value (e.g., if it is a value for 
creationtimestamp, the full column name might be someEntityType:4:creationtimestamp).
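
For reference (this is a sketch rather than my real code -- the field names, ordinals and 
constructors are assumptions based on the table definition above), the two annotated classes 
might look roughly like this; the @Component annotation is what AnnotatedCompositeSerializer 
keys off of:

import com.netflix.astyanax.annotations.Component;

// Sketch: maps the composite row key (subscription_type, subscription_target_id).
// Each class would live in its own file.
public class SubscriptionIndexCompositeKey {
    @Component(ordinal = 0)
    String subscriptionType;

    @Component(ordinal = 1)
    String subscriptionTargetId;

    // no-arg constructor so the serializer can instantiate it
    public SubscriptionIndexCompositeKey() {}

    public SubscriptionIndexCompositeKey(String subscriptionType, String subscriptionTargetId) {
        this.subscriptionType = subscriptionType;
        this.subscriptionTargetId = subscriptionTargetId;
    }
}

// Sketch: maps the composite column (entitytype, entityid, columnName).
public class SubscribingEntityCompositeColumn {
    @Component(ordinal = 0)
    String entityType;

    @Component(ordinal = 1)
    int entityId;

    @Component(ordinal = 2)
    String columnName;

    // no-arg constructor so the serializer can instantiate it
    public SubscribingEntityCompositeColumn() {}

    public SubscribingEntityCompositeColumn(String entityType, int entityId, String columnName) {
        this.entityType = entityType;
        this.entityId = entityId;
        this.columnName = columnName;
    }
}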

The actual mutation looks something like this:

final MutationBatch mutation = getKeyspace().prepareMutationBatch();
final ColumnListMutation<SubscribingEntityCompositeColumn> row = mutation.withRow(COMPOSITE_ROW_COLUMN,
        new SubscriptionIndexCompositeKey(targetEntityType.getName(), targetEntityId));

for (Subscription sub : subs) {
    row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(), sub.getEntityId(),
            "creationtimestamp"), sub.getCreationTimestamp());
    row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(), sub.getEntityId(),
            "deleted"), sub.isDeleted());
    row.putColumn(new SubscribingEntityCompositeColumn(sub.getEntityType().getName(), sub.getEntityId(),
            "indexed_tenant_id"), tenantId);
}
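
After the loop the batch still has to be sent; a minimal sketch of that last step (error 
handling is up to you):

try {
    // sends all queued row/column mutations in one batch
    mutation.execute();
} catch (ConnectionException e) {
    throw new RuntimeException("subscription index mutation failed", e);
}

(ConnectionException here is com.netflix.astyanax.connectionpool.exceptions.ConnectionException.)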

Hope that helps,
Paul



Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-13 Thread Nate McCall
https://github.com/Netflix/astyanax/issues/391

I've gotten in touch with a couple of netflix folks and they are going to
try to roll a release shortly.

You should be able to build against 1.2.2, and talking to a 1.2.9 instance
should work. Just a PITA, development-wise, to maintain different versions.



Re: Normal OS: Disk Throughput levels for EC2

2013-09-13 Thread David Ward
My apologies, information that should have been in my original email.

m1.xlarges using a single raid0 ephemeral array for both data and the
commit log.

The latest burst write was ~150GB over 3 nodes (RF=3, so 150GB per node) with
an 8GB heap, but no major spikes show up on the OpsCenter disk throughput
graph for writes.


On Fri, Sep 13, 2013 at 7:45 AM, Nate McCall n...@thelastpickle.com wrote:

 This can vary pretty heavily by instance type and storage options. What
 size instances are these and how is the storage configured?


 On Fri, Sep 13, 2013 at 1:11 AM, David Ward da...@shareablee.com wrote:

 I noticed on EC2, the c* nodes according to OpsCenter have never gone
 above 1.6-2.2MBps.  That seems abnormally low but I have no reference as to
 what is normal for cassandra on EC2 and curious what other people have
 seen according to OpsCenter for the OS: Disk Throughput metric.

 Thanks,
Dave





Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-13 Thread Nate McCall
Also, I was working on this a bit for a client, so I compiled my notes and
approach into a blog post for posterity (and so it's easier for others to
find):
http://thelastpickle.com/blog/2013/09/13/CQL3-to-Astyanax-Compatibility.html

Paul's method on this thread is cited at the bottom as well.



is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

2013-09-13 Thread Hiller, Dean
I was just wondering if cassandra had any special CF where every row exists on 
every node, for smaller tables that we would want to leverage in map/reduce.  
The table row count is less than 500k and we are ok with slow updates to the 
table, but this would make M/R blazingly fast, since for every row we process 
we read from this table.

Thanks,
Dean


Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

2013-09-13 Thread Jon Haddad
It sounds like something that's only useful in a really limited use case.  In 
an 11-node cluster, quorum reads / writes would need to come from 6 
nodes.  It would probably be much slower for both reads and writes. 

It sounds like what you want is a database with replication, not partitioning.

On Sep 13, 2013, at 11:15 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 When I add nodes though, I would kind of be screwed there, right?  Is there 
 an RF=${nodecount}…that would be neat.
 
 Dean
 
 From: Robert Coli rc...@eventbrite.com
 Reply-To: user@cassandra.apache.org
 Date: Friday, September 13, 2013 12:06 PM
 To: user@cassandra.apache.org
 Subject: Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?
 
 On Fri, Sep 13, 2013 at 10:47 AM, Hiller, Dean dean.hil...@nrel.gov wrote:
 I was just wondering if cassandra had any special CF that every row exists on 
 every node for smaller tables that we would want to leverage in map/reduce.  
 The table row count is less than 500k and we are ok with slow updates to the 
 table, but this would make M/R blazingly fast since for every row, we read 
 into this table.
 
 Create a keyspace with replication configured such that RF=N?
 
 =Rob



Re: Normal OS: Disk Throughput levels for EC2

2013-09-13 Thread Nate McCall
This can vary pretty heavily by instance type and storage options. What
size instances are these and how is the storage configured?


On Fri, Sep 13, 2013 at 1:11 AM, David Ward da...@shareablee.com wrote:

 I noticed on EC2, the c* nodes according to OpsCenter have never gone
 above 1.6-2.2MBps.  That seems abnormally low but I have no reference as to
 what is normal for cassandra on EC2 and curious what other people have
 seen according to OpsCenter for the OS: Disk Throughput metric.

 Thanks,
Dave



Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

2013-09-13 Thread Robert Coli
On Fri, Sep 13, 2013 at 10:47 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 I was just wondering if cassandra had any special CF that every row exists
 on every node for smaller tables that we would want to leverage in
 map/reduce.  The table row count is less than 500k and we are ok with slow
 updates to the table, but this would make M/R blazingly fast since for
 every row, we read into this table.


Create a keyspace with replication configured such that RF=N?
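
If you want to do that programmatically from Astyanax rather than through cqlsh or
cassandra-cli, something along these lines should do it -- a sketch only, assuming the
Keyspace.createKeyspace(Map) DDL helper in Astyanax 1.56.x, where keyspace is your
com.netflix.astyanax.Keyspace instance and replicationFactor is your node count N:

import com.google.common.collect.ImmutableMap;

// Sketch: create a keyspace where every node holds a full replica (RF = N).
keyspace.createKeyspace(ImmutableMap.<String, Object>builder()
        .put("strategy_class", "SimpleStrategy")
        .put("strategy_options", ImmutableMap.<String, Object>builder()
                .put("replication_factor", String.valueOf(replicationFactor))
                .build())
        .build());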

=Rob


Re: Normal OS: Disk Throughput levels for EC2

2013-09-13 Thread Alain RODRIGUEZ
You should give further information if you want an answer. What kind of
instance is it? Instance store / EBS / EBS-optimized? Do you try to read /
write on this disk? How much? ...

With an m1.xlarge we reached 40 MBps, and now with a hi1.4xlarge we haven't
reached any limit yet; we see 100+ MBps.

My guess is that either you are only writing (so few disk ops, because of
memtables) or your client processes data too slowly to use your hardware
fully.

Alain


2013/9/13 David Ward da...@shareablee.com

 I noticed on EC2, the c* nodes according to OpsCenter have never gone
 above 1.6-2.2MBps.  That seems abnormally low but I have no reference as to
 what is normal for cassandra on EC2 and curious what other people have
 seen according to OpsCenter for the OS: Disk Throughput metric.

 Thanks,
Dave



Nodes separating from the ring

2013-09-13 Thread Dave Cowen
Hi, all -

We've been running Cassandra 1.1.12 in production since February, and have
experienced a vexing problem with an arbitrary node falling out of or
separating from the ring on occasion.

When a node falls out of the ring, running nodetool ring on the
misbehaving node shows that the misbehaving node believes that it is Up but
that the rest of the ring is Down, with question marks listed for the other
nodes' load. Running nodetool ring on any of the other nodes, however,
shows the misbehaving node as Down and everything else as Up.

Shutting down and restarting the misbehaving node does not change this
behavior. We can only get the misbehaving node to rejoin the ring by shutting
it down, then running nodetool removetoken <misbehaving node's token>
followed by nodetool removetoken force elsewhere in the ring. After the node's
token has been removed from the ring, it will rejoin and behave normally
when it is restarted.

This is not a frequent occurrence - we can go months between this
happening. It most commonly occurs when a different node is brought down
and then back up, but it can happen spontaneously. This is also not
associated with a network connectivity event; we've seen no interruption in
the nodes being able to communicate over the network. As above, it's also
not isolated to a single node; we've seen this behavior on multiple nodes.

This has occurred both with identical seed lists specified in cassandra.yaml
on each node, and with the node removed from its own seed list (so a seed
won't try to auto-bootstrap from itself). The seeds have always been up and
available.

Has anyone else seen similar behavior? For obvious reasons, we hate seeing
one of the nodes suddenly fall out and require intervention when we flap
another node, or for no reason at all.

Thanks,

Dave


Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

2013-09-13 Thread Hiller, Dean
That's an interesting idea… so that would be RF=1 in each data center… very 
interesting.

Dean

From: Jonathan Haddad j...@jonhaddad.com
Reply-To: user@cassandra.apache.org
Date: Friday, September 13, 2013 1:50 PM
To: user@cassandra.apache.org
Subject: Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

You could create a bunch of 1-node DCs if you really wanted to.


On Fri, Sep 13, 2013 at 12:29 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
Actually, I have been on a few projects where something like that is
useful.  Gemfire (a grid memory cache) had that feature, which we used at
another company.  On every project I encounter, there is usually one small
table somewhere… either metadata or something that is infrequently
changing and nice to duplicate on every node.  I bet eventually nosql
stores may start to add it, maybe in a few years, but I guess we are not
there yet.

Thanks,
Dean






--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: is there any type of table existing on all nodes (slow to update, fast to read in map/reduce)?

2013-09-13 Thread Robert Coli
On Fri, Sep 13, 2013 at 11:15 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 When I add nodes though, I would kind of be screwed there, right?  Is
 there an RF=${nodecount}…that would be neat.


Increasing replication factor is well understood, and in this case you
could pre-load the entire dataset onto the new node instead of having it
bootstrap.

But the DC-per-node idea is.. kinda interesting..

=Rob