JDBC connectivity for Cassandra?

2015-07-06 Thread amit tewari
Hi

Is there any official JDBC connectivity for DataStax Cassandra (or
Cassandra in general)?

I am looking to fire INSERT statements into Cassandra the way I can in CQL.
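
(For illustration only: a CQL INSERT can also be issued programmatically
without JDBC. A minimal sketch, assuming the DataStax Java driver 2.x API;
the contact point, keyspace and table names here are hypothetical.)

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class InsertExample {
    public static void main(String[] args) {
        // Connect to one node; the driver discovers the rest of the cluster.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");   // hypothetical keyspace

        // A prepared statement keeps the INSERT identical to what cqlsh would accept.
        session.execute(
            session.prepare("INSERT INTO users (id, name) VALUES (?, ?)")  // hypothetical table
                   .bind(java.util.UUID.randomUUID(), "amit"));

        cluster.close();
    }
}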

Thanks


BATCH consistency level VS. individual statement consistency level

2015-07-06 Thread Brice Dutheil
Hi,

I’m not sure how the consistency level is applied to batch statements. I didn’t
find detailed information regarding that on datastax.com (1)
http://docs.datastax.com/en/cql/3.0/cql/cql_reference/batch_r.html

   - It is possible to set a CL on individual statements.
   - It is possible to set a CL on the whole batch.
   - If CL is not set then the default CL is used (2)
   http://docs.datastax.com/en/cql/3.0/cql/cql_reference/consistency_r.html
   which is ONE (3)
   
http://docs.datastax.com/en/cassandra/1.2/cassandra/dml/dml_config_consistency_c.html?scroll=concept_ds_umf_5xx_zj__configuring-client-consistency-levels

I understand that if the CL is only set on the batch, then this CL will be
applied to individual statements.

But how are consistency levels applied in these cases?

   1. When consistency levels differ between statements and batch:

BEGIN BATCH USING CONSISTENCY QUORUM
  INSERT INTO users (...) VALUES (...) USING CONSISTENCY LOCAL_QUORUM
  UPDATE users SET password = '...' WHERE userID = '...' USING CONSISTENCY LOCAL_ONE
  ...
APPLY BATCH;

   2. When the consistency level is set on statements but not on the batch
      (will it be overridden to the default consistency level, i.e. ONE?):

BEGIN BATCH
  INSERT INTO users (...) VALUES (...) USING CONSISTENCY LOCAL_QUORUM
  UPDATE users SET password = '...' WHERE userID = '...' USING CONSISTENCY LOCAL_ONE
  ...
APPLY BATCH;
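
(Not an answer, just a reference point: at the client level the same question
shows up as setting a consistency level on a BatchStatement versus on its
inner statements. A minimal sketch, assuming the DataStax Java driver 2.x API
and a hypothetical users table:)

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class BatchConsistencySketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");   // hypothetical keyspace

        SimpleStatement insert = new SimpleStatement(
            "INSERT INTO users (userID, password) VALUES ('u1', 'secret')");
        insert.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM); // per-statement CL

        BatchStatement batch = new BatchStatement();
        batch.add(insert);
        // CL set on the whole batch; how it interacts with per-statement CLs
        // is exactly the question asked above.
        batch.setConsistencyLevel(ConsistencyLevel.QUORUM);

        session.execute(batch);
        cluster.close();
    }
}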


Cheers,
— Brice


Example Data Modelling

2015-07-06 Thread Srinivasa T N
Hi,

   I have a basic doubt: I have an RDBMS with the following two tables:

   Emp - EmpID, FN, LN, Phone, Address
   Sal - Month, Empid, Basic, Flexible Allowance

   My use case is to print the salary slip at the end of each month; the
slip contains the employee's name and other details.

   Now, if I want to model the same in Cassandra, I would have a single cf
with the employee's personal details and his salary details.  Is this the
right approach?  Should we duplicate the employee's personal details each
month?

Regards,
Seenu.


Re: Compaction issues, 2.0.12

2015-07-06 Thread Jeff Williams
Thanks Rob.

1) Cassandra version is 2.0.12.

2) Interesting. Looking at JMX org.apache.cassandra.db - ColumnFamilies -
trackcontent - track_content - Attributes, I get:

LiveDiskSpaceUsed: 17788740448, i.e. ~17GB
LiveSSTableCount: 3
TotalDiskSpaceUsed: 55714084629, i.e. ~55GB

So it obviously knows about the extra disk space even though the live
space looks correct. I couldn't find anything to identify the actual files
though.
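
(For anyone who wants to script this check: a minimal sketch of reading the
same attributes over JMX with plain Java, assuming the default JMX port 7199
and no authentication; the keyspace/table names are the ones above.)

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SSTableSpaceCheck {
    public static void main(String[] args) throws Exception {
        // Assumes the default Cassandra JMX port (7199) with no authentication.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // 2.0-era ColumnFamilies MBean for the table discussed above.
            ObjectName cf = new ObjectName(
                "org.apache.cassandra.db:type=ColumnFamilies,"
                + "keyspace=trackcontent,columnfamily=track_content");
            System.out.println("LiveDiskSpaceUsed:  " + mbs.getAttribute(cf, "LiveDiskSpaceUsed"));
            System.out.println("TotalDiskSpaceUsed: " + mbs.getAttribute(cf, "TotalDiskSpaceUsed"));
            System.out.println("LiveSSTableCount:   " + mbs.getAttribute(cf, "LiveSSTableCount"));
        } finally {
            jmxc.close();
        }
    }
}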

3) So that was even more interesting. After restarting the cassandra
daemon, the sstables were not deleted and now the same JMX attributes are:

LiveDiskSpaceUsed: 55681040579, i.e. ~55GB
LiveSSTableCount: 8
TotalDiskSpaceUsed: 55681040579, i.e. ~55GB

So some of my non-live tables are back live again, and obviously some of
the big ones!!

Jeff


On 6 July 2015 at 20:26, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Jul 6, 2015 at 7:25 AM, Jeff Williams je...@wherethebitsroam.com
 wrote:

 So it appears that the 10 sstables that were compacted
 to 
 /var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-57372
 are still sitting around, with a couple of them being very large!
 Any ideas for what I can check? Can I just delete the old sstables? (I'm
 guessing not).


 1) What version of Cassandra?

 2) There are JMX methods in some versions to identify which files are live
 in the data dir, you could try to use those to see if those files are live.
 Does load seem to show that they are live from the perspective of
 Cassandra?

 3) If they are not live, restarting the Cassandra daemon on the node will
 likely result in them being deleted.

 =Rob




Re: What are problems with schema disagreement

2015-07-06 Thread Robert Coli
On Mon, Jul 6, 2015 at 1:30 PM, John Wong gokoproj...@gmail.com wrote:

 But is there a problem with letting schema disagreement run for a long
 time?


It depends on what the nature of the desynch is, but generally speaking
there may be.

If you added a column or a columnfamily, and one node didn't get that
update, it will throw exceptions when your clients attempt to read that
column/columnfamily. And so on...
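
(As an aside, besides nodetool describecluster, an application can also watch
for this. A minimal sketch, assuming a DataStax Java driver version where
Metadata.checkSchemaAgreement() is available, roughly 2.1.7+:)

import com.datastax.driver.core.Cluster;

public class SchemaAgreementCheck {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try {
            cluster.init();
            // Compares the schema versions reported by all reachable nodes.
            boolean inAgreement = cluster.getMetadata().checkSchemaAgreement();
            System.out.println("Schema in agreement: " + inAgreement);
        } finally {
            cluster.close();
        }
    }
}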

=Rob


Re: Experiencing Timeouts on one node

2015-07-06 Thread Shashi Yachavaram
When we reboot the problematic node, we see the following errors in
system.log.

1. Does this mean the hints column family is corrupted?
2. Can we scrub the system column family on the problematic node and its
replication partners?
3. How do we rebuild the system keyspace?

==
ERROR [CompactionExecutor:950] 2015-06-27 20:11:44,595 CassandraDaemon.java
(line 191) Exception in thread Thread[CompactionExecutor:950,1,main]
java.lang.AssertionError: originally calculated column size of 8684 but now
it is 15725
at
org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
at
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
at
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
at
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
at
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
at
org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
ERROR [HintedHandoff:552] 2015-06-27 20:11:44,595 CassandraDaemon.java
(line 191) Exception in thread Thread[HintedHandoff:552,1,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException:
java.lang.AssertionError: originally calculated column size of 8684 but now
it is 15725
at
org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:436)
at
org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282)
at
org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
at
org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:502)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.ExecutionException:
java.lang.AssertionError: originally calculated column size of 8684 but now
it is 15725
at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at
org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:432)
... 6 more
Caused by: java.lang.AssertionError: originally calculated column size of
8684 but now it is 15725
at
org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
at
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
at
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
at
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
at
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
at
org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
==


On Wed, Jul 1, 2015 at 11:59 AM, Shashi Yachavaram shashi...@gmail.com
wrote:

 We have a 28-node cluster, out of which only one node is experiencing
 timeouts. We thought it was the raid, but there are two other nodes on the
 same raid without any problem. Also, the problem goes away if we reboot the
 node, and then reappears after seven days. The following hinted hand-off
 timeouts are seen on the node experiencing the timeouts. Also, we did not
 notice any gossip errors.

 I was wondering if anyone has seen this issue and how they resolved it.

 Cassandra Version: 1.2.15.1
 OS: Linux cm 2.6.32-504.8.1.el6.x86_64 #1 SMP Fri Dec 19 12:09:25 EST 2014
 x86_64 x86_64 x86_64 GNU/Linux
 java version 1.6.0_85


 
 INFO [HintedHandoff:2] 2015-06-17 

Re: Compaction issues, 2.0.12

2015-07-06 Thread Robert Coli
On Mon, Jul 6, 2015 at 1:46 PM, Jeff Williams je...@wherethebitsroam.com
wrote:

 1) Cassandra version is 2.0.12.



 2) Interesting. Looking at JMX org.apache.cassandra.db - ColumnFamilies
 - trackcontent - track_content - Attributes, I get:

 LiveDiskSpaceUsed: 17788740448, i.e. ~17GB
 LiveSSTableCount: 3
 TotalDiskSpaceUsed: 55714084629, i.e. ~55GB

 So it obviously knows about the extra disk space even though the live
 space looks correct. I couldn't find anything to identify the actual files
 though.


That's what I would expect.


 3) So that was even more interesting. After restarting the cassandra
 daemon, the sstables were not deleted and now the same JMX attributes are:

 LiveDiskSpaceUsed: 55681040579, i.e. ~55GB
 LiveSSTableCount: 8
 TotalDiskSpaceUsed: 55681040579, i.e. ~55GB

 So some of my non-live tables are back live again, and obviously some of
 the big ones!!


This is permanently fatal to consistency; sorry if I was not clear enough
that if they were not live, there was some risk of Cassandra considering
them live again upon restart.

If I were you, I would either stop the node and remove the files you know
shouldn't be live or do a major compaction ASAP.

The behavior you just encountered sounds like a bug, and it is a rather
serious one. SSTables which should be dead being marked live is very bad
for consistency.

Do you see any exceptions in your logs or anything? If you can repro, you
should file a JIRA ticket with the apache project...

=Rob


Re: What are problems with schema disagreement

2015-07-06 Thread John Wong
Thanks. Yeah, we typically restart the nodes in the minor version to force a
resync.

But is there a problem with letting schema disagreement run for a long
time?

Thanks.

John

On Mon, Jul 6, 2015 at 2:29 PM, Robert Coli rc...@eventbrite.com wrote:



 On Thu, Jul 2, 2015 at 9:31 PM, John Wong gokoproj...@gmail.com wrote:

 Hi Graham. Thanks. We are still running on 1.2.16, but we do plan to
 upgrade in the near future. The load on the cluster at the time was very,
 very low. All nodes were responsive, except nothing was showing up in the
 logs after a certain time, which led me to believe something happened
 internally, although that was a wild guess.

 But is it safe to be okay with schema disagreement? I worry about data
 consistency if I let it sit too long.


 In general one shouldn't run with schema disagreement persistently.

 I've seen schema desynch issues on 1.2.x; in general, restarting some
 unclear subset of the affected daemons made them synch.

 =Rob




Wrong peers

2015-07-06 Thread nowarry
Hey guys,

I'm using the Ruby driver (http://datastax.github.io/ruby-driver/) for backup
scripts. I tried to discover all peers and got wrong peers that differ from
the nodetool status output.

=
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load     Tokens  Owns  Host ID                               Rack
UN  10.40.231.53  1.18 TB  256     ?     b2d877d7-f031-4190-8569-976bb0ce034f  RACK01
UN  10.40.231.11  1.24 TB  256     ?     e15cda1c-65cc-40cb-b85c-c4bd665d02d7  RACK01

cqlsh> use system;
cqlsh:system> select peer from system.peers;

 peer
--------------
 10.40.231.31
 10.40.231.53

(2 rows)

What should I do with these old peers? Can they be removed without
consequences, since they are not in the production cluster? And how do I keep
the peers up to date?

--
Anton Koshevoy



Re: Wrong peers

2015-07-06 Thread Jeff Williams
Anton,

I have also seen this issue with decommissioned nodes remaining in the
system.peers table.

On the bright side, they can be safely removed from the system.peers table
without issue. You will have to check every node in the cluster since this
is a local setting per node.
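
(Since this has to be done on each node individually, the simplest route is
cqlsh on every host. For completeness, a minimal sketch of doing it with the
DataStax Java driver, assuming a WhiteListPolicy to pin the statement to one
node at a time; the addresses are the ones from this thread, and this is an
illustration, not a vetted procedure.)

import java.net.InetSocketAddress;
import java.util.Collections;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.WhiteListPolicy;

public class StalePeerCleanup {
    public static void main(String[] args) {
        String node = "10.40.231.53";       // run once per live node
        String stalePeer = "10.40.231.31";  // the peer that is no longer in the cluster
        Cluster cluster = Cluster.builder()
            .addContactPoint(node)
            // Pin all requests to this node, since system.peers is node-local.
            .withLoadBalancingPolicy(new WhiteListPolicy(new RoundRobinPolicy(),
                Collections.singletonList(new InetSocketAddress(node, 9042))))
            .build();
        try {
            Session session = cluster.connect();
            session.execute("DELETE FROM system.peers WHERE peer = '" + stalePeer + "'");
        } finally {
            cluster.close();
        }
    }
}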

Jeff

On 6 July 2015 at 22:45, nowarry nowa...@gmail.com wrote:

 Hey guys,

 I'm using Ruby driver( http://datastax.github.io/ruby-driver/ ) for
 backup scripts. I tried to discover all peers and got wrong peers that are
 different with nodetool status.

 =
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address   Load   Tokens  OwnsHost ID
 Rack
 UN  10.40.231.53  1.18 TB256 ?
 b2d877d7-f031-4190-8569-976bb0ce034f  RACK01
 UN  10.40.231.11  1.24 TB256 ?
 e15cda1c-65cc-40cb-b85c-c4bd665d02d7  RACK01

 cqlsh use system;
 cqlsh:system select peer from system.peers;

  peer
 --
  10.40.231.31
  10.40.231.53

 (2 rows)

 What to do with these old peers, whether they can be removed without
 consequences since they are not in production cluster? And how to keep up
 to date the peers?

 --
 Anton Koshevoy




Re: Compaction issues, 2.0.12

2015-07-06 Thread Robert Coli
On Mon, Jul 6, 2015 at 7:25 AM, Jeff Williams je...@wherethebitsroam.com
wrote:

 So it appears that the 10 sstables that were compacted
 to 
 /var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-57372
 are still sitting around, with a couple of them being very large!
 Any ideas for what I can check? Can I just delete the old sstables? (I'm
 guessing not).


1) What version of Cassandra?

2) There are JMX methods in some versions to identify which files are live
in the data dir, you could try to use those to see if those files are live.
Does load seem to show that they are live from the perspective of
Cassandra?

3) If they are not live, restarting the Cassandra daemon on the node will
likely result in them being deleted.

=Rob


Re: What are problems with schema disagreement

2015-07-06 Thread Robert Coli
On Thu, Jul 2, 2015 at 9:31 PM, John Wong gokoproj...@gmail.com wrote:

 Hi Graham. Thanks. We are still running on 1.2.16, but we do plan to
 upgrade in the near future. The load on the cluster at the time was very,
 very low. All nodes were responsive, except nothing was showing up in the
 logs after a certain time, which led me to believe something happened
 internally, although that was a wild guess.

 But is it safe to be okay with schema disagreement? I worry about data
 consistency if I let it sit too long.


In general one shouldn't run with schema disagreement persistently.

I've seen schema desynch issues on 1.2.x; in general, restarting some
unclear subset of the affected daemons made them synch.

=Rob


Question on 'Average tombstones per slice' when running cfstats

2015-07-06 Thread Venkatesh Kandaswamy
Hello,
I cannot find documentation on the last two parameters given by cfstats 
below. It looks like most examples show 0 for these metrics, but our table has 
large numbers. What do these mean?


Read Count: 60601817
Read Latency: 6.107321156707232 ms.
Write Count: 185864222
Write Latency: 0.04878640954362911 ms.
Pending Tasks: 0
Table: product_v1
SSTable count: 18
Space used (live), bytes: 271185720329
Space used (total), bytes: 271202831486
SSTable Compression Ratio: 0.16228696654682392
Number of keys (estimate): 17304960
Memtable cell count: 378
Memtable data size, bytes: 47468777
Memtable switch count: 152949
Local read count: 60601868
Local read latency: 10.945 ms
Local write count: 185864292
Local write latency: 0.089 ms
Pending tasks: 0
Bloom filter false positives: 6430
Bloom filter false ratio: 0.00817
Bloom filter space used, bytes: 50549840
Compacted partition minimum bytes: 25
Compacted partition maximum bytes: 30130992
Compacted partition mean bytes: 107415
Average live cells per slice (last five minutes): 2.0
Average tombstones per slice (last five minutes): 60.0




Venky Kandaswamy
@WalmartLabs





Re: Question on 'Average tombstones per slice' when running cfstats

2015-07-06 Thread Robert Coli
On Mon, Jul 6, 2015 at 4:19 PM, Venkatesh Kandaswamy ve...@walmartlabs.com
wrote:

 I cannot find documentation on the last two parameters given by
 cfstats below. It looks like most examples show 0 for these metrics, but
 our table has large numbers. What do these mean?
 Average live cells per slice (last five minutes): 2.0
 Average tombstones per slice (last five minutes): 60.0


They mean what they say they mean?

For each slice (which is a weird legacy Thrift-ism you should be able to
parse as "request" in this context) there was an average of 2 un-masked
(live) cells read and 60 cells which were masked (tombstones).

This is a pretty bad ratio, and suggests that your design does UPDATE-like
operations, which are not being merged fast enough by compaction to avoid
you reading past them.

=Rob


Re: Compaction issues, 2.0.12

2015-07-06 Thread Jeff Ferland
I’ve seen the same thing: https://issues.apache.org/jira/browse/CASSANDRA-9577

I’ve had cases where a restart clears the old tables, and I’ve had cases where 
a restart considers the old tables to be live.

 On Jul 6, 2015, at 1:51 PM, Robert Coli rc...@eventbrite.com wrote:
 
  On Mon, Jul 6, 2015 at 1:46 PM, Jeff Williams je...@wherethebitsroam.com wrote:
 1) Cassandra version is 2.0.12. 
  
 2) Interesting. Looking at JMX org.apache.cassandra.db - ColumnFamilies - 
 trackcontent - track_content - Attributes, I get:
 
  LiveDiskSpaceUsed: 17788740448, i.e. ~17GB
 LiveSSTableCount: 3
 TotalDiskSpaceUsed: 55714084629, i.e. ~55GB
 
 So it obviously knows about the extra disk space even though the live space 
 looks correct. I couldn't find anything to identify the actual files though.
 
 That's what I would expect.
  
 3) So that was even more interesting. After restarting the cassandra daemon, 
 the sstables were not deleted and now the same JMX attributes are:
 
 LiveDiskSpaceUsed: 55681040579, i.e. ~55GB
 LiveSSTableCount: 8
 TotalDiskSpaceUsed: 55681040579, i.e. ~55GB
 
 So some of my non-live tables are back live again, and obviously some of the 
 big ones!!
 
 This is permanently fatal to consistency; sorry if I was not clear enough 
 that if they were not live, there was some risk of Cassandra considering them 
 live again upon restart.
 
 If I were you, I would either stop the node and remove the files you know 
 shouldn't be live or do a major compaction ASAP. 
 
 The behavior you just encountered sounds like a bug, and it is a rather 
 serious one. SSTables which should be dead being marked live is very bad for 
 consistency. 
 
 Do you see any exceptions in your logs or anything? If you can repro, you 
 should file a JIRA ticket with the apache project...
 
 =Rob
 



Re: Wrong peers

2015-07-06 Thread Carlos Rolo
There is a bug in Jira related to this; it is not a driver issue, it is a
Cassandra issue. It is solved in 2.0.14, I think. I will post the ticket
once I find it.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Mon, Jul 6, 2015 at 10:50 PM, Jeff Williams je...@wherethebitsroam.com
wrote:

 Anton,

 I have also seen this issue with decommissioned nodes remaining in the
 system.peers table.

 On the bright side, they can be safely removed from the system.peers table
 without issue. You will have to check every node in the cluster since this
 is a local setting per node.

 Jeff

 On 6 July 2015 at 22:45, nowarry nowa...@gmail.com wrote:

 Hey guys,

 I'm using Ruby driver( http://datastax.github.io/ruby-driver/ ) for
 backup scripts. I tried to discover all peers and got wrong peers that are
 different with nodetool status.

 =
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address   Load   Tokens  OwnsHost ID
   Rack
 UN  10.40.231.53  1.18 TB256 ?
 b2d877d7-f031-4190-8569-976bb0ce034f  RACK01
 UN  10.40.231.11  1.24 TB256 ?
 e15cda1c-65cc-40cb-b85c-c4bd665d02d7  RACK01

 cqlsh use system;
 cqlsh:system select peer from system.peers;

  peer
 --
  10.40.231.31
  10.40.231.53

 (2 rows)

 What to do with these old peers, whether they can be removed without
 consequences since they are not in production cluster? And how to keep up
 to date the peers?

 --
 Anton Koshevoy






Re: Question on 'Average tombstones per slice' when running cfstats

2015-07-06 Thread Jonathan Haddad
Wouldn't it suggest a delete-heavy workload, rather than updates?

On Mon, Jul 6, 2015 at 5:21 PM Robert Coli rc...@eventbrite.com wrote:

 On Mon, Jul 6, 2015 at 4:19 PM, Venkatesh Kandaswamy 
 ve...@walmartlabs.com wrote:

 I cannot find documentation on the last two parameters given by
 cfstats below. It looks like most examples show 0 for these metrics, but
 our table has large numbers. What do these mean?

 Average live cells per slice (last five minutes): 2.0
 Average tombstones per slice (last five minutes): 60.0


 They mean what they say they mean?

 For each slice (which is a weird legacy thriftism you should be able to
 parse as request in this context) there was an average of 2 un-masked
 (live) cells read and 60 cells which were masked (tombstones).

 This is a pretty bad ratio, and suggests that your design does UPDATE-like
 operations, which are not being merged fast enough by compaction to avoid
 you reading past them.

 =Rob





Compaction issues, 2.0.12

2015-07-06 Thread Jeff Williams
Hi,

I have a keyspace which is using about 19GB on most nodes in the cluster.
However, on one node the data directory for the only table in that keyspace
now uses 52GB. It seems that old sstables are not being removed after
compaction.

Here is a view of the Data files in the data directory before compaction (I
didn't think to check sizes at the time):

$ ls /var/lib/cassandra/data/trackcontent/track_content/*Data.db
trackcontent-track_content-jb-30852-Data.db
trackcontent-track_content-jb-30866-Data.db
trackcontent-track_content-jb-37136-Data.db
trackcontent-track_content-jb-43371-Data.db
trackcontent-track_content-jb-44676-Data.db
trackcontent-track_content-jb-51845-Data.db
trackcontent-track_content-jb-55339-Data.db
trackcontent-track_content-jb-57353-Data.db
trackcontent-track_content-jb-57366-Data.db
trackcontent-track_content-jb-57367-Data.db
trackcontent-track_content-jb-57368-Data.db

Then when compaction was run I got the following in the logs:

 INFO [CompactionExecutor:3546] 2015-07-06 12:01:04,246 CompactionTask.java
(line 120) Compacting [
SSTableReader(path='/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-30852-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-30866-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-37136-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-43371-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-44676-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-51845-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-55339-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-57353-Data.db')
SSTableReader(path='/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-57366-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-57371-Data.db'),
]

INFO [CompactionExecutor:3546] 2015-07-06 13:34:31,458 CompactionTask.java
(line 296) Compacted 10 sstables to
[/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-57372,].
 36,182,766,725 bytes to 16,916,868,412 (~46% of original) in 5,607,211ms =
2.877221MB/s.  84,600,535 total partitions merged to 40,835,912.  Partition
merge counts were {1:864528, 2:36318841, 3:3523852, 4:124091, 5:6051, 6:25,
}

But after compaction completes, the data directory still looks like:

$ ls -l /var/lib/cassandra/data/trackcontent/track_content/*Data.db
-rw-r--r-- 1 cassandra cassandra 16642164891 Jun 29 06:55
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-30852-Data.db
-rw-r--r-- 1 cassandra cassandra 17216513377 Jun 30 08:36
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-30866-Data.db
-rw-r--r-- 1 cassandra cassandra   813683923 Jun 30 12:03
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-37136-Data.db
-rw-r--r-- 1 cassandra cassandra   855070477 Jun 30 13:15
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-43371-Data.db
-rw-r--r-- 1 cassandra cassandra   209921645 Jun 30 13:28
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-44676-Data.db
-rw-r--r-- 1 cassandra cassandra   213532360 Jul  2 03:16
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-51845-Data.db
-rw-r--r-- 1 cassandra cassandra    14763933 Jul  2 07:58
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-52921-Data.db
-rw-r--r-- 1 cassandra cassandra    16955005 Jul  2 08:09
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-53037-Data.db
-rw-r--r-- 1 cassandra cassandra    61322156 Jul  2 23:34
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-55339-Data.db
-rw-r--r-- 1 cassandra cassandra    61898096 Jul  4 01:02
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-57353-Data.db
-rw-r--r-- 1 cassandra cassandra    75912668 Jul  5 21:23
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-57366-Data.db
-rw-r--r-- 1 cassandra cassandra    32747132 Jul  6 11:01
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-57371-Data.db
-rw-r--r-- 1 cassandra cassandra 16916868412 Jul  6 13:34
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-57372-Data.db
-rw-r--r-- 1 cassandra cassandra     8392714 Jul  6 13:24
/var/lib/cassandra/data/trackcontent/track_content/trackcontent-track_content-jb-57373-Data.db


So it appears that the 10 sstables that were 

Re: Example Data Modelling

2015-07-06 Thread Carlos Alonso
Hi Srinivasa,

I think you're right. In Cassandra you should favor denormalisation where in
an RDBMS you would find a relationship like this.

I'd suggest a cf like this:
CREATE TABLE salaries (
  EmpID varchar,
  FN varchar,
  LN varchar,
  Phone varchar,
  Address varchar,
  month int,
  basic int,
  flexible_allowance float,
  PRIMARY KEY(EmpID, month)
)

That way the salaries will be partitioned by EmpID and clustered by month,
which I guess is the natural sorting you want.
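
(A minimal sketch of how the monthly slip read would then look from a client,
assuming the DataStax Java driver; the keyspace name and sample values are
hypothetical.)

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class SalarySlipQuery {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("hr");   // hypothetical keyspace

        // Partition key (EmpID) plus clustering column (month) pins the read
        // to a single row: everything needed for the slip, including the
        // duplicated personal details, comes back in one query.
        Row row = session.execute(
            session.prepare("SELECT * FROM salaries WHERE EmpID = ? AND month = ?")
                   .bind("E042", 7))               // hypothetical employee/month
            .one();
        System.out.println(row);

        cluster.close();
    }
}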

Hope it helps,
Cheers!

Carlos Alonso | Software Engineer | @calonso https://twitter.com/calonso

On 6 July 2015 at 13:01, Srinivasa T N seen...@gmail.com wrote:

 Hi,

I have basic doubt: I have an RDBMS with the following two tables:

Emp - EmpID, FN, LN, Phone, Address
Sal - Month, Empid, Basic, Flexible Allowance

My use case is to print the Salary slip at the end of each month and
 the slip contains emp name and his other details.

Now, if I want to have the same in cassandra, I will have a single cf
 with emp personal details and his salary details.  Is this the right
 approach?  Should we have the employee personal details duplicated each
 month?

 Regards,
 Seenu.



Re: Bootstrap code

2015-07-06 Thread Sebastian Estevez
For 2.1:

https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/dht/BootStrapper.java#L61

All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com


http://cassandrasummit-datastax.com/

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the world's
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Fri, Jul 3, 2015 at 10:56 AM, Bill Hastings bllhasti...@gmail.com
wrote:

 Hi All

 Can someone please point me to where the code for bootstrapping a new node
 exists?

 --
 Cheers
 Bill