RE: [Cassandra] Initial Setup - VMs for Research

2013-09-25 Thread Kanwar Sangha
What help are you looking for ?

http://www.datastax.com/docs/datastax_enterprise3.1/install/install_deb_pkg

-Original Message-
From: shath...@e-z.net [mailto:shath...@e-z.net] 
Sent: 25 September 2013 15:27
To: user@cassandra.apache.org
Subject: [Cassandra] Initial Setup - VMs for Research

Request some initial setup guidance for Cassandra deployment

I expect to mentor a project at the Oregon State University
computer science department for a senior engineering student
project.

I am trying to pre-configure one or more VMware virtual
machines to hold an initial Cassandra database for a NOSQL
project.

Any guidance on the steps for initial deployment would
be appreciated.

My VMware machines already have the necessary 3rd party
tools such as Oracle Java 7 and are running on a Debian Linux
7.1.0 release.  The Oregon State University computer science
department will eventually host these virtual machines on
their department servers if the student project is selected.

Sincerely,
Steven J. Hathaway
(Senior IT Systems Architect)



nodetool tpstats

2013-09-18 Thread Kanwar Sangha
Hi - During a write heavy load, the tpstats show the following -

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
BINARY   0
READ 0
MUTATION 65570
_TRACE   0
REQUEST_RESPONSE 10929


Mutation means that inter-node messages are dropped as a throttling
mechanism... what does REQUEST_RESPONSE signify ? That the node accepted the
message but was not able to process it within the timeout ?

Thanks,
Kanwar





RE: Secondary Index Question

2013-08-21 Thread Kanwar Sangha
Thanks Dean. Any reason why it is sequential ? Is it to avoid loading all the
nodes, and to see if one node can return the desired results ?


-Original Message-
From: Hiller, Dean [mailto:dean.hil...@nrel.gov] 
Sent: 21 August 2013 07:36
To: user@cassandra.apache.org
Subject: Re: Secondary Index Question

Yup, there are other types of indexing like that in PlayOrm which do it
differently, so all nodes are not hit. That works better, for instance, if you
are partitioning your data and you query into just a single partition, so it
doesn't put load on all the nodes. (Of course, you have to have a partitioning
strategy: partition by, say, month with the key being the timestamp of the
beginning of the month, or maybe partition by account if you only query into
accounts.)

It is feasible to roll your own as well (though you do need to worry about
eventual consistency here when rolling your own).

Later,
Dean

From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: user@cassandra.apache.org
Date: Tuesday, August 20, 2013 6:57 PM
To: user@cassandra.apache.org
Subject: Secondary Index Question

Hi - I was reading some blogs on implementation of secondary indexes in 
Cassandra and they say that "the read requests are sent sequentially to all the 
nodes" ?

So if I have a query to fetch ALL records with the secondary index filter, will 
the co-ordinator node send the requests to nodes one by one ?

Thanks,
Kanwar



Secondary Index Question

2013-08-20 Thread Kanwar Sangha
Hi - I was reading some blogs on implementation of secondary indexes in 
Cassandra and they say that "the read requests are sent sequentially to all the 
nodes" ?

So if I have a query to fetch ALL records with the secondary index filter, will 
the co-ordinator node send the requests to nodes one by one ?

Thanks,
Kanwar
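With CQL3 (a sketch with hypothetical table and index names, not the blogs' example), the kind of query that triggers this fan-out is one that constrains only the indexed column and never the partition key, so the coordinator has no single replica set to route to:

CREATE TABLE users (
    user_id text PRIMARY KEY,
    state   text
);
CREATE INDEX users_state_idx ON users (state);

-- No partition key in the WHERE clause: the coordinator has to consult the
-- (node-local) index on one range of nodes after another until it has
-- gathered enough matching rows.
SELECT * FROM users WHERE state = 'TX';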



RE: Cassandra HANGS after some writes

2013-08-13 Thread Kanwar Sangha
Cassandra on windows ? Please install Linux !


From: Romain HARDOUIN [mailto:romain.hardo...@urssaf.fr]
Sent: 13 August 2013 10:17
To: user@cassandra.apache.org
Subject: Re: Cassandra HANGS after some writes

Naresh,

My two cents is that you should run Cassandra on a Linux VM.
Issues are easier to diagnose/pinpoint. Windows is a bit obscure to many
people here.

Cheers

Alexis Rodríguez <arodrig...@inconcertcc.com> wrote on
13/08/2013 16:50:42:

> From: Alexis Rodríguez <arodrig...@inconcertcc.com>
> To: user@cassandra.apache.org
> Date: 13/08/2013 16:51
> Subject: Re: Cassandra HANGS after some writes
>
> Naresh,
>
> Windows is not my cup of tea. Maybe someone else has more
> experience using Redmond's prodigy child.
>
> cheers, and good luck


Cassandra Counter Family

2013-08-01 Thread Kanwar Sangha
Hi - We are struggling to understand how the counter family maintains 
consistency in Cassandra.

Say Counter1's value is "1" and it is read by 2 clients at the same time, both
of whom want to update it. After both writes, will it become "3" ?
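A minimal CQL3 sketch (hypothetical table name) of why concurrent updates are safe here: counter columns only accept commutative deltas, never absolute values, so both clients' "+1" operations are applied and the result is 3 regardless of interleaving:

CREATE TABLE counters (
    key text PRIMARY KEY,
    val counter
);

-- client A and client B, at the same time, with no read first:
UPDATE counters SET val = val + 1 WHERE key = 'Counter1';
UPDATE counters SET val = val + 1 WHERE key = 'Counter1';
-- starting from 1, the merged result is 3; note an absolute write such as
-- SET val = 3 is not allowed on a counter column, which is exactly what
-- prevents the lost update a read-then-write would cause.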


RE: maximum storage per node

2013-07-25 Thread Kanwar Sangha
Issues with large data nodes would be -

* Nodetool repair will be impossible to run

* Your read I/O will suffer, since you will almost always go to disk
(each read will take 3 IOPS worst case)

* Bootstrapping the node in case of failures will take days/weeks



From: Pruner, Anne (Anne) [mailto:pru...@avaya.com]
Sent: 25 July 2013 10:45
To: user@cassandra.apache.org
Subject: RE: maximum storage per node

We're storing fairly large files (about 1MB apiece) for a few months and then 
deleting the oldest to get more space to add new ones.  We have large 
requirements (maybe up to 100 TB), so having a 1TB limit would be unworkable.

What is the reason for the limit?  Does something fail after that?

If there are hardware issues, what's recommended?

BTW, we're using Cassandra 1.2

Anne

From: cem [mailto:cayiro...@gmail.com]
Sent: Thursday, July 25, 2013 11:41 AM
To: user@cassandra.apache.org
Subject: Re: maximum storage per node

Between 500GB - 1TB is recommended.

But it also depends on your hardware, traffic characteristics and
requirements. Can you give some details on that?

Best Regards,
Cem

On Thu, Jul 25, 2013 at 5:35 PM, Pruner, Anne (Anne)
<pru...@avaya.com> wrote:
Does anyone have opinions on the maximum amount of data reasonable to store on 
one Cassandra node?  If there are limitations, what are the reasons for it?

Thanks,
Anne



CPU Bound Writes

2013-07-19 Thread Kanwar Sangha
"Insert-heavy workloads will actually be CPU-bound in Cassandra before being 
memory-bound"

Can someone explain the internals of why writes are CPU-bound ?




MailBox Impl

2013-07-18 Thread Kanwar Sangha
Hi - We are planning on using Cassandra for an IMAP-based implementation.
There are some questions that we are stuck with -


1)  Each user will have a pre-defined mailbox size (say 10 MB). We need to
maintain a field to check whether the mailbox size exceeds the predefined
size. Would using the counter family be appropriate ?

2)  Also, we need to have retention for only 10 days. After 10 days, the
previous days' data will be removed. We plan to have a TTL defined per
message. But if we do that, how does the counter in question 1 get updated to
reflect the space freed by the deletion ? (A sketch of this follows below.)

3)  Or do we NOT use TTL and instead manage the deletions within the
application itself ?

Thanks,
Kanwar
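A CQL3 sketch of the problem in question 2 (schema and sizes hypothetical): the message expires via its TTL, but Cassandra never decrements the counter for you; the application has to issue the negative increment itself, for example when it deletes, or from a periodic reconciliation job:

-- assumes a messages(user_id text, msg_id timeuuid, body text) table plus a
-- separate counter table, since counters cannot mix with regular columns
CREATE TABLE mailbox_usage (
    user_id    text PRIMARY KEY,
    bytes_used counter
);

-- on arrival: store the message for 10 days and bump the usage counter
INSERT INTO messages (user_id, msg_id, body)
    VALUES ('alice', now(), '...') USING TTL 864000;
UPDATE mailbox_usage SET bytes_used = bytes_used + 2048
    WHERE user_id = 'alice';

-- after the TTL fires, bytes_used is still 2048 too high; only an explicit
-- decrement by the application brings it back in line:
UPDATE mailbox_usage SET bytes_used = bytes_used - 2048
    WHERE user_id = 'alice';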



RE: is there a key to sstable index file?

2013-07-17 Thread Kanwar Sangha
Yes. Multiple SSTables can have the same key, and only after compaction are
the keys merged to reflect the latest value.
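As an illustration (a conceptual sketch, not Cassandra source), the merge can be pictured as keeping, per column, the version with the highest write timestamp among all the SSTables that contain the key:

import java.util.Arrays;
import java.util.List;

public class SSTableMergeSketch {
    static final class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    // Versions of one column gathered from several SSTables:
    // the cell with the highest write timestamp wins.
    static Cell merge(List<Cell> versions) {
        Cell newest = null;
        for (Cell c : versions) {
            if (newest == null || c.timestamp > newest.timestamp) {
                newest = c;
            }
        }
        return newest;
    }

    public static void main(String[] args) {
        List<Cell> versions = Arrays.asList(
            new Cell("abc", 100L),   // from an older SSTable
            new Cell("def", 200L));  // from a newer SSTable
        System.out.println(merge(versions).value);  // prints "def"
    }
}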

From: S Ahmed [mailto:sahmed1...@gmail.com]
Sent: 17 July 2013 15:54
To: cassandra-u...@incubator.apache.org
Subject: is there a key to sstable index file?

Since SSTables are immutable, and they are ordered, does this mean that there
is an index of the key ranges that each SSTable holds, and the value could be
in 1 or more sstables that have to be scanned, with the latest one chosen?

e.g. Say I write a value "abc" to CF1.  This gets stored in a sstable.

Then I write "def" to CF1, this gets stored in another sstable eventually.

Now when I go to fetch the value, it has to scan 2 sstables and then figure
out which is the latest entry, correct?

So is there an index of keys to sstables, and can there be 1 or more sstables
per key?

(This is assuming compaction hasn't occurred yet).


RE: block size

2013-06-20 Thread Kanwar Sangha
Yes. Is that not specific to Hadoop with CFS ? I want to know: if I have data
in a column of size 500KB, how many IOPS are needed to read it ? (assuming we
have the key cache enabled)


From: Shahab Yunus [mailto:shahab.yu...@gmail.com]
Sent: 20 June 2013 14:32
To: user@cassandra.apache.org
Subject: Re: block size

Have you seen this?
http://www.datastax.com/dev/blog/cassandra-file-system-design

Regards,
Shahab

On Thu, Jun 20, 2013 at 3:17 PM, Kanwar Sangha
<kan...@mavenir.com> wrote:
Hi - What is the block size for Cassandra ? Is it taken from the OS defaults ?



block size

2013-06-20 Thread Kanwar Sangha
Hi - What is the block size for Cassandra ? Is it taken from the OS defaults ?


slice query

2013-05-30 Thread Kanwar Sangha
Hi - We have a dynamic CF which has a key and multiple columns which get added
dynamically. For example -

Key_1  , Column1, Column2, Column3,...
Key_2 ,  Column1, Column2, Column3,.

Now I want to get all columns after Column3... how do we query that ? The
ColumnSliceIterator in Hector allows you to specify a start_column and
end_column name. But if we don't know the end_column name, will that still
work ?

Thanks,
Kanwar
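It should still work. A Hector sketch (CF and key names hypothetical): passing an empty string as the finish column means "to the end of the row", so the last column's name never has to be known. Note the start column is inclusive, so skip the first result if you want strictly-after semantics:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.ColumnSliceIterator;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class OpenEndedSlice {
    public static void dumpColumnsAfter(Keyspace keyspace) {
        StringSerializer ss = StringSerializer.get();
        SliceQuery<String, String, String> query =
            HFactory.createSliceQuery(keyspace, ss, ss, ss)
                    .setColumnFamily("MyDynamicCF")
                    .setKey("Key_1");
        // start at "Column3" (inclusive), finish at "" = end of row
        ColumnSliceIterator<String, String, String> it =
            new ColumnSliceIterator<String, String, String>(query, "Column3", "", false);
        while (it.hasNext()) {
            HColumn<String, String> column = it.next();
            System.out.println(column.getName() + " = " + column.getValue());
        }
    }
}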



RE: Replica info

2013-05-09 Thread Kanwar Sangha
Thanks ! Is there also a way to find out the replica nodes ?

Say we have 2 DCs, DC1 and DC2 with RF=2 (DC1:1, DC2:1)

Can we find out which node in DC2 is a replica ?



From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: 08 May 2013 21:08
To: user@cassandra.apache.org
Subject: Re: Replica info

http://www.datastax.com/docs/1.1/references/nodetool#nodetool-getendpoints
This tells you where a key lives. (you need to hex encode the key)
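For example (keyspace and CF names hypothetical), for the row key 'abc', whose hex encoding is 616263:

nodetool -h localhost getendpoints MyKeyspace MyCF 616263

The output lists the address of every replica for that key, including the one in the other DC.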

On Wed, May 8, 2013 at 5:14 PM, Hiller, Dean
<dean.hil...@nrel.gov> wrote:
nodetool describering {keyspace}


From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: user@cassandra.apache.org
Date: Wednesday, May 8, 2013 3:00 PM
To: user@cassandra.apache.org
Subject: Replica info

Is there a way in Cassandra that we can know which node has the replica for the 
data ? if we have 4 nodes and RF = 2, is there a way we can find which 2 nodes 
have the same data ?

Thanks,
Kanwar



RE: HintedHandoff

2013-05-08 Thread Kanwar Sangha
Is this correct guys ?

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 07 May 2013 14:07
To: user@cassandra.apache.org
Subject: HintedHandoff

Hi - I had a question on hinted handoff. We have 2 DCs configured with overall
RF = 2 (DC1:1, DC2:1) and 4 nodes in each DC (total - 8 nodes across the 2 DCs)

Now we do a write with CL = ONE and Hinted Handoff enabled.


* If node 'X' in DC1, which is a 'replica' node, is down and a write comes
with CL = 1 to DC1, the co-ordinator node will write the hint, and the data
will also be written to the other 'replica' node in DC2 ? Is this correct ?

* If yes, then when we try to do a 'read' of this data with CL =
'local_quorum' from DC1, it will fail (since the data was written as a hint)
and we will need to read it from the other DC ?

Thanks,
Kanwar



Replica info

2013-05-08 Thread Kanwar Sangha
Is there a way in Cassandra that we can know which node has the replica for the 
data ? if we have 4 nodes and RF = 2, is there a way we can find which 2 nodes 
have the same data ?

Thanks,
Kanwar



backup strategy

2013-05-07 Thread Kanwar Sangha
Hi - If we have RF=2 in a 4-node cluster, how do we ensure that the backup we
take covers only 1 copy of the data ? In other words, is it possible for us to
take a backup from only 2 nodes, not all 4, and still have at least 1 copy of
the data ?

Thanks,
Kanwar





HintedHandoff

2013-05-07 Thread Kanwar Sangha
Hi - I had a question on hinted handoff. We have 2 DCs configured with overall
RF = 2 (DC1:1, DC2:1) and 4 nodes in each DC (total - 8 nodes across the 2 DCs)

Now we do a write with CL = ONE and Hinted Handoff enabled.


* If node 'X' in DC1, which is a 'replica' node, is down and a write comes
with CL = 1 to DC1, the co-ordinator node will write the hint, and the data
will also be written to the other 'replica' node in DC2 ? Is this correct ?

* If yes, then when we try to do a 'read' of this data with CL =
'local_quorum' from DC1, it will fail (since the data was written as a hint)
and we will need to read it from the other DC ?

Thanks,
Kanwar



RE: local_quorum

2013-05-05 Thread Kanwar Sangha
Anyone ?

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 03 May 2013 08:59
To: user@cassandra.apache.org
Subject: local_quorum

Hi - I have 2 data centres (DC1 and DC2) and I have local_quorum set as the CL
for reads. Say the RF = 2 (so 2 copies in each DC).

If both nodes which own the data in DC1 are down and I do a read with CL as
"local_quorum", will I get an error back to the application ? Or will
Cassandra do automatic routing to the other DC ?

Thanks,
Kanwar



local_quorum

2013-05-03 Thread Kanwar Sangha
Hi - I have 2 data centres (DC1 and DC2) and I have local_quorum set as the CL
for reads. Say the RF = 2 (so 2 copies in each DC).

If both nodes which own the data in DC1 are down and I do a read with CL as
"local_quorum", will I get an error back to the application ? Or will
Cassandra do automatic routing to the other DC ?

Thanks,
Kanwar



RE: Networking

2013-04-24 Thread Kanwar Sangha
I mean across 2 Data centres. 

-Original Message-
From: Robert Coli [mailto:rc...@eventbrite.com] 
Sent: 24 April 2013 14:56
To: user@cassandra.apache.org
Subject: Re: Networking

On Wed, Apr 24, 2013 at 8:11 AM, Kanwar Sangha  wrote:
> What about a geo-link ? Can that be separated out ?

What does "geo-link" mean here? Cassandra only has two kinds of
communication - client<>servers and servers<>servers.

=Rob


RE: Networking

2013-04-24 Thread Kanwar Sangha
Thrift and intra-cluster can be different, but what about geo ?


As the listen address is used for intra-cluster communication, it must be 
changed to a routable address so the other nodes can reach it. For example, 
assuming you have an Ethernet interface with address 192.168.1.1, you would 
change the listen address like so:
listen_address: 192.168.1.1
The Thrift interface can be configured using either a specified address, like 
the listen address, or using the wildcard 0.0.0.0, which causes cassandra to 
listen for clients on all available interfaces. Update it as either:
rpc_address: 192.168.1.1
Or perhaps this machine has a second NIC with ip 10.140.179.1 and so you split 
the traffic for the intra-cluster network traffic from the thrift traffic for 
better performance:
rpc_address: 10.140.179.1



From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 24 April 2013 10:11
To: user@cassandra.apache.org
Subject: Networking

Hi - Is there a way we can separate the replication n/w and the interconnect
n/w between the Cassandra nodes ? Or does all data go over the same n/w
interface ?

What about a geo-link ? Can that be separated out ?

Thanks,
Kanwar



Networking

2013-04-24 Thread Kanwar Sangha
Hi - Is there a way we can separate the replication n/w and the interconnect
n/w between the Cassandra nodes ? Or does all data go over the same n/w
interface ?

What about a geo-link ? Can that be separated out ?

Thanks,
Kanwar



Re: index filter

2013-04-19 Thread Kanwar Sangha
Let me rephrase. I am talking about the index file on disk created per sstable. 
Does that contain all key indexes?

Sent from Samsung mobile

Robert Coli  wrote:


On Fri, Apr 19, 2013 at 10:38 AM, Kanwar Sangha  wrote:
> Guys – Quick question. The index filter file created for a sstable contains
> all keys/index offset for a sstable ? I know that when we bring up the node,
> it reads a sample of the keys from this file. So this file contains all keys
> and a sample is read on startup ?

The -Index file and the -Filter file are two different things. The
-Index file is what you describe. The -Filter is a bloom filter.

=Rob


index filter

2013-04-19 Thread Kanwar Sangha
Guys - Quick question. Does the index filter file created for an sstable
contain all keys/index offsets for that sstable ? I know that when we bring up
the node, it reads a sample of the keys from this file. So does this file
contain all keys, with only a sample read on startup ?

Thanks,
Kanwar





RE: How to make compaction run faster?

2013-04-18 Thread Kanwar Sangha
Use the community edition and try it out. Compaction has nothing to do with
the CPU; it's all about raw disk speed. What kind of disks do you have ? 7.2k,
10k, 15k RPM ?

Are your keys unique or are you doing updates ? If the writes are unique, I
would not worry about compaction too much; let it run faster during off-peak
hours.

From: Jay Svc [mailto:jaytechg...@gmail.com]
Sent: 18 April 2013 14:28
To: user@cassandra.apache.org; Wei Zhu
Subject: Re: How to make compaction run faster?

Hi Wei,

Thank you for your reply.

Yes, I observed that concurrent_compactors and multithreaded_compaction have
no effect on LCS. I also tried a large SSTable size; it helped keep the
SSTable count low, and so kept the pending compactions low. But despite having
more CPU, I am not able to utilize it to make compaction faster. Compaction
takes a few hours to complete.

By the way, are you using DSE 3.0+ or the community edition? How can we use
Cassandra 1.2? It's not supported by DSE yet.

Thanks,
Jayant K Kenjale


On Thu, Apr 18, 2013 at 1:25 PM, Wei Zhu
<wz1...@yahoo.com> wrote:
We have tried very hard to speed up LCS on 1.1.6 with no luck. It seems to be
single threaded, and there is not much parallelism you can achieve. 1.2 does
come with parallel LCS, which should help.
One more thing to try is to enlarge the sstable size, which will reduce the
number of SSTables. It *might* help LCS.


-Wei

From: "Alexis Rodríguez" 
mailto:arodrig...@inconcertcc.com>>
To: user@cassandra.apache.org
Sent: Thursday, April 18, 2013 11:03:13 AM
Subject: Re: How to make compaction run faster?

Jay,

await, according to iostat's man page, is the time it takes for a request to
the disk to be served. You may try changing the I/O scheduler; I've read that
noop is recommended for SSDs, you can check here http://goo.gl/XMiIA

Regarding compaction, a week ago we had serious problems with compaction in a 
test machine, solved by changing from openjdk 1.6 to sun-jdk 1.6.



On Thu, Apr 18, 2013 at 2:08 PM, Jay Svc
<jaytechg...@gmail.com> wrote:
By the way the compaction and commit log disk latency, these are two seperate 
problems I see.

The important one is compaction problem, How I can speed that up?

Thanks,
Jay

On Thu, Apr 18, 2013 at 12:07 PM, Jay Svc
<jaytechg...@gmail.com> wrote:
Looks like the formatting is a bit messed up. Please let me know if you want
the same in a clean format.

Thanks,
Jay

On Thu, Apr 18, 2013 at 12:05 PM, Jay Svc
<jaytechg...@gmail.com> wrote:
Hi Aaron, Alexis,

Thanks for reply, Please find some more details below.

Core problem: compaction is taking a long time to finish, so it will affect my
reads. I have more CPU and memory and want to utilize that to speed up the
compaction process.
Parameters used:
1. SSTable size: 500MB (tried various sizes from 20MB to 1GB)
2. Compaction throughput mb per sec: 250MB (tried from 16MB to 640MB)
3. Concurrent writes: 196 (tried from 32 to 296)
4. Concurrent compactors: 72 (tried disabling up to 172)
5. Multithreaded compaction: true (tried both true and false)
6. Compaction strategy: LCS (tried STCS as well)
7. Memtable total space in mb: 4096 MB (tried the default and some other
values too)
Note: I have tried almost all permutations and combinations of these
parameters.
Observations:
I ran a test for 1.15 hrs with writes at the rate of 21000 records/sec (total
60GB of data during the 1.15 hrs). After I stopped the test, compaction took
an additional 1.30 hrs to finish, which reduced the SSTable count from 170 to
17.
CPU (24 cores): almost 80% idle during the run
JVM: 48G RAM, 8G heap (3G to 5G heap used)
Pending writes: sometimes high spikes for short periods, otherwise pretty flat
Aaron, please find the iostat below: sdb and dm-2 are the commitlog disks.
Here is the iostat from 3 different boxes in my cluster.
-bash-4.1$ iostat -xkcd
Linux 2.6.32-358.2.1.el6.x86_64 (edc-epod014-dl380-3) 04/18/2013 _x86_64_ (24 
CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
1.20 1.11 0.59 0.01 0.00 97.09
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.03 416.56 9.00 7.08 1142.49 1694.55 352.88 0.07 4.08 0.57 0.92
sdb 0.00 172.38 0.08 3.34 10.76 702.89 416.96 0.09 24.84 0.94 
0.32
dm-0 0.00 0.00 0.03 0.75 0.62 3.00 9.24 0.00 1.45 0.33 0.03
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 8.00 0.00 0.74 0.68 0.00
dm-2 0.00 0.00 0.08 175.72 10.76 702.89 8.12 3.26 18.49 0.02 0.32
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 7.97 0.00 0.83 0.62 0.00
dm-4 0.00 0.00 8.99 422.89 1141.87 1691.55 13.12 4.64 10.71 0.02 0.90
-bash-4.1$ iostat -xkcd
Linux 2.6.32-358.2.1.el6.x86_64 (ndc-epod014-dl380-1) 04/18/2013 _x86_64_ (24 
CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
1.20 1.12 0.52 0.01 0.00 97.14
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svc
sda 0.01 421.17 9.22 7.43 1167.81 1714.38 346.10 0.07 3.99 0.
sdb 0.00 172.68 0.08 3.

Client lib

2013-04-18 Thread Kanwar Sangha
Hi - We are planning to develop a custom client using the Thrift API for
Cassandra. Are these available from JMX ?

- Can Cassandra provide info about node status?
- DC failover detection (data center down vs. some nodes down)
- How to get load info from each node?

Thanks,
Kanwar
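All three should be reachable over JMX (port 7199 by default); nodetool itself reads them from the StorageService MBean. A minimal sketch of a plain JMX client, with the host name hypothetical:

import java.util.List;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class NodeStatusSketch {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://cass-node1:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");

            // which nodes this node currently sees as up or down
            List<String> live = (List<String>) mbs.getAttribute(ss, "LiveNodes");
            List<String> dead = (List<String>) mbs.getAttribute(ss, "UnreachableNodes");
            // data load per node, as nodetool ring reports it
            Map<String, String> load = (Map<String, String>) mbs.getAttribute(ss, "LoadMap");

            System.out.println("live: " + live);
            System.out.println("unreachable: " + dead);
            System.out.println("load: " + load);
        } finally {
            jmxc.close();
        }
    }
}

A whole-DC outage is not reported directly; it would have to be inferred by grouping those addresses by data centre (the EndpointSnitchInfo MBean can map an address to its DC).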




Timeseries data

2013-03-27 Thread Kanwar Sangha
Hi - I have a query on reads with Cassandra. We are planning to have a dynamic
column family, and each column would be based on a timeseries.

Inserting data - key => 'xxx', {column_name => TimeUUID(now), :column_value
=> 'value'}, {column_name => TimeUUID(now), :column_value => 'value'}, ...

Now this key might be spread across multiple SSTables over a period of days.
When we do a READ query to fetch, say, a slice of data from this row based on
time X->Y, would it need to get data from ALL sstables ?

Thanks,
Kanwar



Hinted Handoff

2013-03-25 Thread Kanwar Sangha
Hi - Quick question. Do hints contain the actual data, or is the data read
from the SSTables and then sent to the other node when it comes up ?

Thanks,
Kanwar



cfhistograms

2013-03-25 Thread Kanwar Sangha
Can someone explain how to read the cfhistograms output ?

[root@db4 ~]# nodetool cfhistograms usertable data
usertable/data histograms
Offset    SSTables  Write Latency  Read Latency  Row Size  Column Count
1          2857444           4051             0         0        342711
2          6355104          27021             0         0        201313
3          2579941          61600             0         0        130489
4           374067         119286             0         0         91378
5            91752          10934             0         0         68548
6                0         321098             0         0         54479
7                0         476677             0         0         45427
8                0         734846             0         0         38814
10               0        2867967             4         0         65512
12               0        5366844            22         0         59967
14               0        6911431            36         0         63980
17               0       10155740           127         0        115714
20               0        7432318           302         0        138759
24               0        5231047           969         0        193477
29               0        2368553          2790         0        209998
35               0         859591          4385         0        204751
42               0         456978          3790         0        214658
50               0         306084          2465         0        151838
60               0         223202          2158         0         40277
72               0         122906          2896         0          1735


Thanks
Kanwar



RE: High disk I/O during reads

2013-03-22 Thread Kanwar Sangha
Just as a test - can you disable/reduce compaction throughput and see if that 
makes a difference ? Compaction eats a lot of I/O.
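For the test, the throttle can be changed on a live node without a restart, e.g.:

nodetool -h localhost setcompactionthroughput 1    (throttle hard, ~1 MB/s)
nodetool -h localhost setcompactionthroughput 16   (restore the default)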


From: zod...@fifth-aeon.net [mailto:zod...@fifth-aeon.net] On Behalf Of Jon 
Scarborough
Sent: 22 March 2013 15:01
To: user@cassandra.apache.org; Wei Zhu
Subject: Re: High disk I/O during reads

Checked tpstats, there are very few dropped messages.

Checked histograms. Mostly nothing surprising. The vast majority of rows are 
small, and most reads only access one or two SSTables.

What I did discover is that of our 5 nodes, one is performing well, with disk
I/O in the ballpark that seems reasonable. The other 4 nodes are doing roughly
4x the disk I/O per second.  Interestingly, the node that is performing well
also seems to be servicing about twice the number of reads that the other
nodes are.

I compared configuration between the node performing well to those that aren't, 
and so far haven't found any discrepancies.
On Fri, Mar 22, 2013 at 10:43 AM, Wei Zhu <wz1...@yahoo.com> wrote:
According to your cfstats, read latency is over 100 ms, which is really really
slow. I am seeing less than 3ms reads for my cluster, which is on SSD. Can you
also check nodetool cfhistograms; it tells you more about the number of
SSTables involved and the read/write latency. Sometimes the average doesn't
tell you the whole story.
Also check your nodetool tpstats: are there a lot of dropped reads?

-Wei
- Original Message -
From: "Jon Scarborough" mailto:j...@fifth-aeon.net>>
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Sent: Friday, March 22, 2013 9:42:34 AM
Subject: Re: High disk I/O during reads

Key distribution across SSTables probably varies a lot from row to row in our
case. Most reads would probably only need to look at a few SSTables; a few
might need to look at more.

I don't yet have a deep understanding of C* internals, but I would imagine even 
the more expensive use cases would involve something like this:

1) Check the index for each SSTable to determine if part of the row is there.
2) Look at the endpoints of the slice to determine if the data in a particular 
SSTable is relevant to the query.
3) Read the chunks of those SSTables, working backwards from the end of the 
slice until enough columns have been read to satisfy the limit clause in the 
query.

So I would have guessed that even the more expensive queries on wide rows 
typically wouldn't need to read more than a few hundred KB from disk to do all 
that. Seems like I'm missing something major.

Here's the complete CF definition, including compression settings:

CREATE COLUMNFAMILY conversation_text_message (
conversation_key bigint PRIMARY KEY
) WITH
comment='' AND
comparator='CompositeType(org.apache.cassandra.db.marshal.DateType,org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.AsciiType,org.apache.cassandra.db.marshal.AsciiType)'
 AND
read_repair_chance=0.10 AND
gc_grace_seconds=864000 AND
default_validation=text AND
min_compaction_threshold=4 AND
max_compaction_threshold=32 AND
replicate_on_write=True AND
compaction_strategy_class='SizeTieredCompactionStrategy' AND
compression_parameters:sstable_compression='org.apache.cassandra.io.compress.SnappyCompressor';

Much thanks for any additional ideas.

-Jon



On Fri, Mar 22, 2013 at 8:15 AM, Hiller, Dean
<dean.hil...@nrel.gov> wrote:


Did you mean to ask "are 'all' your keys spread across all SSTables"? I am 
guessing at your intention.

I mean I would very well hope my keys are spread across all sstables or 
otherwise that sstable should not be there as he has no keys in it ;).

And I know we had HUGE disk size from the duplication in our sstables on
size-tiered compaction... we never ran a major compaction, but after we
switched to LCS, we went from 300G to some 120G or something like that, which
was nice. We only have 300 data point posts / second, so not an extreme write
load on 6 nodes, though these posts cause reads to check authorization and
such in our system.

Dean

From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: user@cassandra.apache.org
Date: Friday, March 22, 2013 8:38 AM
To: user@cassandra.apache.org
Subject: RE: High disk I/O during reads


Are your Keys spread across all SSTables ? That will cause every sstable to be
read, which will increase

RE: High disk I/O during reads

2013-03-22 Thread Kanwar Sangha
Sorry... I meant to ask "is your key" spread across multiple sstables ? But
with LCS, your reads should ideally be served from one sstable most of the
time.




-Original Message-
From: Hiller, Dean [mailto:dean.hil...@nrel.gov] 
Sent: 22 March 2013 10:16
To: user@cassandra.apache.org
Subject: Re: High disk I/O during reads

Did you mean to ask "are 'all' your keys spread across all SSTables"?  I am 
guessing at your intention.

I mean I would very well hope my keys are spread across all sstables or 
otherwise that sstable should not be there as he has no keys in it ;).

And I know we had HUGE disk size from the duplication in our sstables on
size-tiered compaction... we never ran a major compaction, but after we
switched to LCS, we went from 300G to some 120G or something like that, which
was nice. We only have 300 data point posts / second, so not an extreme write
load on 6 nodes, though these posts cause reads to check authorization and
such in our system.

Dean

From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: user@cassandra.apache.org
Date: Friday, March 22, 2013 8:38 AM
To: user@cassandra.apache.org
Subject: RE: High disk I/O during reads

Are your Keys spread across all SSTables ? That will cause every sstable to be
read, which will increase the I/O.

What compaction are you using ?

From: zod...@fifth-aeon.net [mailto:zod...@fifth-aeon.net] On Behalf Of Jon
Scarborough
Sent: 21 March 2013 23:00
To: user@cassandra.apache.org
Subject: High disk I/O during reads

Hello,

We've had a 5-node C* cluster (version 1.1.0) running for several months.  Up 
until now we've mostly been writing data, but now we're starting to service 
more read traffic.  We're seeing far more disk I/O to service these reads than 
I would have anticipated.

The CF being queried consists of chat messages.  Each row represents a 
conversation between two people.  Each column represents a message.  The column 
key is composite, consisting of the message date and a few other bits of 
information.  The CF is using compression.

The query is looking for a maximum of 50 messages between two dates, in reverse 
order.  Usually the two dates used as endpoints are 30 days ago and the current 
time.  The query in Astyanax looks like this:

ColumnList result = 
keyspace.prepareQuery(CF_CONVERSATION_TEXT_MESSAGE)
.setConsistencyLevel(ConsistencyLevel.CL_QUORUM)
.getKey(conversationKey)
.withColumnRange(
textMessageSerializer.makeEndpoint(endDate, 
Equality.LESS_THAN).toBytes(),
textMessageSerializer.makeEndpoint(startDate, 
Equality.GREATER_THAN_EQUALS).toBytes(),
true,
maxMessages)
.execute()
.getResult();

We're currently servicing around 30 of these queries per second.

Here's what the cfstats for the CF look like:

Column Family: conversation_text_message
SSTable count: 15
Space used (live): 211762982685
Space used (total): 211762982685
Number of Keys (estimate): 330118528
Memtable Columns Count: 68063
Memtable Data Size: 53093938
Memtable Switch Count: 9743
Read Count: 4313344
Read Latency: 118.831 ms.
Write Count: 817876950
Write Latency: 0.023 ms.
Pending Tasks: 0
Bloom Filter False Postives: 6055
Bloom Filter False Ratio: 0.00260
Bloom Filter Space Used: 686266048
Compacted row minimum size: 87
Compacted row maximum size: 14530764
Compacted row mean size: 1186

On the C* nodes, iostat output like this is typical, and can spike to be much 
worse:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.91    0.00    2.08   30.66    0.50   64.84

Device:    tps      kB_read/s   kB_wrtn/s   kB_read     kB_wrtn
xvdap1     0.13     0.00        1.07        0           16
xvdb       474.20   13524.53    25.33       202868380
xvdc       469.87   13455.73    30.40       201836456
md0        972.13   26980.27    55.73       404704836

Any thoughts on what could be causing read I/O to the disk from these queries?

Much thanks!

-Jon


RE: High disk I/O during reads

2013-03-22 Thread Kanwar Sangha
Are your Keys spread across all SSTables ? That will cause every sstable to be
read, which will increase the I/O.

What compaction are you using ?

From: zod...@fifth-aeon.net [mailto:zod...@fifth-aeon.net] On Behalf Of Jon 
Scarborough
Sent: 21 March 2013 23:00
To: user@cassandra.apache.org
Subject: High disk I/O during reads

Hello,

We've had a 5-node C* cluster (version 1.1.0) running for several months.  Up 
until now we've mostly been writing data, but now we're starting to service 
more read traffic.  We're seeing far more disk I/O to service these reads than 
I would have anticipated.

The CF being queried consists of chat messages.  Each row represents a 
conversation between two people.  Each column represents a message.  The column 
key is composite, consisting of the message date and a few other bits of 
information.  The CF is using compression.

The query is looking for a maximum of 50 messages between two dates, in reverse 
order.  Usually the two dates used as endpoints are 30 days ago and the current 
time.  The query in Astyanax looks like this:

ColumnList result = 
keyspace.prepareQuery(CF_CONVERSATION_TEXT_MESSAGE)
.setConsistencyLevel(ConsistencyLevel.CL_QUORUM)
.getKey(conversationKey)
.withColumnRange(
textMessageSerializer.makeEndpoint(endDate, 
Equality.LESS_THAN).toBytes(),
textMessageSerializer.makeEndpoint(startDate, 
Equality.GREATER_THAN_EQUALS).toBytes(),
true,
maxMessages)
.execute()
.getResult();

We're currently servicing around 30 of these queries per second.

Here's what the cfstats for the CF look like:

Column Family: conversation_text_message
SSTable count: 15
Space used (live): 211762982685
Space used (total): 211762982685
Number of Keys (estimate): 330118528
Memtable Columns Count: 68063
Memtable Data Size: 53093938
Memtable Switch Count: 9743
Read Count: 4313344
Read Latency: 118.831 ms.
Write Count: 817876950
Write Latency: 0.023 ms.
Pending Tasks: 0
Bloom Filter False Postives: 6055
Bloom Filter False Ratio: 0.00260
Bloom Filter Space Used: 686266048
Compacted row minimum size: 87
Compacted row maximum size: 14530764
Compacted row mean size: 1186

On the C* nodes, iostat output like this is typical, and can spike to be much 
worse:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.91    0.00    2.08   30.66    0.50   64.84

Device:    tps      kB_read/s   kB_wrtn/s   kB_read     kB_wrtn
xvdap1     0.13     0.00        1.07        0           16
xvdb       474.20   13524.53    25.33       202868380
xvdc       469.87   13455.73    30.40       201836456
md0        972.13   26980.27    55.73       404704836

Any thoughts on what could be causing read I/O to the disk from these queries?

Much thanks!

-Jon


chunk length

2013-03-09 Thread Kanwar Sangha
Hi - Can someone help explain this parameter ?

chunk_length_kb

If we increase it from the default 64k to 128k, does that mean the sstable
will be compressed in blocks of 128k ? And does that mean that if we are
reading and writing data of 128k, it will give better read/write performance ?

Thanks,
Kanwar
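Roughly, yes: chunk_length_kb is the size of the compression block, and a whole chunk is read and decompressed even for a small request, so matching it to your typical read/write size can help. It is set per column family; a CQL3 sketch with a hypothetical table name:

ALTER TABLE msgs WITH compression =
    { 'sstable_compression' : 'SnappyCompressor',
      'chunk_length_kb' : 128 };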



RE: leveled compaction

2013-03-08 Thread Kanwar Sangha
Cool ! So if we exceed the threshold, is that an issue ?

From: Yuki Morishita [mailto:mor.y...@gmail.com]
Sent: 08 March 2013 15:57
To: user@cassandra.apache.org
Subject: Re: leveled compaction

It is SSTable counts in each level.

SSTables in each level: [40/4, 442/10, 97, 967, 7691, 0, 0, 0]
So you have 40 SSTables in L0, 442 in L1, 97 in L2 and so forth.
'40/4' and '442/10' have numbers after the slash; those are the expected
maximum number of SSTables in that level, and they are only displayed when you
have more than that threshold.

On Friday, March 8, 2013 at 3:24 PM, Kanwar Sangha wrote:

Hi –

Can someone explain the meaning for the levelled compaction in cfstats –

SSTables in each level: [40/4, 442/10, 97, 967, 7691, 0, 0, 0]

SSTables in each level: [61/4, 9, 92, 945, 8146, 0, 0, 0]

SSTables in each level: [34/4, 1000/10, 100, 953, 8184, 0, 0, 0]

Thanks,

Kanwar





leveled compaction

2013-03-08 Thread Kanwar Sangha
Hi -

Can someone explain the meaning for the levelled compaction in cfstats -

SSTables in each level: [40/4, 442/10, 97, 967, 7691, 0, 0, 0]

SSTables in each level: [61/4, 9, 92, 945, 8146, 0, 0, 0]

SSTables in each level: [34/4, 1000/10, 100, 953, 8184, 0, 0, 0]


Thanks,
Kanwar



VNodes and nodetool repair

2013-03-07 Thread Kanwar Sangha
Hi Guys - I have a question on vnodes and nodetool repair. Say I have
configured the nodes with vnodes, for example 2 nodes with RF=2.

Questions -


* There are some columns set with a TTL of X. After X, Cassandra will mark
them as tombstones. Is there still a probability of running into the
DistributedDeletes issue ? I understand that "distributeddeletes" is more
applicable to application deletes ?

* Nodetool repair will ask the neighbour node, say node 2, to generate the
merkle tree. As I understand it, repair currently introduces 2 major
compactions: one to validate a column family, and then another to send the
disagreeing ranges. Will this be done over the complete data set on node 2 ?
Or only for the range as per vnodes ?

* How does Cassandra do nodetool repair across data centres ? Assume RF=1 in
DC1 and RF=1 in DC2, with total RF = 2 across the two DCs.
Thanks,
Kanwar



RE: Hinted handoff

2013-03-07 Thread Kanwar Sangha

In the normal case it's best to have around 300 to 500GB per node. With that
much data it will take a week to run repair or replace a failed node.

Hi Aaron - This was true pre-1.2, but with 1.2 and virtual nodes, does this
still hold ? 1 TB at 1Gb/s will take roughly 2.2 hrs, assuming we stream from,
say, 100 nodes...


From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: 06 March 2013 23:47
To: user@cassandra.apache.org
Subject: Re: Hinted handoff

Check the IO utilisation using iostat

You *really* should not need to make HH run faster, if you do there is some 
thing bad going on. I would consider dropping the hints and running repair.

Data is ~9.5 TB
Do you have 9.5TB on a single node ?
In the normal case it's best to have around 300 to 500GB per node. With that
much data it will take a week to run repair or replace a failed node.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/03/2013, at 1:22 PM, Kanwar Sangha <kan...@mavenir.com> wrote:


Is this correct ?

I have Raid 0 setup for 16 TB across 8 disks. Each disk is 7.2kRPM with IOPS of 
80 per disk. Data is ~9.5 TB

So 4K * 80 * 9.5 = 3040 KB ~  23.75 Mb/s.

So basically I am limited at the disk rather than the n/w

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 15:11
To: user@cassandra.apache.org
Subject: RE: Hinted handoff

After trying to bump up "hinted_handoff_throttle_in_kb" to 1 Gb per sec, it
still does not go above 25Mb/s.  Is there a limitation ?



From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 14:41
To: user@cassandra.apache.org
Subject: RE: Hinted handoff

Got the param. thanks

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 13:50
To: user@cassandra.apache.org
Subject: Hinted handoff

Hi - Is there a way to increase the hinted handoff throughput ? I am seeing 
around 8Mb/s (bits).

Thanks,
Kanwar



RE: Hinted handoff

2013-03-06 Thread Kanwar Sangha
Is this correct ?

I have Raid 0 setup for 16 TB across 8 disks. Each disk is 7.2kRPM with IOPS of 
80 per disk. Data is ~9.5 TB

So 4K * 80 * 9.5 = 3040 KB ~  23.75 Mb/s.

So basically I am limited at the disk rather than the n/w

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 15:11
To: user@cassandra.apache.org
Subject: RE: Hinted handoff

After trying to bump up "hinted_handoff_throttle_in_kb" to 1 Gb per sec, it
still does not go above 25Mb/s.  Is there a limitation ?



From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 14:41
To: user@cassandra.apache.org
Subject: RE: Hinted handoff

Got the param. thanks

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 13:50
To: user@cassandra.apache.org
Subject: Hinted handoff

Hi - Is there a way to increase the hinted handoff throughput ? I am seeing 
around 8Mb/s (bits).

Thanks,
Kanwar



RE: Hinted handoff

2013-03-06 Thread Kanwar Sangha
After trying to bump up "hinted_handoff_throttle_in_kb" to 1 Gb per sec, it
still does not go above 25Mb/s.  Is there a limitation ?
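For reference, the setting lives in cassandra.yaml and is expressed in KB per second (per delivery thread, if memory serves), so 1 Gb/s would be roughly:

# cassandra.yaml (the 1.2 default is 1024)
hinted_handoff_throttle_in_kb: 131072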



From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 14:41
To: user@cassandra.apache.org
Subject: RE: Hinted handoff

Got the param. thanks

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 13:50
To: user@cassandra.apache.org
Subject: Hinted handoff

Hi - Is there a way to increase the hinted handoff throughput ? I am seeing 
around 8Mb/s (bits).

Thanks,
Kanwar



RE: Hinted handoff

2013-03-06 Thread Kanwar Sangha
Got the param. thanks

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 13:50
To: user@cassandra.apache.org
Subject: Hinted handoff

Hi - Is there a way to increase the hinted handoff throughput ? I am seeing 
around 8Mb/s (bits).

Thanks,
Kanwar



Hinted handoff

2013-03-06 Thread Kanwar Sangha
Hi - Is there a way to increase the hinted handoff throughput ? I am seeing 
around 8Mb/s (bits).

Thanks,
Kanwar



RE: Replication Question

2013-03-04 Thread Kanwar Sangha
Keep in mind that even at consistency level ONE or LOCAL_QUORUM, the write is 
still sent to all replicas for the written key, even replicas in other data 
centers. The consistency level just determines how many replicas are required 
to respond that they received the write.

Is this true for Reads also ?



From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 04 March 2013 14:54
To: user@cassandra.apache.org
Subject: Replication Question

Hi - Suppose I configure an RF across 2 data centres as below, assuming 3
nodes per data centre.

DC1: 2, DC2: 2

I do a write with consistency level local_quorum, which ensures that there is
no inter-DC latency. Now say 2 nodes in DC1 crash and I am doing a read with
CL = ONE. Will it return failure to the client, since the data is now only
present in DC2 ? So I would need to do a read with CL = ALL/EACH_QUORUM to
ensure that I always get the data, even in case of crashes in the local DC ?

Thanks,
Kanwar




Replication Question

2013-03-04 Thread Kanwar Sangha
Hi - Suppose I configure an RF across 2 data centres as below, assuming 3
nodes per data centre.

DC1: 2, DC2: 2

I do a write with consistency level local_quorum, which ensures that there is
no inter-DC latency. Now say 2 nodes in DC1 crash and I am doing a read with
CL = ONE. Will it return failure to the client, since the data is now only
present in DC2 ? So I would need to do a read with CL = ALL/EACH_QUORUM to
ensure that I always get the data, even in case of crashes in the local DC ?

Thanks,
Kanwar




RE: Storage question

2013-03-04 Thread Kanwar Sangha
Problems with small files and HDFS

A small file is one which is significantly smaller than the HDFS block size 
(default 64MB). If you're storing small files, then you probably have lots of 
them (otherwise you wouldn't turn to Hadoop), and the problem is that HDFS 
can't handle lots of files.

Every file, directory and block in HDFS is represented as an object in the 
namenode's memory, each of which occupies 150 bytes, as a rule of thumb. So 10 
million files, each using a block, would use about 3 gigabytes of memory. 
Scaling up much beyond this level is a problem with current hardware. Certainly 
a billion files is not feasible.

Furthermore, HDFS is not geared up to efficiently accessing small files: it is 
primarily designed for streaming access of large files. Reading through small 
files normally causes lots of seeks and lots of hopping from datanode to 
datanode to retrieve each small file, all of which is an inefficient data 
access pattern.
Problems with small files and MapReduce

Map tasks usually process a block of input at a time (using the default 
FileInputFormat). If the file is very small and there are a lot of them, then 
each map task processes very little input, and there are a lot more map tasks, 
each of which imposes extra bookkeeping overhead. Compare a 1GB file broken 
into 16 64MB blocks, and 10,000 or so 100KB files. The 10,000 files use one map 
each, and the job time can be tens or hundreds of times slower than the 
equivalent one with a single input file.

There are a couple of features to help alleviate the bookkeeping overhead: task 
JVM reuse for running multiple map tasks in one JVM, thereby avoiding some JVM 
startup overhead (see the mapred.job.reuse.jvm.num.tasks property), and 
MultiFileInputSplit which can run more than one split per map.

-Original Message-
From: Hiller, Dean [mailto:dean.hil...@nrel.gov] 
Sent: 04 March 2013 13:38
To: user@cassandra.apache.org
Subject: Re: Storage question

Well, Astyanax I know can simulate streaming into Cassandra; it disperses the
file to multiple rows in the cluster, so you could check that out.
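A hedged sketch of that Astyanax chunked-storage recipe (keyspace/CF/object names hypothetical): it splits a large blob across many small columns instead of one giant value:

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.recipes.storage.CassandraChunkedStorageProvider;
import com.netflix.astyanax.recipes.storage.ChunkedStorage;
import com.netflix.astyanax.recipes.storage.ChunkedStorageProvider;
import com.netflix.astyanax.serializers.StringSerializer;
import java.io.ByteArrayInputStream;

public class BlobStoreSketch {
    public static void store(Keyspace keyspace, byte[] image) throws Exception {
        ColumnFamily<String, String> cf = new ColumnFamily<String, String>(
            "file_chunks", StringSerializer.get(), StringSerializer.get());
        ChunkedStorageProvider provider =
            new CassandraChunkedStorageProvider(keyspace, cf);
        ChunkedStorage.newWriter(provider, "images/cat.jpg",
                new ByteArrayInputStream(image))
            .withChunkSize(64 * 1024)  // 64KB chunks, one column each
            .call();
    }
}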

Out of curiosity, why is HDFS not good for a small file size?  For reading, it 
should be the bomb with RF=3 since you can read from multiple nodes and such.  
Writes might be a little slower but still shouldn't be too bad.

Later,
Dean

From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: user@cassandra.apache.org
Date: Monday, March 4, 2013 12:34 PM
To: user@cassandra.apache.org
Subject: Storage question

Hi - Can someone suggest the optimal way to store files / images ? We are
planning to use Cassandra for the meta-data for these files. HDFS is not good
for small file sizes... can we look at something else ?


Storage question

2013-03-04 Thread Kanwar Sangha
Hi - Can someone suggest the optimal way to store files / images ? We are
planning to use Cassandra for the meta-data for these files. HDFS is not good
for small file sizes... can we look at something else ?

Thanks,
Kanwar



NetworkTopology

2013-02-28 Thread Kanwar Sangha
Hi - Quick question. When specifying the replication across 2 DCs, can we have 
1 replication factor across 2 Data centres ? Does the below mean that there 
will be 2 copies of the data , 1 in DC1 and 1 in DC2 ?

[default@unknown] CREATE KEYSPACE test
  WITH placement_strategy = 'NetworkTopologyStrategy'
  AND strategy_options={DC1:1,DC2:1};

So if I specify DC1:2,DC2:2 - does this mean we have 4 copies in total ? 2 in
DC1 and 2 in DC2 ?



Thanks,
Kanwar
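Yes - with NetworkTopologyStrategy the factor is given per data centre, so DC1:1,DC2:1 is 2 copies in total (1 per DC), and the 4-copy layout in the same cassandra-cli notation would be:

[default@unknown] CREATE KEYSPACE test
  WITH placement_strategy = 'NetworkTopologyStrategy'
  AND strategy_options={DC1:2,DC2:2};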






RE: Read Perf

2013-02-26 Thread Kanwar Sangha
Yep. So the read will remain constant in this case ?


-Original Message-
From: Hiller, Dean [mailto:dean.hil...@nrel.gov] 
Sent: 26 February 2013 09:32
To: user@cassandra.apache.org
Subject: Re: Read Perf

In that case, make sure you don't plan on going into the millions, or test the
limit, as I'm pretty sure it can't go above 10 million (from previous posts on
this list).

Dean

On 2/26/13 8:23 AM, "Kanwar Sangha"  wrote:

>Thanks. For our case, the no of rows will more or less be the same. The 
>only thing which changes is the columns and they keep getting added.
>
>-Original Message-
>From: Hiller, Dean [mailto:dean.hil...@nrel.gov]
>Sent: 26 February 2013 09:21
>To: user@cassandra.apache.org
>Subject: Re: Read Perf
>
>To find stuff on disk, there is a bloomfilter for each file in memory.
>On the docs, 1 billion rows has 2Gig of RAM, so it really will have a 
>huge dependency on your number of rows.  As you get more rows, you may 
>need to modify the bloomfilter false positive to use less RAM but that 
>means slower reads.  Ie. As you add more rows, you will have slower 
>reads on a single machine.
>
>We hit the RAM limit on one machine with 1 billion rows so we are in 
>the process of tweaking the ratio of 0.000744(the default) to 0.1 to 
>give us more time to solve.  Since we see no I/o load on our 
>machines(or rather extremely little), we plan on moving to leveled 
>compaction where 0.1 is the default in new releases and size tiered new 
>default I think is 0.01.
>
>Ie. If you store more data per row, this is not an issue as much but 
>still something to consider.  (Also, rows have a limit I think as well 
>on data size but not sure what that is.  I know the column limit on a 
>row is in the millions, somewhere lower than 10 million).
>
>Later,
>Dean
>
>From: Kanwar Sangha <kan...@mavenir.com>
>Reply-To: user@cassandra.apache.org
>Date: Monday, February 25, 2013 8:31 PM
>To: user@cassandra.apache.org
>Subject: Read Perf
>
>Hi - I am doing a performance run using modified YCSB client and was 
>able to populate 8TB on a node and then ran some read workloads. I am 
>seeing an average TPS of 930 ops/sec for random reads. There is no key 
>cache/row cache. Question -
>
>Will the read TPS degrade if the data size increases to say 20 TB , 50 
>TB, 100 TB ? If I understand correctly, the read should remain constant 
>irrespective of the data size since we eventually have sorted SStables 
>and binary search would be done on the index filter to find the row ?
>
>
>Thanks,
>Kanwar



RE: Read Perf

2013-02-26 Thread Kanwar Sangha
Thanks. For our case, the no of rows will more or less be the same. The only 
thing which changes is the columns and they keep getting added. 

-Original Message-
From: Hiller, Dean [mailto:dean.hil...@nrel.gov] 
Sent: 26 February 2013 09:21
To: user@cassandra.apache.org
Subject: Re: Read Perf

To find stuff on disk, there is a bloomfilter for each file in memory.  On the 
docs, 1 billion rows has 2Gig of RAM, so it really will have a huge dependency 
on your number of rows.  As you get more rows, you may need to modify the 
bloomfilter false positive to use less RAM but that means slower reads.  Ie. As 
you add more rows, you will have slower reads on a single machine.

We hit the RAM limit on one machine with 1 billion rows so we are in the 
process of tweaking the ratio of 0.000744(the default) to 0.1 to give us more 
time to solve.  Since we see no I/o load on our machines(or rather extremely 
little), we plan on moving to leveled compaction where 0.1 is the default in 
new releases and size tiered new default I think is 0.01.

Ie. If you store more data per row, this is not an issue as much but still 
something to consider.  (Also, rows have a limit I think as well on data size 
but not sure what that is.  I know the column limit on a row is in the 
millions, somewhere lower than 10 million).

Later,
Dean
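The ratio Dean mentions is settable per column family; in CQL3 on 1.2 the change he describes would look like this (table name hypothetical), taking effect as SSTables are rebuilt (compaction or upgradesstables):

ALTER TABLE posts WITH bloom_filter_fp_chance = 0.1;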

From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: user@cassandra.apache.org
Date: Monday, February 25, 2013 8:31 PM
To: user@cassandra.apache.org
Subject: Read Perf

Hi - I am doing a performance run using modified YCSB client and was able to 
populate 8TB on a node and then ran some read workloads. I am seeing an average 
TPS of 930 ops/sec for random reads. There is no key cache/row cache. Question -

Will the read TPS degrade if the data size increases to say 20 TB , 50 TB, 100 
TB ? If I understand correctly, the read should remain constant irrespective of 
the data size since we eventually have sorted SStables and binary search would 
be done on the index filter to find the row ?


Thanks,
Kanwar


Read Perf

2013-02-25 Thread Kanwar Sangha
Hi - I am doing a performance run using modified YCSB client and was able to 
populate 8TB on a node and then ran some read workloads. I am seeing an average 
TPS of 930 ops/sec for random reads. There is no key cache/row cache. Question -

Will the read TPS degrade if the data size increases to say 20 TB , 50 TB, 100 
TB ? If I understand correctly, the read should remain constant irrespective of 
the data size since we eventually have sorted SStables and binary search would 
be done on the index filter to find the row ?


Thanks,
Kanwar


RE: Cassandra with SAN

2013-02-21 Thread Kanwar Sangha
Ok. What would be the drawbacks :)

From: Michael Kjellman [mailto:mkjell...@barracuda.com]
Sent: 21 February 2013 17:12
To: user@cassandra.apache.org
Subject: Re: Cassandra with SAN

No, this is a really really bad idea, and C* was not designed for this; in
fact, it was designed so you don't need a large, expensive SAN.

Don't be tempted by the shiny expensive SAN. :)

If money is no object, instead throw SSDs in your nodes and run 10G between
racks

From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: user@cassandra.apache.org
Date: Thursday, February 21, 2013 2:56 PM
To: user@cassandra.apache.org
Subject: Cassandra with SAN

Hi - Is it a good idea to use Cassandra with a SAN ? Say a SAN which provides
me 8 petabytes of storage. Would I not be I/O-bound irrespective of the number
of Cassandra machines, so that scaling by adding machines won't help ?

Thanks
Kanwar



Cassandra with SAN

2013-02-21 Thread Kanwar Sangha
Hi - Is it a good idea to use Cassandra with a SAN ? Say a SAN which provides
me 8 petabytes of storage. Would I not be I/O-bound irrespective of the number
of Cassandra machines, so that scaling by adding machines won't help ?

Thanks
Kanwar


RE: cassandra vs. mongodb quick question(good additional info)

2013-02-21 Thread Kanwar Sangha
“The limiting factors are the time it takes to repair, the time it takes to
replace a node, and the memory considerations for 100's of millions of rows.
If the performance of those operations is acceptable to you, then go crazy”

If I have a node which is attached to a RAID and the node crashes but the data
is still good on the drives, it would just mean bringing up the node using the
same storage ? Would this not be fast… ?




From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: 21 February 2013 11:46
To: user@cassandra.apache.org
Subject: Re: cassandra vs. mongodb quick question(good additional info)

If you are lazy like me wolfram alpha can help

http://www.wolframalpha.com/input/?i=transfer+42TB+at+10GbE&a=UnitClash_*TB.*Tebibytes--

10 hours 15 minutes 43.59 seconds

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/02/2013, at 11:31 AM, Wojciech Meler <wojciech.me...@gmail.com> wrote:



you have 86400 seconds a day so 42T could take less than 12 hours on 10Gb link
On 19 Feb 2013 02:01, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:
I thought about this more, and even with a 10Gbit network it would take the
better part of a day just to stream a replacement node if mongodb did truly
have 42T / node like I had heard.  I wrote the below email to the person I
heard this from going back to basics, which really puts some perspective on
it….(and a lot of people don't even have a 10Gbit network like we do)

Nodes are hooked up by at most a 10G network right now, where that is 10
gigabit.  We are talking about 10 Terabytes on disk per node recently.

Google "10 gigabit in gigabytes" gives me 1.25 gigabytes/second  (yes I could
have divided by 8 in my head but eh…course when I saw the number, I went duh)

So trying to transfer 10 Terabytes or 10,000 Gigabytes to a node that we are
bringing online to replace a dead node would take approximately 2 hours at
line rate.

That also assumes no one else is using the bandwidth ;).  10,000 Gigabytes * 1
second/1.25 Gigabytes * 1 min/60 secs * 1 hr/60 mins ≈ 2.2 hours.  This is
more likely 4.5 hours if we only use 50% of the network, and that is pure
network math.

In practice streaming rarely runs anywhere near line rate, since the receiving
node has to write out, index, and compact everything it gets, so real rebuilds
stretch from hours into days.  I think that is the main reason the 1 Terabyte
soft limit exists to begin with, right?

From an ops perspective, days of degraded redundancy while a replacement
streams in could sound like a nightmare scenario…..maybe it is livable though.
Either way, I thought it would be good to share the numbers.  ALSO, even the
line-rate figure assumes the bus with its 10 disks can keep up with 10G. Can
it?  What is the limit of throughput on a bus per second on the computers we
have? On wikipedia there is a huge variance.

What is the rate of the disks too (multiplied by 10 of course)?  Will they keep 
up with a 10G rate for bringing a new node online?

This all comes into play even more when you want to double the size of your
cluster, of course, as all nodes have to transfer half of what they have to
the new nodes that come online (Cassandra actually has a very data center/rack
aware topology so it can transfer data correctly without using up all the
bandwidth unnecessarily…I am not sure mongodb has that).  Anyways, just food
for thought.

From: aaron morton mailto:aa...@thelastpickle.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Date: Monday, February 18, 2013 1:39 PM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>, Vegard Berget 
mailto:p...@fantasista.no>>
Subject: Re: cassandra vs. mongodb quick question

My experience is repair of 300GB compressed data takes longer than 300GB of 
uncompressed, but I cannot point to an exact number. Calculating the 
differences is mostly CPU bound and works on the uncompressed data.

Streaming uses compression (after uncompressing the on disk data).

So if you have 300GB of compressed data, take a look at how long repair takes 
and see if you are comfortable with that. You may also want to test replacing a 
node so you can get the procedure documented and understand how long it takes.

The idea of the soft 300GB to 500GB limit came about because of a number of
cases where people had 1 TB on a single node and were surprised it took days
to repair or replace. If you know how long those operations take, and that
fits in your operations, then go with it.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/02/2013

RE: SSTable Num

2013-02-21 Thread Kanwar Sangha
No.
The default size tiered strategy compacts files that are roughly the same size,
and only when there are at least 4 (default) of them.

Ok. So for 10 TB, I could end up with 4 SSTable files, each of 2.5 TB?
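
A simplified sketch of the size-tiered idea in Python (an illustration of the
bucketing rule, not Cassandra's actual implementation; the 0.5/1.5 bucket
bounds and the threshold of 4 are the documented defaults):

def buckets(sstable_sizes_gb, low=0.5, high=1.5):
    # Group SSTables whose sizes fall within [low*avg, high*avg] of a bucket.
    groups = []
    for size in sorted(sstable_sizes_gb):
        for g in groups:
            avg = sum(g) / len(g)
            if low * avg <= size <= high * avg:
                g.append(size)
                break
        else:
            groups.append([size])
    return groups

sizes = [0.1, 0.1, 0.1, 0.1, 1, 1, 1, 2500, 2600]   # GB
for g in buckets(sizes):
    print(g, "-> compact" if len(g) >= 4 else "-> wait")

The two large tables just sit there until a bucket of 4 similarly sized tables
accumulates, which is why a node never converges to a single file.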

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: 21 February 2013 11:01
To: user@cassandra.apache.org
Subject: Re: SSTable Num

Hi - I have around 6TB of data on 1 node
Unless you have SSD and 10GbE you probably have too much data on there.
Remember you need to run repair and that can take a long time with a lot of 
data. Also you may need to replace a node one day and moving 6TB will take a 
while.

 Or will the sstable compaction continue and eventually we will have 1 file ?
No.
The default size tiered strategy compacts files that are roughly the same size,
and only when there are at least 4 (default) of them.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/02/2013, at 3:47 AM, Kanwar Sangha 
mailto:kan...@mavenir.com>> wrote:


Hi - I have around 6TB of data on 1 node and the cfstats show 32 sstables. 
There is no compaction job running in the background. Is there a limit on the 
size per sstable ? Or will the sstable compaction continue and eventually we 
will have 1 file ?

Thanks,
Kanwar




RE: Read IO

2013-02-21 Thread Kanwar Sangha
Ok.. Cassandra's default block size is 256k? Now say my data in the column is 4
MB, and the disk gives me 4k random reads @ 100 IOPS, i.e. at most 400k of
random reads per second. Does that mean I would need multiple reads (seeks) to
get the complete data?


-Original Message-
From: sc...@scode.org [mailto:sc...@scode.org] On Behalf Of Peter Schuller
Sent: 21 February 2013 00:05
To: user@cassandra.apache.org
Subject: Re: Read IO

> Is this correct ?

Yes, at least under optimal conditions and assuming a reasonably sized row. 
Things like read-ahead (at the kernel level) will play into it; and if your 
read (even if assumed to be small) straddles two pages you might or might not 
take another read depending on your kernel settings (typically trading 
pollution of page cache vs. number of I/O:s).

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


key cache size

2013-02-21 Thread Kanwar Sangha
Hi - What is the approximate overhead of the key cache ? Say each key is 50 
bytes. What would be the overhead for this key in the key cache ?

Thanks,
Kanwar
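
A rough estimator for this question. The per-entry constant is an assumption
for illustration (JVM object headers, the cached copy of the key, and the
sstable position it maps to), not a number taken from the Cassandra source:

def key_cache_bytes(num_keys, key_bytes=50, per_entry_overhead=100):
    # Assumed ~100 bytes of fixed overhead per entry on top of the key itself.
    return num_keys * (key_bytes + per_entry_overhead)

print(key_cache_bytes(10**6) / 1024.0**2, "MB for 1M cached 50-byte keys")  # ~143 MB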


Read IO

2013-02-20 Thread Kanwar Sangha
Hi - Can someone explain the worst case IOPS for a read ? No key cache, No row 
cache, sampling rate say 512.


1)  Bloom filter will be checked to see if the key exists (in RAM)

2)  Index sample (in RAM) will be checked to find the approximate location in 
the index file on disk

3)  1 IO to read the actual index file on disk (DISK)

4)  1 IO to get the data from the location in the sstable (DISK)

Is this correct ?
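
A rough cost model for those four steps (assumptions, not measurements; real
reads benefit from kernel read-ahead, so for large contiguous values this is a
pessimistic upper bound):

import math

def worst_case_read(value_bytes, read_size=4096, iops=100):
    index_reads = 1                                  # step 3: seek into the index file
    data_reads = math.ceil(value_bytes / read_size)  # step 4: pull the value itself
    total = index_reads + data_reads
    return total, total / iops                       # (disk reads, seconds)

reads, secs = worst_case_read(4 * 1024 * 1024)       # a 4 MB column value
print(reads, "reads, ~%.1f s at 100 random IOPS" % secs)

In practice the 4 MB sits contiguously in the sstable, so after the first seek
the kernel mostly streams it sequentially rather than paying one random IO per
4k page.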




File Store

2013-02-20 Thread Kanwar Sangha
Hi - I am looking for some inputs on the file storage in Cassandra.  Each file 
size can range from 200kb - 3MB.  I don't see any limitation on the column 
size. But would it be a good idea to store these files as binary in the columns 
?

Thanks,
Kanwar



SSTable Num

2013-02-20 Thread Kanwar Sangha
Hi - I have around 6TB of data on 1 node and the cfstats show 32 sstables. 
There is no compaction job running in the background. Is there a limit on the 
size per sstable ? Or will the sstable compaction continue and eventually we 
will have 1 file ?

Thanks,
Kanwar



RE: Cassandra backup

2013-02-18 Thread Kanwar Sangha
Thanks. I will look into the details.

One issue I see: if I have only one column family, which needs only the last 7
days of data on SSD and the rest on HDD, how will that work?

From: Michael Kjellman [mailto:mkjell...@barracuda.com]
Sent: 18 February 2013 20:08
To: user@cassandra.apache.org
Subject: Re: Cassandra backup

There is this:

http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement

But you'll need to design your data model around the fact that this is only as 
granular as 1 column family
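
A hypothetical sketch of that design constraint: keep two column families, one
whose data directory lives on SSD and one on HDD, and route by age in the
application. The table names and the 7-day cutoff are illustrative assumptions,
and a periodic job would still have to migrate rows from hot to archive:

from datetime import datetime, timedelta, timezone

HOT_DAYS = 7

def table_for(ts):
    # Route to the SSD-backed CF for recent data, the HDD-backed CF otherwise.
    age = datetime.now(timezone.utc) - ts
    return "messages_hot" if age <= timedelta(days=HOT_DAYS) else "messages_archive"

now = datetime.now(timezone.utc)
print(table_for(now - timedelta(days=2)))    # messages_hot
print(table_for(now - timedelta(days=30)))   # messages_archive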

Best,
michael

From: Kanwar Sangha mailto:kan...@mavenir.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Date: Monday, February 18, 2013 6:06 PM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
mailto:user@cassandra.apache.org>>
Subject: Cassandra backup

Hi - We have a req to store around 90 days of data per user. The last 7 days of
data is going to be accessed frequently. Is there a way we can keep the recent
data (7 days) on SSD and the rest of the data on HDD? Do we take a snapshot
every 7 days and use a separate 'archive' cluster to serve the old data and an
'active' cluster to serve recent data?

Any links/thoughts would be helpful.

Thanks,
Kanwar


Cassandra backup

2013-02-18 Thread Kanwar Sangha
Hi - We have a req to store around 90 days of data per user. The last 7 days of
data is going to be accessed frequently. Is there a way we can keep the recent
data (7 days) on SSD and the rest of the data on HDD? Do we take a snapshot
every 7 days and use a separate 'archive' cluster to serve the old data and an
'active' cluster to serve recent data?

Any links/thoughts would be helpful.

Thanks,
Kanwar


RE: Mutation dropped

2013-02-18 Thread Kanwar Sangha
Thanks Aaron.

Does the rpc_timeout not control the client timeout? Is there a configurable
param to control the replication timeout between nodes? Or is the same param
used for that, since the other node also acts like a client?



From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: 17 February 2013 11:26
To: user@cassandra.apache.org
Subject: Re: Mutation dropped

You are hitting the maximum throughput on the cluster.

The messages are dropped because the node fails to start processing them before 
rpc_timeout.

However the request is still a success because the client requested CL was 
achieved.

Testing with RF 2 and CL 1 really just tests the disks on one local machine.
Both nodes replicate each row, and writes are sent to each replica, so the only
thing the client is waiting on is the local node to write to its commit log.

Testing with (and running in prod) RF 3 and CL QUORUM is a more real world
scenario.
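
A toy simulation of that effect (not Cassandra code; the 10% timeout
probability is an arbitrary assumption). With RF=2 and CL=ONE the client sees
success as soon as one replica acks, so dropped mutations vastly outnumber
client-visible failures:

import random

RF, CL = 2, 1
P_TIMEOUT = 0.1   # assumed chance a single replica write times out
random.seed(42)
dropped = client_failures = 0
for _ in range(10000):
    acks = sum(random.random() > P_TIMEOUT for _ in range(RF))
    dropped += RF - acks               # these show up as "mutations dropped"
    if acks < CL:
        client_failures += 1           # client only fails if CL wasn't met
print("dropped mutations:", dropped)                # ~2000
print("client-visible failures:", client_failures)  # ~100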

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 15/02/2013, at 9:42 AM, Kanwar Sangha 
mailto:kan...@mavenir.com>> wrote:


Hi - Is there a parameter which can be tuned to prevent the mutations from 
being dropped ? Is this logic correct ?

Node A and B with RF=2, CL =1. Load balanced between the two.

--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.x.x.x   746.78 GB  256     100.0%            dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
UN  10.x.x.x   880.77 GB  256     100.0%            95d59054-be99-455f-90d1-f43981d3d778  rack1

Once we hit a very high TPS (around 50k/sec of inserts), the nodes start
falling behind and we see the mutation dropped messages, but there are no
failures on the client. Does that mean the other node is not able to persist
the replicated data? Is there some timeout associated with persisting the
replicated data?

Thanks,
Kanwar







From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 14 February 2013 09:08
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Mutation dropped

Hi - I am doing a load test using YCSB across 2 nodes in a cluster and seeing a 
lot of mutation dropped messages.  I understand that this is due to the replica 
not being written to the
other node ? RF = 2, CL =1.

From the wiki -
For MUTATION messages this means that the mutation was not applied to all 
replicas it was sent to. The inconsistency will be repaired by Read Repair or 
Anti Entropy Repair

Thanks,
Kanwar




RE: Mutation dropped

2013-02-14 Thread Kanwar Sangha
Hi - Is there a parameter which can be tuned to prevent the mutations from 
being dropped ? Is this logic correct ?

Node A and B with RF=2, CL =1. Load balanced between the two.

--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.x.x.x   746.78 GB  256     100.0%            dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
UN  10.x.x.x   880.77 GB  256     100.0%            95d59054-be99-455f-90d1-f43981d3d778  rack1

Once we hit a very high TPS (around 50k/sec of inserts), the nodes start
falling behind and we see the mutation dropped messages, but there are no
failures on the client. Does that mean the other node is not able to persist
the replicated data? Is there some timeout associated with persisting the
replicated data?

Thanks,
Kanwar







From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 14 February 2013 09:08
To: user@cassandra.apache.org
Subject: Mutation dropped

Hi - I am doing a load test using YCSB across 2 nodes in a cluster and seeing a 
lot of mutation dropped messages.  I understand that this is due to the replica 
not being written to the
other node ? RF = 2, CL =1.

From the wiki -
For MUTATION messages this means that the mutation was not applied to all 
replicas it was sent to. The inconsistency will be repaired by Read Repair or 
Anti Entropy Repair

Thanks,
Kanwar




Mutation dropped

2013-02-14 Thread Kanwar Sangha
Hi - I am doing a load test using YCSB across 2 nodes in a cluster and seeing a 
lot of mutation dropped messages.  I understand that this is due to the replica 
not being written to the
other node ? RF = 2, CL =1.

From the wiki -
For MUTATION messages this means that the mutation was not applied to all 
replicas it was sent to. The inconsistency will be repaired by Read Repair or 
Anti Entropy Repair

Thanks,
Kanwar



Cassandra benchmark

2013-02-11 Thread Kanwar Sangha
Hi - I am trying to do a benchmark using the cassandra-stress tool. They have 
given an example to insert data across 2 nodes -


/tools/stress/bin/stress -d 192.168.1.101,192.168.1.102 -n 1000
But when I run this across my 2 node cluster, I see the same keys in both 
nodes. Replication is not enabled. Should it not have unique keys in both nodes 
?

Thanks,
Kanwar





RE: DataModel Question

2013-02-06 Thread Kanwar Sangha
Thanks Aaron !

My use case is modeled on "skype", which stores IM + SMS + MMS in one 
conversation.

I need to have the following functionality -


* When I go offline and come online again, I need to retrieve all pending 
messages from all my conversations.

* I should be able to select a contact and view the 'history' of the messages 
(last 7 days, last 14 days, last 21 days...)

* If I log in to a different device, I should be able to synch at least a 
"few days" of messages.

* One conversation can have multiple participants.

* Support full synch or delta synch based on number of messages/history.

I guess this makes the data model span across many CFs ?




From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: 06 February 2013 22:20
To: user@cassandra.apache.org
Subject: Re: DataModel Question

2)  DynamicComposites : I read somewhere that they are not recommended ?
You probably wont need them.

Your current model will not sort messages by the time they arrive in a day. The
sort order will be based on message type and the message ID.

I'm assuming you want to order messages, so put the time uuid at the start of 
the composite columns. If you often want to get the most recent messages use a 
reverse comparator.

You could probably also have wider rows if you want to, not sure how many 
messages kids send a day but you may get by with weekly partitions.

The CLI model could be:
row_key: <phone_number : day>
column: <message_sequence : message_id> = <message_body>

You could also pack extra data using JSON, ProtoBuffers etc and store more than
just the message in the column value.

If you are using CQL 3, consider this:

create table messages (
    phone_number text,
    day timestamp,
    message_sequence timeuuid, -- your timestamp
    message_id int,
    message_type text,
    message_body text,
    PRIMARY KEY ( (phone_number, day), message_sequence, message_id )
);

(phone_number, day) is the partition key, the same as the thrift row key.

message_sequence and message_id are the clustering (grouping) columns; all rows
in a partition are grouped / ordered by these columns.
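
A small illustration of that ordering rule, with plain Python tuples standing
in for the composite comparator (the timestamps and IDs are made up): because
the time component comes first, sorting component by component yields
chronological order within a partition.

rows = [
    (1360200000, 17, "MMS"),   # (message_sequence as epoch seconds, message_id, type)
    (1360100000, 42, "SMS"),
    (1360100000, 7,  "SMS"),
]
for seq, msg_id, kind in sorted(rows):   # time first, then message_id breaks ties
    print(seq, msg_id, kind)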

Hope that helps.



-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/02/2013, at 1:47 AM, Kanwar Sangha 
mailto:kan...@mavenir.com>> wrote:


1)  Version is 1.2
2)  DynamicComposites : I read somewhere that they are not recommended ?
3)  Good point. I need to think about that one.



From: Tamar Fraenkel [mailto:ta...@tok-media.com]
Sent: 06 February 2013 00:50
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: DataModel Question

Hi!
I have a couple of questions regarding your model:
 1. What Cassandra version are you using? I am still working with 1.0 and this 
seems to make sense, but 1.2 gives you much more power I think.
 2. Maybe I don't understand your model, but I think you need DynamicComposite 
columns, as the user columns differ in number of components and maybe in type.
 3. How do you associate the SMS or MMS with the user you are chatting with? 
Is it done by a separate CF?
Thanks,
Tamar


Tamar Fraenkel
Senior Software Engineer, TOK Media


ta...@tok-media.com<mailto:ta...@tok-media.com>
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956



On Wed, Feb 6, 2013 at 8:23 AM, Vivek Mishra 
mailto:mishra.v...@gmail.com>> wrote:
Avoid super columns. If you need sorted, wide rows then go for Composite 
columns.

-Vivek

On Wed, Feb 6, 2013 at 7:09 AM, Kanwar Sangha 
mailto:kan...@mavenir.com>> wrote:
Hi -  We are designing a Cassandra based storage for the following use cases-


* Store SMS messages

* Store MMS messages

* Store Chat history

What would be the ideal way to design the data model for this kind of 
application? I am thinking along these lines ..

Row-Key : Composite key [ PhoneNum : Day ]

* Example: 19876543456:05022013

Dynamic Column Families

* Composite column key for SMS [SMS:MessageId:TimeUUID]

* Composite column key for MMS [MMS:MessageId:TimeUUID]

* Composite column key for the user I am chatting with [UserId:198765432345] 
- This can have multiple values since each chat conv can have many messages. 
Should this be a super column ?


Row key: 198:05022013
Columns: SMS::ttt | SMS:xxx12:ttt | MMS::ttt | :19

Row key: 1987888:05022013

Thanks,
Kanwar






RE: DataModel Question

2013-02-06 Thread Kanwar Sangha
1)  Version is 1.2

2)  DynamicComposites : I read somewhere that they are not recommended ?

3)  Good point. I need to think about that one.



From: Tamar Fraenkel [mailto:ta...@tok-media.com]
Sent: 06 February 2013 00:50
To: user@cassandra.apache.org
Subject: Re: DataModel Question

Hi!
I have a couple of questions regarding your model:
 1. What Cassandra version are you using? I am still working with 1.0 and this 
seems to make sense, but 1.2 gives you much more power I think.
 2. Maybe I don't understand your model, but I think you need DynamicComposite 
columns, as the user columns differ in number of components and maybe in type.
 3. How do you associate the SMS or MMS with the user you are chatting with? 
Is it done by a separate CF?
Thanks,
Tamar


Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.com<mailto:ta...@tok-media.com>
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956



On Wed, Feb 6, 2013 at 8:23 AM, Vivek Mishra 
mailto:mishra.v...@gmail.com>> wrote:
Avoid super columns. If you need sorted, wide rows then go for Composite 
columns.

-Vivek

On Wed, Feb 6, 2013 at 7:09 AM, Kanwar Sangha 
mailto:kan...@mavenir.com>> wrote:
Hi -  We are designing a Cassandra based storage for the following use cases-


* Store SMS messages

* Store MMS messages

* Store Chat history

What would be the ideal way to design the data model for this kind of 
application? I am thinking along these lines ..

Row-Key : Composite key [ PhoneNum : Day ]

* Example: 19876543456:05022013

Dynamic Column Families

* Composite column key for SMS [SMS:MessageId:TimeUUID]

* Composite column key for MMS [MMS:MessageId:TimeUUID]

* Composite column key for the user I am chatting with [UserId:198765432345] 
- This can have multiple values since each chat conv can have many messages. 
Should this be a super column ?


Row key: 198:05022013
Columns: SMS::ttt | SMS:xxx12:ttt | MMS::ttt | :19

Row key: 1987888:05022013

Thanks,
Kanwar




<>

DataModel Question

2013-02-05 Thread Kanwar Sangha
Hi -  We are designing a Cassandra based storage for the following use cases-


* Store SMS messages

* Store MMS messages

* Store Chat history

What would be the ideal way to design the data model for this kind of 
application? I am thinking along these lines ..

Row-Key : Composite key [ PhoneNum : Day ]

* Example: 19876543456:05022013

Dynamic Column Families

* Composite column key for SMS [SMS:MessageId:TimeUUID]

* Composite column key for MMS [MMS:MessageId:TimeUUID]

* Composite column key for the user I am chatting with [UserId:198765432345] 
- This can have multiple values since each chat conv can have many messages. 
Should this be a super column ?


Row key: 198:05022013
Columns: SMS::ttt | SMS:xxx12:ttt | MMS::ttt | :19

Row key: 1987888:05022013

Thanks,
Kanwar




Index file

2013-02-02 Thread Kanwar Sangha
Hi - Regarding the index files created for the SSTables: do they contain a 
sampling or the complete index? On startup, Cassandra samples these files based 
on the sampling rate (index_interval) in cassandra.yaml, right?




BloomFilter

2013-02-02 Thread Kanwar Sangha
Hi - Couple of questions -

1) What is the ratio of the SSTable file size to bloom filter size? If I have 
an SSTable of 1 GB, what is the approximate bloom filter size, assuming the 
0.000744 default value is configured?

2) The bloom filters are stored in RAM, but off-heap from 1.2 onwards?

3) What is the ratio of RAM to disk per node? What is the max disk size 
recommended for 1 node? If I have 10 TB of data per node, how much RAM will 
the bloom filter consume?



Thanks,

kanwar
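
For question 1 above, the standard Bloom filter sizing formula gives a
back-of-envelope answer (the 1 KB average row size is an assumption; the filter
scales with the number of keys, not with SSTable bytes):

import math

def bloom_bytes(num_keys, fp_rate=0.000744):
    # bits per key = -ln(p) / ln(2)^2, the usual optimal Bloom filter sizing
    bits_per_key = -math.log(fp_rate) / (math.log(2) ** 2)
    return num_keys * bits_per_key / 8

# ~15 bits (~1.9 bytes) per key at p = 0.000744; a 1 GB SSTable of ~1 KB rows
# holds about a million keys:
print("%.1f MB" % (bloom_bytes(10**6) / 1024.0**2))   # ~1.8 MB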