Re: Hotspots on Time Series based Model

2015-11-17 Thread DuyHai Doan
"Will the partition on PRIMARY KEY ((YEAR, MONTH, DAY, HOUR) cause any hotspot issues on a node given the hourly data size is ~13MB ?" 13MB/partition is quite small, you should be fine. One thing to be careful is the memtable flush frequency and appropriate compaction tuning to avoid having one

Re: handling down node cassandra 2.0.15

2015-11-17 Thread Anuj Wadehra
Only if gc_grace_seconds hasn't passed since the failure. If your machine is down for more than gc_grace_seconds, you need to delete the data directory and go with auto_bootstrap = true. Thanks Anuj Sent from Yahoo Mail on Android From:"Anishek Agarwal" Date:Tue, 17 Nov,
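
A quick way to compare a table's gc_grace_seconds against the node's downtime, sketched with the 2.0-era Java Driver (on Cassandra 2.0/2.1 table options are exposed in system.schema_columnfamilies); keyspace, table, and the downtime figure are placeholders:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class GcGraceCheck {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // Table options live in system.schema_columnfamilies on Cassandra 2.0/2.1.
            Row row = session.execute(
                "SELECT gc_grace_seconds FROM system.schema_columnfamilies " +
                "WHERE keyspace_name = 'my_ks' AND columnfamily_name = 'my_table'").one();

            long downtimeSeconds = 4L * 24 * 3600; // placeholder: node was down 4 days
            if (downtimeSeconds > row.getInt("gc_grace_seconds")) {
                System.out.println("Downtime exceeded gc_grace_seconds: wipe the data "
                    + "directory and re-bootstrap rather than just restarting the node.");
            }
            cluster.close();
        }
    }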

Re: Hotspots on Time Series based Model

2015-11-17 Thread Jack Krupansky
I'd be more comfortable keeping partition size below 10MB, but the more critical factor is the write rate. In a technical sense a single node (and its replicas) and a single partition will be a hotspot since all writes for an extended period of time will go to that single node and partition (for

Re: Ingesting Large Number of files

2015-11-17 Thread areddyraja
This is about 5GB one time. Say network speed is 200Mb/sec and you have a 10-node cluster. Choose your partition key in such a way that it can write to all nodes. That means about 0.5GB per node. With 200Mb/sec network speed, 500MB takes 500*8/200 = 20 secs total time for
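
The back-of-the-envelope arithmetic above, spelled out; the 10-node cluster, even spread, and 200Mb/sec link are the assumptions from the post:

    public class TransferEstimate {
        public static void main(String[] args) {
            double totalGb = 5.0;      // one-time load size, from the post
            int nodes = 10;            // assumed cluster size
            double linkMbps = 200.0;   // assumed network speed, megabits/sec

            double gbPerNode = totalGb / nodes;     // 0.5 GB per node
            double megabits = gbPerNode * 1000 * 8; // 500 MB ~= 4000 megabits
            double seconds = megabits / linkMbps;   // 4000 / 200 = 20 seconds
            System.out.printf("~%.0f seconds per node%n", seconds);
        }
    }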

Re: Hotspots on Time Series based Model

2015-11-17 Thread Yuri Shkuro
You can also subdivide the hourly partition further by adding an artificial "bucket" field to the partition key, which you populate with a random number, say between 0 and 9. When you query, you fan out 10 queries, one for each bucket, and you need to do a manual merge of the results (see the sketch below). This way you pay
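
A sketch of that fan-out and merge with the DataStax Java Driver, assuming the bucket has been added to the partition key, i.e. PRIMARY KEY ((year, month, day, hour, bucket), log_ts); the table and column names are made up:

    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import java.util.ArrayList;
    import java.util.List;

    public class BucketFanOut {
        static final int BUCKETS = 10;

        public static List<Row> readHour(Session session, int year, int month,
                                         int day, int hour) {
            PreparedStatement ps = session.prepare(
                "SELECT log_ts, log_id FROM event_log_by_date " +
                "WHERE year = ? AND month = ? AND day = ? AND hour = ? AND bucket = ?");

            // Fan out one async query per bucket.
            List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
            for (int b = 0; b < BUCKETS; b++) {
                futures.add(session.executeAsync(ps.bind(year, month, day, hour, b)));
            }

            // Manual merge: collect every bucket's rows; sort by log_ts if order matters.
            List<Row> merged = new ArrayList<Row>();
            for (ResultSetFuture f : futures) {
                for (Row row : f.getUninterruptibly()) {
                    merged.add(row);
                }
            }
            return merged;
        }
    }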

Re: Hotspots on Time Series based Model

2015-11-17 Thread areddyraja
13MB seems to be very fine in our experience; we have keys that can take more than 100MB. Sent from my iPhone > On 17-Nov-2015, at 7:47 PM, Yuri Shkuro wrote: > > You can also subdivide the hourly partition further by adding an artificial > "bucket" field to the partition key, which you

Ingesting Large Number of files

2015-11-17 Thread Tushar Agrawal
We get a periodic bulk load (twice a month) in the form of delimited data files: about 10K files with an average size of 50 MB. Each record is a row in a Cassandra table. What is the fastest possible way to ingest the data into Cassandra? Thank you, Tushar
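
One common pattern for this kind of load (offered as a sketch, not necessarily what the respondents below recommend) is throttled asynchronous inserts through a prepared statement; the delimiter, keyspace, table, and columns are placeholders:

    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.concurrent.Semaphore;

    public class BulkIngest {
        public static void ingest(Session session, String file) throws Exception {
            PreparedStatement ps = session.prepare(
                "INSERT INTO my_ks.my_table (id, payload) VALUES (?, ?)");
            // Cap in-flight requests so the client does not overrun the cluster.
            final Semaphore inFlight = new Semaphore(256);

            BufferedReader reader = new BufferedReader(new FileReader(file));
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split("\\|"); // assumed pipe-delimited
                inFlight.acquire();
                ResultSetFuture f = session.executeAsync(ps.bind(cols[0], cols[1]));
                Futures.addCallback(f, new FutureCallback<ResultSet>() {
                    public void onSuccess(ResultSet rs) { inFlight.release(); }
                    public void onFailure(Throwable t) { inFlight.release(); /* log + retry */ }
                });
            }
            reader.close();
        }
    }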

Re: Help diagnosing performance issue

2015-11-17 Thread Antoine Bonavita
Hello, As I have not heard from anybody on the list, I guess I did not provide the right kind of information or did not ask the right question. Things I forgot to mention in my previous email: * I checked the logs without noticing anything out of the ordinary. Memtable flushes occur

unsubscribe

2015-11-17 Thread Johan Sandström

Re: Nodetool rebuild on vnodes enabled

2015-11-17 Thread Robert Coli
On Tue, Nov 17, 2015 at 3:24 PM, cass savy wrote: > I am exploring vnodes on a DSE Spark-enabled DC. I added new nodes with 64 > vnodes, stream throughput 100Mb instead of the default 200Mb, socket_timeout set > to 1hr. > 1) what version of Cassandra (please the version of Apache

Re: Help diagnosing performance issue

2015-11-17 Thread Sebastian Estevez
Hi, Your sstables are probably falling out of page cache on the smaller nodes and your slow disks are killing your latencies. Check to see if this is the case with pcstat: https://github.com/tobert/pcstat All the best, Sebastián

Re: Help diagnosing performance issue

2015-11-17 Thread Robert Coli
On Tue, Nov 17, 2015 at 11:08 AM, Sebastian Estevez < sebastian.este...@datastax.com> wrote: > Your sstables are probably falling out of page cache on the smaller > nodes and your slow disks are killing your latencies. > +1 most likely. Are the heaps the same size on both machines? =Rob

Re: Ingesting Large Number of files

2015-11-17 Thread Robert Coli
On Tue, Nov 17, 2015 at 6:32 AM, Tushar Agrawal wrote: > We get periodic bulk load (twice a month) in form of delimited data files. > We get about 10K files with average size of 50 MB. Each record is a row in > Cassandra table. >

Re: Repair Hangs while requesting Merkle Trees

2015-11-17 Thread Anuj Wadehra
Thanks Bryan !! The connection is in ESTABLISHED state on one end and completely missing at the other end (in another DC). Yes, we can revisit TCP tuning. But the problem is node specific, so I am not sure whether tuning is the culprit. Thanks Anuj Sent from Yahoo Mail on Android From:"Bryan Cheng"

Re: which astyanax version to use?

2015-11-17 Thread Lijun Huang
Thank you Minh. So it means that if I want to use Cassandra 2.1+, no version of Astyanax is compatible with it? Because we are already using Astyanax, it may be heavy work to change from Astyanax to the DataStax Java Driver. On Wed, Nov 18, 2015 at 11:52 AM, Minh Do

Re: which astyanax version to use?

2015-11-17 Thread Minh Do
The latest version of Astyanax won't work with Cassandra 2.1+, so you are better off using the Java Driver from DataStax (minimal example below). /Minh On Tue, Nov 17, 2015 at 7:29 PM, Lijun Huang wrote: > Hi All, > > I have a similar problem: if I use Cassandra 2.1, which > Astyanax
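
For anyone making that switch, a minimal DataStax Java Driver (2.1-era) connection sketch; the contact point is a placeholder:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class JavaDriverHello {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1") // placeholder contact point
                    .build();
            Session session = cluster.connect();

            // Sanity check: read the server version from the local system table.
            Row row = session.execute(
                "SELECT release_version FROM system.local").one();
            System.out.println("Cassandra version: " + row.getString("release_version"));

            cluster.close();
        }
    }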

Re: which astyanax version to use?

2015-11-17 Thread Lijun Huang
Hi All, I have a similar problem: if I use Cassandra 2.1, which Astyanax version is the best one for me? The versions on the Astyanax GitHub page make me a little confused, and I need some advice from experience. Thanks in advance. Thanks, Lijun Huang --

Hotspots on Time Series based Model

2015-11-17 Thread Chandra Sekar KR
Hi, I have a time-series based table with the below structure and partition size/volumetrics. The purpose of this table is to enable range-based scans on log_ts and filter on log_id, so it can be used with the main table (EVENT_LOG) for checking the actual data. The EVENT_LOG_BY_DATE
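
The structure referred to above is cut off in this digest. Based on the key quoted in DuyHai Doan's reply (PRIMARY KEY ((YEAR, MONTH, DAY, HOUR))), a plausible reconstruction of the intended query pattern; the column names are assumptions:

    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import java.util.Date;

    public class RangeScanByHour {
        // Assumed shape: PRIMARY KEY ((year, month, day, hour), log_ts, log_id),
        // i.e. a range scan on log_ts inside one hourly partition yields the
        // log_ids to look up in the main EVENT_LOG table.
        public static ResultSet logIdsBetween(Session session, int year, int month,
                                              int day, int hour, Date from, Date to) {
            PreparedStatement ps = session.prepare(
                "SELECT log_id FROM event_log_by_date " +
                "WHERE year = ? AND month = ? AND day = ? AND hour = ? " +
                "AND log_ts >= ? AND log_ts < ?");
            return session.execute(ps.bind(year, month, day, hour, from, to));
        }
    }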

Re: Devcenter & C* 3.0 Connection Error.

2015-11-17 Thread Alexandre Dutra
Hello, Unfortunately, even with their most recent versions, both the Java driver and DevCenter are incompatible with C* 3.0. Both teams are actively working to release compatible versions in the coming days. Regards, Alexandre Dutra On Tue, Nov 17, 2015 at 12:16 AM Michael Shuler

Re: handling down node cassandra 2.0.15

2015-11-17 Thread Robert Coli
On Tue, Nov 17, 2015 at 4:33 AM, Anuj Wadehra wrote: > Only if gc_grace_seconds hasn't passed since the failure. If your machine > is down for more than gc_grace_seconds, you need to delete the data > directory and go with auto_bootstrap = true. > Since CASSANDRA-6961