Re: Question about compaction strategy changes

2016-10-23 Thread kurt Greaves
More compactions meaning "actual number of compaction tasks". A compaction task generally operates on many SSTables (how many depends on the chosen compaction strategy). The number of pending tasks does not line up with the number of SSTables that will be compacted. 1 task may compact many

CommitLogReadHandler$CommitLogReadException: Unexpected error deserializing mutation

2016-10-23 Thread Ali Akhtar
I have a single node cassandra installation on my dev laptop, which is used just for dev / testing. Recently, whenever I restart my laptop, Cassandra fails to start when I run it via 'sudo service cassandra start'. Doing a tail on /var/log/cassandra/system.log gives this log: *INFO [main]

Re: Question about compaction strategy changes

2016-10-23 Thread Seth Edwards
More compactions meaning "rows to be compacted" or actual number of pending compactions? I assumed when I run nodetool compactionstats the number of pending tasks would line up with number of sstables that will be compacted. Most of the time this is idle, then we hit spots when it could jump into

Re: Question about compaction strategy changes

2016-10-23 Thread kurt Greaves
On 22 October 2016 at 03:37, Seth Edwards wrote: > We're using TWCS and we notice that if we make changes to the options to > the window unit or size, it seems to implicitly start recompacting all > sstables. If you increase the window unit or size you potentially increase the
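
The regrouping described in this reply can be pictured with a toy model. This is only a sketch, not Cassandra's implementation: it assumes each SSTable is assigned to a time window by its newest timestamp, and the SSTable names, timestamps, and the helper `bucket_by_window` are all hypothetical.

```python
from collections import defaultdict

HOUR = 3600  # seconds


def bucket_by_window(sstable_max_timestamps, window_seconds):
    """Group SSTables by the time window containing their newest timestamp.

    Assumption for illustration: a window is identified by its start time,
    computed by rounding the timestamp down to a multiple of the window size.
    """
    buckets = defaultdict(list)
    for name, ts in sstable_max_timestamps.items():
        window_start = (ts // window_seconds) * window_seconds
        buckets[window_start].append(name)
    return dict(buckets)


# Hypothetical data: six SSTables, one flushed per hour.
sstables = {f"sstable-{i}": i * HOUR for i in range(6)}

hourly = bucket_by_window(sstables, 1 * HOUR)
three_hourly = bucket_by_window(sstables, 3 * HOUR)

print(len(hourly))        # each SSTable sits alone in its own window
print(len(three_hourly))  # wider windows: several SSTables share a window
```

In this model, widening the window from one hour to three puts SSTables that used to be alone into a shared window, so they become candidates to be compacted together.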

Re: failure node rejoin

2016-10-23 Thread Ben Slater
Definitely sounds to me like something is not working as expected but I don’t really have any idea what would cause that (other than the fairly extreme failure scenario). A couple of things I can think of to try to narrow it down: 1) Run nodetool flush on all nodes after step 2 - that will make

Re: Speeding up schema generation during tests

2016-10-23 Thread horschi
You have to manually do "nodetool flush && nodetool flush system" before shutdown, otherwise Cassandra might break. With that it is working nicely. On Sun, Oct 23, 2016 at 3:40 PM, Ali Akhtar wrote: > I'm using https://github.com/jsevellec/cassandra-unit and haven't come >

Re: is there any problem having too many clustering columns?

2016-10-23 Thread Kant Kodali
That helps, thanks! I assume you meant "updating one of the columns in the PRIMARY KEY would require DELETE + INSERT". Since we don't do updates or deletes on this table, I believe we could leverage this! On Sun, Oct 23, 2016 at 12:44 PM, DuyHai Doan wrote: > There is nothing

Re: is there any problem having too many clustering columns?

2016-10-23 Thread DuyHai Doan
There is nothing wrong with your schema, but remember that because you have set every column except one as a clustering column, updating them is no longer possible. To "update" the value of one of those columns you'll need to do a DELETE + INSERT. Example: with normal schema: UPDATE hello SET e =
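
The DELETE + INSERT point can be pictured with a toy model in which a table is a dictionary keyed by the full primary key. This is a sketch under stated assumptions, not CQL and not Cassandra's storage engine: the schema (partition key `a`, clustering columns `b` and `c`, regular column `d`) and every function name here are hypothetical.

```python
# Toy table: the dict key is the full primary key (a, b, c),
# and the dict value holds the regular (non-key) columns.
table = {}


def insert(a, b, c, d):
    """Write a row; the primary key decides where it lives."""
    table[(a, b, c)] = {"d": d}


def update_regular_column(a, b, c, d):
    """A regular column can be overwritten in place."""
    table[(a, b, c)]["d"] = d


def change_clustering_column(a, b, old_c, new_c):
    """A key column is part of the row's identity, so 'updating' it
    means removing the old row and writing a new one."""
    row = table.pop((a, b, old_c))  # DELETE the old row
    table[(a, b, new_c)] = row      # INSERT it under the new key


insert(1, 2, 3, "first")
update_regular_column(1, 2, 3, "second")
change_clustering_column(1, 2, old_c=3, new_c=4)
print(table)
```

The same reasoning explains why a table that is never updated or deleted from is unaffected by this restriction.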

is there any problem having too many clustering columns?

2016-10-23 Thread Kant Kodali
Hi All, Is there any problem having too many clustering columns? My goal is to store data by columns in order and for any given partition (primary key) each of its non-clustering columns (columns that are not part of the primary key) can lead to a new column underneath or the CQL equivalent would be a

Re: Speeding up schema generation during tests

2016-10-23 Thread Ali Akhtar
I'm using https://github.com/jsevellec/cassandra-unit and haven't come across any race issues or problems. Cassandra-unit takes care of creating the schema before it runs the tests. On Sun, Oct 23, 2016 at 6:17 PM, DuyHai Doan wrote: > Ok I have added

Re: Speeding up schema generation during tests

2016-10-23 Thread DuyHai Doan
Ok, I have added -Dcassandra.unsafesystem=true and my tests are broken. The reason is that I create some schemas before executing tests. When unsafesystem is enabled, Cassandra does not block for schema flush, so you may run into race conditions where the tests start using the created schema but it has

Re: Cannot restrict clustering columns by IN relations when a collection is selected by the query

2016-10-23 Thread Samba
please see CASSANDRA-12654 On Sat, Oct 22, 2016 at 3:12 AM, DuyHai Doan wrote: > So the commit on this restriction dates back to 2.2.0 (CASSANDRA-7981). > > Maybe Benjamin Lerer can shed some light on it. > > On Fri, Oct 21, 2016 at 11:05 PM, Jeff Carpenter < >

Re: Hadoop vs Cassandra

2016-10-23 Thread Welly Tambunan
Another thing: let's say we already have structured data. Is the way to load it into HDFS to turn it into files? Cheers On Sun, Oct 23, 2016 at 6:18 PM, Welly Tambunan wrote: > So basically you would store those files in HDFS and use Spark to process them > ?

Re: Hadoop vs Cassandra

2016-10-23 Thread Welly Tambunan
So basically you would store those files in HDFS and use Spark to process them? On Sun, Oct 23, 2016 at 6:03 PM, Joaquin Alzola wrote: > > > I think what Ali mentions is correct: > > If you need a lot of queries that require joins, or complex analytics of > the kind that

RE: Hadoop vs Cassandra

2016-10-23 Thread Joaquin Alzola
I think what Ali mentions is correct: If you need a lot of queries that require joins, or complex analytics of the kind that Cassandra isn't suited for, then HDFS / HBase may be better. We have files in which one line contains 500 fields (separated by pipe) and each of these fields is

Re: Hadoop vs Cassandra

2016-10-23 Thread Ali Akhtar
"from a particular query" should be "from a particular country" On Sun, Oct 23, 2016 at 2:36 PM, Ali Akhtar wrote: > They can be, but I would assume that if your Cassandra data model is > inefficient for the kind of queries you want to do, Spark won't magically > take

Re: Hadoop vs Cassandra

2016-10-23 Thread Ali Akhtar
They can be, but I would assume that if your Cassandra data model is inefficient for the kind of queries you want to do, Spark won't magically take that away. For example, say you have a users table. Each user has a country, which isn't a partitioning key or clustering key. If you wanted to

Re: Hadoop vs Cassandra

2016-10-23 Thread Welly Tambunan
I like the multi data centre resilience in Cassandra. I think that's a plus one for Cassandra. Ali, complex analytics can be done in Spark, right? On 23 Oct 2016 4:08 p.m., "Ali Akhtar" wrote: > > I would say it depends on your use case. > > If you need a lot of queries that

Re: Hadoop vs Cassandra

2016-10-23 Thread Ali Akhtar
I would say it depends on your use case. If you need a lot of queries that require joins, or complex analytics of the kind that Cassandra isn't suited for, then HDFS / HBase may be better. If you can work with the cassandra way of doing things (creating new tables for each query you'll need to

Re: Hadoop vs Cassandra

2016-10-23 Thread Ben Slater
It’s reasonably common to use Cassandra to cover both online and analytics requirements, particularly using it in conjunction with Spark. You can use Cassandra’s multi-DC functionality to have online and analytics DCs for a reasonable degree of workload separation without having to build ETL (or

Re: Hadoop vs Cassandra

2016-10-23 Thread Welly Tambunan
I mean HDFS and HBase. On Sun, Oct 23, 2016 at 4:00 PM, Ali Akhtar wrote: > By Hadoop do you mean HDFS? > > > > On Sun, Oct 23, 2016 at 1:56 PM, Welly Tambunan wrote: > >> Hi All, >> >> I read the following comparison between hadoop and cassandra.

Re: Hadoop vs Cassandra

2016-10-23 Thread Ali Akhtar
By Hadoop do you mean HDFS? On Sun, Oct 23, 2016 at 1:56 PM, Welly Tambunan wrote: > Hi All, > > I read the following comparison between hadoop and cassandra. The > conclusion seems to be that we use hadoop for data lake (cold data) and Cassandra for > hot data (real time

Hadoop vs Cassandra

2016-10-23 Thread Welly Tambunan
Hi All, I read the following comparison between hadoop and cassandra. The conclusion seems to be that we use hadoop for data lake (cold data) and Cassandra for hot data (real time data). http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop My question is, can we just use cassandra to

Re: What is the maximum value of Cassandra Counter Column?

2016-10-23 Thread DuyHai Doan
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/serializers/CounterSerializer.java public class CounterSerializer extends LongSerializer On Sun, Oct 23, 2016 at 10:16 AM, Ben Slater wrote: >
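
Since the class quoted above extends LongSerializer, the bounds in question are those of a 64-bit signed integer. A minimal sketch of that arithmetic follows; the names `COUNTER_MAX`, `COUNTER_MIN`, and `fits_in_counter` are made up for illustration and are not Cassandra identifiers.

```python
import struct

# Range of a 64-bit signed integer (same as Java's Long.MIN_VALUE / MAX_VALUE).
COUNTER_MAX = 2**63 - 1
COUNTER_MIN = -(2**63)


def fits_in_counter(value):
    """Return True if value is representable as a 64-bit signed integer."""
    return COUNTER_MIN <= value <= COUNTER_MAX


# The largest value is the 8-byte big-endian pattern 0x7FFF...FF.
largest_bytes = b"\x7f" + b"\xff" * 7
decoded = struct.unpack(">q", largest_bytes)[0]

print(COUNTER_MAX)                       # 9223372036854775807
print(decoded == COUNTER_MAX)            # True
print(fits_in_counter(COUNTER_MAX + 1))  # False
```

What happens when an increment would go past this bound is not covered by the thread, so the sketch only checks the range.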

Re: What is the maximum value of Cassandra Counter Column?

2016-10-23 Thread Ben Slater
http://cassandra.apache.org/doc/latest/cql/types.html?highlight=counter#counters On Sun, 23 Oct 2016 at 19:15 Kant Kodali wrote: > where does it say counter is implemented as long? > > On Sun, Oct 23, 2016 at 1:13 AM, Ali Akhtar wrote: > > Probably: >

Re: What is the maximum value of Cassandra Counter Column?

2016-10-23 Thread Ali Akhtar
It seems obvious. On Sun, Oct 23, 2016 at 1:15 PM, Kant Kodali wrote: > where does it say counter is implemented as long? > > On Sun, Oct 23, 2016 at 1:13 AM, Ali Akhtar wrote: > >> Probably: https://docs.oracle.com/javase/8/docs/api/java/lan >>

Re: What is the maximum value of Cassandra Counter Column?

2016-10-23 Thread Kant Kodali
Where does it say counter is implemented as long? On Sun, Oct 23, 2016 at 1:13 AM, Ali Akhtar wrote: > Probably: https://docs.oracle.com/javase/8/docs/api/java/lang/Long.html#MAX_VALUE > > On Sun, Oct 23, 2016 at 1:12 PM, Kant Kodali wrote: > >> What

Re: What is the maximum value of Cassandra Counter Column?

2016-10-23 Thread Ali Akhtar
Probably: https://docs.oracle.com/javase/8/docs/api/java/lang/Long.html#MAX_VALUE On Sun, Oct 23, 2016 at 1:12 PM, Kant Kodali wrote: > What is the maximum value of Cassandra Counter Column? >

What is the maximum value of Cassandra Counter Column?

2016-10-23 Thread Kant Kodali
What is the maximum value of Cassandra Counter Column?