Re: C* data modeling for time series

2018-06-18 Thread mm

Hi,

we're currently evaluating KairosDB for time series, which looks quite
nice.

https://kairosdb.github.io/

The cool thing with KairosDB is that it uses Cassandra as its storage
engine and provides additional features (mainly a REST-based API for
accessing data).

Maybe you can take a look at the schema definition KairosDB uses for
Cassandra and check if it suits you. (Or use it directly, as it stores
its data in Cassandra anyway.)


Greetings,
Michael

PS: Oh, and Grafana has a KairosDB connector, so you can test queries
and create dashboards quickly.



On 18.06.2018 09:46, Affan Syed wrote:

I have looked at this problem for a good year now. My feeling is that
Cassandra alone, as the sole underlying DB for time series, just does
not cut it.

I am starting to look at C* along with another DB for executing the
sort of queries we want here.

Currently I am evaluating Druid vs. Kudu to be this supporting DB. Any
comments from the community? Cassandra would be more for storage and
backup, while the data denormalization effort is taken care of by
another DB.

thank you

- Affan
On Thu, Jul 27, 2017 at 1:38 AM, CPC  wrote:


If all of your queries are like this (I mean, get all devices given a
time range), Hadoop would be more appropriate, since those are
analytical queries.

Anyway, to query such data with the Spark Cassandra connector, your
partition key could include the day and a hash of your deviceid as a
pseudo partition key column (it could be abs(murmur(deviceid) % 500);
we add this column to distribute data more evenly). When you want to
query a time range, you should generate an RDD of Tuple2 with all days
that intersect with that date range, and for each day your RDD should
include the 0..499 range. Like:

(20170726,0)
(20170726,1)
.
.
.
(20170726,499)

Then you should join this RDD with your table using the
joinWithCassandraTable method.
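
A minimal, untested sketch of that approach with the Spark Cassandra
connector. The keyspace, table, and column names are illustrative (not
from the thread), the bucketed schema in the comment is just one
possible layout, and a spark-shell style SparkContext sc is assumed:

// Assumed bucketed table, roughly:
//   CREATE TABLE ks.data_by_day (
//       day int, bucket int, deviceid int, time timestamp, field1 text,
//       PRIMARY KEY ((day, bucket), time, deviceid)
//   ) WITH CLUSTERING ORDER BY (time ASC);
// On write, bucket would be derived from the device id,
// e.g. abs(murmur(deviceid) % 500) as suggested above.

import com.datastax.spark.connector._

case class DayBucket(day: Int, bucket: Int)

// Build every (day, bucket) pair that intersects the queried time range...
val days = Seq(20170726, 20170727)
val keys = sc.parallelize(
  for (day <- days; bucket <- 0 until 500) yield DayBucket(day, bucket))

// ...and fetch only those partitions instead of scanning the whole table.
// joinWithCassandraTable joins on the partition key by default, matching
// the case class fields to the column names.
val rows = keys.joinWithCassandraTable("ks", "data_by_day")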

On Jul 26, 2017 4:41 PM, "Junaid Nasir"  wrote:

All devices.
After selecting the data I group it and perform other operations, i.e.
sum and avg on fields, and then display those to compare how the
devices are doing relative to each other.
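
(For illustration, a rough sketch of that aggregation step using the
connector's DataFrame support; the keyspace/table names and the numeric
"value" column are placeholders, and a SparkSession named spark is
assumed:)

import org.apache.spark.sql.functions._

// Load the table through the Spark Cassandra connector's DataFrame source.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "data"))
  .load()

// Filter to the time window, then aggregate per device for comparison.
df.filter(col("time") > "2017-07-01 12:00:00")
  .groupBy(col("deviceid"))
  .agg(sum(col("value")), avg(col("value")))
  .show()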

On Wed, Jul 26, 2017 at 5:32 PM, CPC  wrote:

Hi Junaid,

Given a time range do you want to take all devices or a specific
device?

On Jul 26, 2017 3:15 PM, "Junaid Nasir"  wrote:

I have a C* cluster (3 nodes) with some 60 GB of data (replication
factor 2). When I started using C*, coming from a SQL background, I
didn't give much thought to modeling the data correctly. So what I did
was:

CREATE TABLE data (
    deviceId int,
    time timestamp,
    field1 text,
    field2 text,
    field3 text,
    PRIMARY KEY (deviceId, time)
) WITH CLUSTERING ORDER BY (time ASC);

but most of the queries I run (using Spark and the DataStax connector)
compare data from different devices over some time period. For example:

SELECT * FROM data WHERE time > '2017-07-01 12:00:00';

From my understanding this runs a full table scan, as shown in the
Spark UI (the DAG visualization shows "Scan
org.apache.spark.sql.cassandra.CassandraSourceRelation@32bb7d65"),
meaning C* will read all the data and then filter on time. Spark jobs
run for hours even for smaller time frames.
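
(One way to check this is to look at the physical plan; what follows is
a hedged sketch with placeholder keyspace/table names and an assumed
SparkSession spark. A predicate on a clustering column with no
restriction on the partition key is generally not pushed down to
Cassandra by the connector:)

import org.apache.spark.sql.functions.col

val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "data"))
  .load()

// If the time predicate does not appear under "PushedFilters" in the plan,
// the connector is scanning the whole table and filtering in Spark.
df.filter(col("time") > "2017-07-01 12:00:00").explain()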

What is the right approach to data modeling for such queries? I want
to get a general idea of things to look for when modeling such data.
I really appreciate all the help from this community :). If you need
any extra details, please ask me here.

Regards,
Junaid






Re: Single Host: Fix "Unknown CF" issue

2018-06-07 Thread mm

Hi,

we will follow the recommendation not to use materialized views.

Thanks a lot to both of you!
You helped me a lot.

Oh, and besides: we are also using the Lagom framework :) So we will also
be able to regenerate a read-side if we have to.


greetings,
Michael


On 07.06.2018 13:45, Evelyn Smith wrote:

Hey Michael,

In case you have a production cluster set up with multiple nodes,
assuming you have RF > 1, it’s easier to just replace the broken node
and restore its data. (For future reference.)

I wasn’t sure if "view" was referring to a materialised view at the
time, although Pradeep’s comment along with your own suggests it might
(I didn’t get a chance to look through the code to confirm whether
"view" was an MV or something else, and I’m not that familiar with the
code base).

As for the choice of using materialised views: they aren’t being
deprecated, they are currently marked as experimental, and most people
strongly advise you not to use them. If you can avoid it, don’t do it.
They’re associated with a lot of bugs and scalability issues. Also,
they’re just hard to get right if you aren’t exceptionally familiar
with Cassandra.

Regards,
Evelyn.


On 7 Jun 2018, at 3:05 am, Pradeep Chhetri 
wrote:

Hi Michael,

We have faced the same situation as yours in our production
environment where we suddenly got "Unknown CF Exception" for
materialized views too. We are using Lagom apps with cassandra for
persistence. In our case, since these views can be regenerated from
the original events, we were able to safely recover.

A few suggestions from my operations experience:

1) Upgrade your Cassandra cluster to 3.11.2, because there are lots
of bug fixes specific to materialized views.
2) Never let your application create/update/delete Cassandra
tables/materialized views. Always create them manually, to make sure
that only one connection is doing the operation.

Regards,
Pradeep

On Wed, Jun 6, 2018 at 9:44 PM,  wrote:
Hi Evelyn,

thanks a lot for your detailed response message.

The data is not important. We've already wiped the data and created
a new cassandra installation. The data re-import task is already
running. We've lost the data for a couple of months but in this case
this does not matter.

Nevertheless we will try what you told us - just to be
smarter/faster if this happens in production (where we will set up a
Cassandra cluster with multiple Cassandra nodes anyway). I will drop
you a note when we are done.

Hmmm... the problem is within a "View". Is this referring to
materialized views?

I'm asking this because:
* Someone on the internet (Stack Overflow, if I recall correctly)
mentioned that materialized views are to be deprecated.
* I attended a DataStax workshop in Zurich a couple of days ago
where a DataStax employee told me that we should not use
materialized views - it is better to create & fill all tables
directly.

Would you also recommend not using materialized views? As this
problem is related to a view, maybe we could avoid it simply by
following this recommendation.

Thanks a lot again!

Greetings,
Michael

On 06.06.2018 16:48, Evelyn Smith wrote:
Hi Michael,

So I looked at the code, here are some stages of your error message:
1. at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292)
[apache-cassandra-3.11.0.jar:3.11.0]
At this step Cassandra is running through the keyspaces in its
schema, turning off compactions for all tables before it starts
rerunning the commit log (so it isn’t an issue with the commit
log).
2. at org.apache.cassandra.db.Keyspace.open(Keyspace.java:127)
~[apache-cassandra-3.11.0.jar:3.11.0]
Loading the keyspace related to the column family that is erroring out.
3. at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:324)
~[apache-cassandra-3.11.0.jar:3.11.0]
Cassandra has initialised the column family and is reloading the
view.
4. at
org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:204)
~[apache-cassandra-3.11.0.jar:3.11.0]
At this point I haven’t had enough time to tell if Cassandra is
requesting info on a column specifically or still requesting
information on a column family. Regardless, given we have already
ruled out issues with the SSTables and their directory, and Cassandra
is yet to start processing the commit log, this to me suggests
something is wrong in one of the system keyspaces storing the schema
information.

There should definitely be a way to resolve this with zero data loss
by either:
1. Fixing the issue in the system keyspace SSTables (hard); or
2. Rerunning the commit log on a new Cassandra node that has been
restored from the current one (I’m not sure if this is possible, but
I’ll figure it out tomorrow).

The alternative is, if you are OK with losing the commit log, you
can back up the data and restore it to a new node (or the same node,
but with everything blown away). This isn’t a trivial process, though
I’ve done it a few times.

How important is the data?

Happy to come back to this tomorrow (need some sleep)

Regards,

Re: Single Host: Fix "Unknown CF" issue

2018-06-06 Thread mm

Hi Evelyn,

thanks a lot for your detailed response message.

The data is not important. We've already wiped the data and created a 
new cassandra installation. The data re-import task is already running. 
We've lost the data for a couple of months but in this case this does 
not matter.


Nevertheless we will try what you told us - just to be smarter/faster if
this happens in production (where we will set up a Cassandra cluster with
multiple Cassandra nodes anyway). I will drop you a note when we are
done.


Hmmm... the problem is within a "View". Is this referring to materialized views?

I'm asking this because:
* Someone on the internet (Stack Overflow, if I recall correctly)
mentioned that materialized views are to be deprecated.
* I attended a DataStax workshop in Zurich a couple of days ago where
a DataStax employee told me that we should not use materialized views -
it is better to create & fill all tables directly.


Would you also recommend not using materialized views? As this problem
is related to a view, maybe we could avoid it simply by following this
recommendation.


Thanks a lot again!

Greetings,
Michael



On 06.06.2018 16:48, Evelyn Smith wrote:

Hi Michael,

So I looked at the code, here are some stages of your error message:
1. at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292)
[apache-cassandra-3.11.0.jar:3.11.0]
 At this step Cassandra is running through the keyspaces in its
schema, turning off compactions for all tables before it starts
rerunning the commit log (so it isn’t an issue with the commit log).
2. at org.apache.cassandra.db.Keyspace.open(Keyspace.java:127)
~[apache-cassandra-3.11.0.jar:3.11.0]
 Loading the keyspace related to the column family that is erroring out.
3. at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:324)
~[apache-cassandra-3.11.0.jar:3.11.0]
 Cassandra has initialised the column family and is reloading the view.
4. at
org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:204)
~[apache-cassandra-3.11.0.jar:3.11.0]
 At this point I haven’t had enough time to tell if Cassandra is
requesting info on a column specifically or still requesting
information on a column family. Regardless, given we have already ruled
out issues with the SSTables and their directory, and Cassandra is yet
to start processing the commit log, this to me suggests something is
wrong in one of the system keyspaces storing the schema information.

There should definitely be a way to resolve this with zero data loss
by either:
1. Fixing the issue in the system keyspace SSTables (hard); or
2. Rerunning the commit log on a new Cassandra node that has been
restored from the current one (I’m not sure if this is possible, but
I’ll figure it out tomorrow).

The alternative is, if you are OK with losing the commit log, you
can back up the data and restore it to a new node (or the same node,
but with everything blown away). This isn’t a trivial process, though
I’ve done it a few times.

How important is the data?

Happy to come back to this tomorrow (need some sleep)

Regards,
Eevee.


On 5 Jun 2018, at 7:32 pm, m...@vis.at wrote:
Keyspace.getColumnFamilyStore






Re: Single Host: Fix "Unknown CF" issue

2018-06-05 Thread mm

Hello Evelyn,

If we do what you say, Cassandra creates a new, empty column family
directory on restart and then stops with the same exception.


One point I forgot to mention is that we've still got data in the
commit log (99.9% of it for other keyspaces).
Are there any tools we can use to examine the SSTable or commit log
files to dig further into this issue?


Or:
Can we set up a new Cassandra installation and copy the data of the
other keyspaces to the new instance?


The column family which shows this error does not contain important
data. We're just very uncertain now about what to do if this happens
with important data.
(OK - maybe this problem is not that important if we use a cluster of
e.g. 3 nodes and a replication factor of 2 - but it leaves a strange
impression of Cassandra anyway.)


greetings,
Michael


On 05.06.2018 15:31, Evelyn Smith wrote:

Hey Michael,

I have a hunch.

If the system doesn’t recognise the column family, which is what is
stopping the node from starting, perhaps try copying the column family
directory to a backup and then deleting it.

Then restart Cassandra. If it starts I’ll assume the schema didn’t
have the column family:
* Create the column family again (be careful to create it exactly how
it was in the original schema);
* Stop Cassandra again;
* Move the SSTables from the column family backup into the new column
family folder (you have to do this as the column family folder will
have a UUID in its name that will have changed); and
* Restart Cassandra.
You should now have Cassandra running without losing your data.

If Cassandra doesn’t restart after deleting the column family
directory then just restore it from the backup and you are back to
square one.

Regards,
Evelyn.


On 5 Jun 2018, at 7:32 pm, m...@vis.at wrote:

Hi all!

We've been using Cassandra for a couple of months to get familiar with
it. We're currently running only one node. Yesterday our server had to
be restarted, and now Cassandra does not start anymore.


It reports:
INFO  [main] 2018-06-05 09:50:43,030 ColumnFamilyStore.java:406 - 
Initializing system_schema.indexes
INFO  [main] 2018-06-05 09:50:43,036 ViewManager.java:137 - Not 
submitting build tasks for views in keyspace system_schema as storage 
service is not initialized
INFO  [main] 2018-06-05 09:50:43,283 ColumnFamilyStore.java:406 - 
Initializing system_traces.events
INFO  [main] 2018-06-05 09:50:43,286 ColumnFamilyStore.java:406 - 
Initializing system_traces.sessions
INFO  [main] 2018-06-05 09:50:43,287 ViewManager.java:137 - Not 
submitting build tasks for views in keyspace system_traces as storage 
service is not initialized
INFO  [main] 2018-06-05 09:50:43,300 ColumnFamilyStore.java:406 - 
Initializing m2m_auth.user
INFO  [main] 2018-06-05 09:50:43,302 ColumnFamilyStore.java:406 - 
Initializing m2m_auth.eventsbytag1
INFO  [main] 2018-06-05 09:50:43,306 ColumnFamilyStore.java:406 - 
Initializing m2m_auth.mail2user
ERROR [main] 2018-06-05 09:50:43,311 CassandraDaemon.java:706 - 
Exception encountered during startup
java.lang.IllegalArgumentException: Unknown CF 
0f6c8b36-5f34-11e8-a476-c93745f84272
   at 
org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:204) 
~[apache-cassandra-3.11.0.jar:3.11.0]
   at 
org.apache.cassandra.db.view.ViewManager.addView(ViewManager.java:152) 
~[apache-cassandra-3.11.0.jar:3.11.0]
   at 
org.apache.cassandra.db.view.ViewManager.reload(ViewManager.java:125) 
~[apache-cassandra-3.11.0.jar:3.11.0]
   at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:324) 
~[apache-cassandra-3.11.0.jar:3.11.0]
   at org.apache.cassandra.db.Keyspace.open(Keyspace.java:127) 
~[apache-cassandra-3.11.0.jar:3.11.0]
   at org.apache.cassandra.db.Keyspace.open(Keyspace.java:104) 
~[apache-cassandra-3.11.0.jar:3.11.0]
   at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292) 
[apache-cassandra-3.11.0.jar:3.11.0]
   at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600) 
[apache-cassandra-3.11.0.jar:3.11.0]
   at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) 
[apache-cassandra-3.11.0.jar:3.11.0]


We do not know how to bring it up again, and e.g. Stack Overflow only
recommends deleting the node and letting it rebuild (which we cannot do,
as we currently have only one node).

Can anybody drop us a hint?

What can we do if our node does not start?

We've found an entry in the data directory with the given UUID, but at
the time we restarted the node the whole keyspace had been idle for a
couple of hours.
With this problem we're now _very_ concerned about data safety, since
we've read that it can be a problem if e.g. tables are
created/deleted/updated concurrently. But in our case we did not
create/update tables concurrently and got this problem anyway.


Thanks for any help!

greetings,
Michael




Single Host: Fix "Unknown CF" issue

2018-06-05 Thread mm

Hi all!

We've been using Cassandra for a couple of months to get familiar with
it. We're currently running only one node. Yesterday our server had to
be restarted, and now Cassandra does not start anymore.


It reports:
INFO  [main] 2018-06-05 09:50:43,030 ColumnFamilyStore.java:406 - 
Initializing system_schema.indexes
INFO  [main] 2018-06-05 09:50:43,036 ViewManager.java:137 - Not 
submitting build tasks for views in keyspace system_schema as storage 
service is not initialized
INFO  [main] 2018-06-05 09:50:43,283 ColumnFamilyStore.java:406 - 
Initializing system_traces.events
INFO  [main] 2018-06-05 09:50:43,286 ColumnFamilyStore.java:406 - 
Initializing system_traces.sessions
INFO  [main] 2018-06-05 09:50:43,287 ViewManager.java:137 - Not 
submitting build tasks for views in keyspace system_traces as storage 
service is not initialized
INFO  [main] 2018-06-05 09:50:43,300 ColumnFamilyStore.java:406 - 
Initializing m2m_auth.user
INFO  [main] 2018-06-05 09:50:43,302 ColumnFamilyStore.java:406 - 
Initializing m2m_auth.eventsbytag1
INFO  [main] 2018-06-05 09:50:43,306 ColumnFamilyStore.java:406 - 
Initializing m2m_auth.mail2user
ERROR [main] 2018-06-05 09:50:43,311 CassandraDaemon.java:706 - 
Exception encountered during startup
java.lang.IllegalArgumentException: Unknown CF 
0f6c8b36-5f34-11e8-a476-c93745f84272
at 
org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:204) 
~[apache-cassandra-3.11.0.jar:3.11.0]
at 
org.apache.cassandra.db.view.ViewManager.addView(ViewManager.java:152) 
~[apache-cassandra-3.11.0.jar:3.11.0]
at 
org.apache.cassandra.db.view.ViewManager.reload(ViewManager.java:125) 
~[apache-cassandra-3.11.0.jar:3.11.0]
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:324) 
~[apache-cassandra-3.11.0.jar:3.11.0]
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:127) 
~[apache-cassandra-3.11.0.jar:3.11.0]
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:104) 
~[apache-cassandra-3.11.0.jar:3.11.0]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:292) 
[apache-cassandra-3.11.0.jar:3.11.0]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600) 
[apache-cassandra-3.11.0.jar:3.11.0]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) 
[apache-cassandra-3.11.0.jar:3.11.0]


We do not know how to bring it up again, and e.g. Stack Overflow only
recommends deleting the node and letting it rebuild (which we cannot do,
as we currently have only one node).

Can anybody drop us a hint?

What can we do if our node does not start?

We've found an entry in the data directory with the given UUID, but at
the time we restarted the node the whole keyspace had been idle for a
couple of hours.
With this problem we're now _very_ concerned about data safety, since
we've read that it can be a problem if e.g. tables are
created/deleted/updated concurrently. But in our case we did not
create/update tables concurrently and got this problem anyway.


Thanks for any help!

greetings,
Michael


