Re: Slender Cassandra Cluster Project

2018-01-31 Thread Michael Mior
While whatever format this comes out in would be helpful, you might want to
consider Terraform. 1Password recently published a blog post on their
experience with Terraform vs. CloudFormation.

https://blog.agilebits.com/2018/01/25/terraforming-1password/

--
Michael Mior
mm...@apache.org

2018-01-31 2:34 GMT-05:00 Kenneth Brotman <kenbrot...@yahoo.com.invalid>:

> Hi Yuri,
>
> If possible I will do everything with AWS CloudFormation.  I'm working on
> it now.  Nothing published yet.
>
> Kenneth Brotman
>
> -----Original Message-----
> From: Yuri Subach [mailto:ysub...@gmail.com]
> Sent: Tuesday, January 30, 2018 7:02 PM
> To: user@cassandra.apache.org
> Subject: RE: Slender Cassandra Cluster Project
>
> Hi Kenneth,
>
> I like this project idea!
>
> A couple of questions:
> - What tools are you going to use for AWS cluster setup?
> - Do you have anything published already (github)?
>
> On 2018-01-22 22:42:11, Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
> wrote:
> > Thanks Anthony!  I’ve made a note to include that information in the
> documentation. You’re right.  It won’t work as intended unless that is
> configured properly.
> >
> >
> >
> > I’m also favoring a couple of other guidelines for Slender Cassandra:
> >
> > 1.   SSDs only, no spinning disks
> >
> > 2.   At least two cores per node
> >
> >
> >
> > For AWS, I’m favoring the c3.large on Linux.  It’s available in these
> regions: US-East, US-West and US-West2.  The specifications are listed as:
> >
> > · Two (2) vCPUs
> >
> > · 3.75 GiB memory
> >
> > · Two (2) 16 GB SSDs
> >
> > · Moderate I/O
> >
> >
> >
> > It’s going to be hard to beat the low cost of operating a
> Slender Cluster on demand in the cloud – and it fits a lot of the use cases
> well:
> >
> >
> >
> > · For under $100 a month, at current EC2 on-demand pricing, you can
> operate an eighteen (18) node Slender Cluster for five (5) hours a day, ten
> (10) days a month.  That’s fine for demonstrations, teaching or experiments
> that last half a day or less.
> >
> > · For under $20, you can keep that Slender Cluster up for a whole
> day, up to ten (10) hours, for whatever demonstrations or experiments you
> want it for.
> >
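Both price points reduce to node-hours multiplied by a per-node hourly rate. A minimal Python sketch, assuming every node is billed at the same flat on-demand rate; `HOURLY_RATE_USD` is a hypothetical placeholder, not a quoted AWS price, so substitute current pricing for your region:

```python
# Back-of-the-envelope cost model for an on-demand Slender Cluster.
# Assumption: all nodes share one flat hourly rate and billing stops when
# the cluster is shut down.

HOURLY_RATE_USD = 0.105  # hypothetical per-node rate; replace with current pricing


def node_hours(nodes: int, hours_per_day: float, days: int) -> float:
    """Total billable node-hours for a usage pattern."""
    return nodes * hours_per_day * days


def cluster_cost(nodes: int, hours_per_day: float, days: int,
                 rate: float = HOURLY_RATE_USD) -> float:
    """Estimated cost in USD for the usage pattern at the given rate."""
    return node_hours(nodes, hours_per_day, days) * rate


def max_rate_for_budget(budget: float, nodes: int,
                        hours_per_day: float, days: int) -> float:
    """Highest per-node hourly rate that keeps the pattern within budget."""
    return budget / node_hours(nodes, hours_per_day, days)


if __name__ == "__main__":
    # 18 nodes x 5 hours/day x 10 days/month = 900 node-hours
    print(f"monthly: ${cluster_cost(18, 5, 10):.2f}")
    # 18 nodes x 10 hours x 1 day = 180 node-hours
    print(f"one day: ${cluster_cost(18, 10, 1):.2f}")
    print(f"max rate: ${max_rate_for_budget(100, 18, 5, 10):.4f} per node-hour")
```

Both claims share the same break-even point: 100/900 and 20/180 are both 1/9, so they hold as long as the per-node rate stays below roughly $0.111 an hour.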
> >
> >
> > As always, feedback is encouraged.
> >
> >
> >
> > Thanks,
> >
> >
> >
> > Kenneth Brotman
> >
> >
> >
> > From: Anthony Grasso [mailto:anthony.gra...@gmail.com]
> > Sent: Sunday, January 21, 2018 3:57 PM
> > To: user
> > Subject: Re: Slender Cassandra Cluster Project
> >
> >
> >
> > Hi Kenneth,
> >
> >
> >
> > Fantastic idea!
> >
> >
> >
> > One thing that came to mind from my reading of the proposed setup was
> rack awareness of each node. Given that the proposed setup contains three
> DCs, I assume that each node will be made rack aware? If not, consider
> defining three racks for each DC and placing two nodes in each rack. This
> will ensure that the nodes in any single rack hold at most one replica
> of each piece of data.
> >
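To see why three racks of two nodes per DC keep each rack to at most one replica, here is a simplified Python model of rack-aware placement inside one DC. It is not Cassandra's NetworkTopologyStrategy code: it assumes a replication factor of 3 in the DC (the thread does not state one), one token per node, and hypothetical node names and tokens. It only captures the core idea of walking the ring clockwise while preferring nodes in racks that do not hold a replica yet, and contrasts that with a rack-unaware walk.

```python
from bisect import bisect_left
from collections import Counter
from typing import List, Tuple

Node = Tuple[str, str, int]  # (name, rack, token)

# Hypothetical single DC: three racks, two nodes each, one token per node.
# Nodes of the same rack are deliberately adjacent on the ring.
RING: List[Node] = sorted(
    [
        ("n1", "rack1", 0),
        ("n2", "rack1", 100),
        ("n3", "rack2", 200),
        ("n4", "rack2", 300),
        ("n5", "rack3", 400),
        ("n6", "rack3", 500),
    ],
    key=lambda node: node[2],
)


def _walk(key_token: int) -> List[Node]:
    """All nodes in clockwise order, starting at the first token >= key_token."""
    tokens = [token for _, _, token in RING]
    start = bisect_left(tokens, key_token) % len(RING)
    return [RING[(start + i) % len(RING)] for i in range(len(RING))]


def naive_replicas(key_token: int, rf: int = 3) -> List[Node]:
    """Rack-unaware placement: simply the next rf nodes clockwise."""
    return _walk(key_token)[:rf]


def rack_aware_replicas(key_token: int, rf: int = 3) -> List[Node]:
    """Prefer nodes in racks without a replica; only if rf exceeds the number
    of racks, fall back to the skipped nodes in ring order."""
    chosen: List[Node] = []
    skipped: List[Node] = []
    used_racks = set()
    for node in _walk(key_token):
        if len(chosen) == rf:
            break
        if node[1] in used_racks:
            skipped.append(node)
        else:
            chosen.append(node)
            used_racks.add(node[1])
    chosen.extend(skipped[: rf - len(chosen)])
    return chosen


if __name__ == "__main__":
    for token in (0, 150, 450):
        naive = Counter(rack for _, rack, _ in naive_replicas(token))
        aware = Counter(rack for _, rack, _ in rack_aware_replicas(token))
        print(token, "naive:", dict(naive), "rack-aware:", dict(aware))
```

With the same-rack nodes next to each other, the naive walk puts two replicas in one rack, while the rack-aware walk always lands one replica in each of the three racks.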
> >
> >
> > Regards,
> >
> > Anthony
> >
> >
> >
> > On 17 January 2018 at 11:24, Kenneth Brotman
> <kenbrot...@yahoo.com.invalid> wrote:
> >
> > Sure.  That takes the project from awesome to 10X awesome.  I absolutely
> would be willing to do that.  Thanks Kurt!
> >
> >
> >
> > Regarding your comment on the keyspaces, I agree.  There should be a few
> simple examples one way or the other that can be duplicated and observed,
> and then an example to duplicate and play with that has a nice real world
> mix, with some keyspaces that replicate over only a subset of DCs and some
> that replicate to all DCs.
> >
> >
> >
> > Kenneth Brotman
> >
> >
> >
> > From: kurt greaves [mailto:k...@instaclustr.com]
> > Sent: Tuesday, January 16, 2018 1:31 PM
> > To: User
> > Subject: Re: Slender Cassandra Cluster Project
> >
> >
> >
> > Sounds like a great idea. It would probably be valuable to add to the
> official docs as an example setup if you're willing.
> >
> >
> >
> > Only thing I'd add is that you should have keyspaces that replicate over
> only a subset of DC's, plus one/some replicated to all DC's
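Replicating a keyspace to only some DCs comes down to listing only those DCs in its NetworkTopologyStrategy options; DCs that are left out get no replicas. A small Python helper that builds such statements; the keyspace names and the `DC1`/`DC2`/`DC3` data center names are hypothetical and must match the DC names reported by your snitch:

```python
from typing import Mapping


def create_keyspace_cql(name: str, rf_per_dc: Mapping[str, int]) -> str:
    """Build a CREATE KEYSPACE statement replicating only to the listed DCs."""
    if not rf_per_dc:
        raise ValueError("at least one data center is required")
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dcs}}};"
    )


if __name__ == "__main__":
    # Hypothetical mix for a three-DC Slender Cluster.
    print(create_keyspace_cql("all_dcs", {"DC1": 3, "DC2": 3, "DC3": 3}))
    print(create_keyspace_cql("dc1_dc2_only", {"DC1": 3, "DC2": 3}))
    print(create_keyspace_cql("dc3_only", {"DC3": 3}))
```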
> >
> >
> >
> > On 17 Jan. 2018 03:26, "Kenneth Brotman" <kenbrot...@yahoo.com.invalid>
> wrote:
> >
> 

Re: Question about materialized view

2017-06-26 Thread Michael Mior
This is handled by updateAffectsView in org.apache.cassandra.db.view.View.
It will scan over each row to be updated in the base table and see that the
column is not included in the view definition and skip the update.
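A simplified Python model of that check, using the `t` / `t_mv` schema from the question. It only illustrates the idea and is not a port of `updateAffectsView`: it assumes an update needs view maintenance only if it writes at least one column the view includes, and it ignores other cases the real code handles. The check itself still runs for each view on every write, so the overhead is small rather than literally zero.

```python
from typing import Dict, Iterable, List, Set

# Columns included in t_mv (selected a, b, c, d; primary key c, d, a, b).
T_MV_COLUMNS: Set[str] = {"a", "b", "c", "d"}


def update_affects_view(updated_columns: Iterable[str], view_columns: Set[str]) -> bool:
    """Simplified model: True only if the update writes a column the view includes."""
    return any(col in view_columns for col in updated_columns)


def views_to_maintain(updated_columns: Iterable[str],
                      views: Dict[str, Set[str]]) -> List[str]:
    """Names of the views that would need an update for this write."""
    updated = list(updated_columns)
    return [name for name, cols in views.items() if update_affects_view(updated, cols)]


if __name__ == "__main__":
    # UPDATE t SET g = 1 WHERE a = 10 AND b = 20 only writes column g.
    print(update_affects_view(["g"], T_MV_COLUMNS))  # False: t_mv is skipped
    print(update_affects_view(["c"], T_MV_COLUMNS))  # True: c is in the view's key
```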

--
Michael Mior
mm...@apache.org

2017-06-21 2:41 GMT-04:00 web master <socketman2...@gmail.com>:

> Assume this schema
>
> CREATE TABLE t(
> a int,
> b int,
> c int,
> d int,
> e text,
> f date,
> g int,
> PRIMARY KEY (a,b)
> )
>
>
> If we create the following MV:
>
> CREATE MATERIALIZED VIEW t_mv as
> select a,b,c,d from t where c is not null and d is not null
>  PRIMARY KEY (c,d,a,b);
>
>
> What happens if we run this query?
>
> UPDATE t SET g=1 WHERE a=10 AND b = 20
>
>
> As you can see, "g" is not included in "t_mv". I want to know what Cassandra
> does internally.
>
> Is there any overhead for t_mv, or does Cassandra detect that there are no
> changes for t_mv and do nothing?
>
>
> For example, if we have 10 materialized views like the one above, does an
> update to a column excluded from all of them impact performance, or is the
> performance equal to having no MVs?
>


Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread Michael Mior
For queries 1-5 this seems like a potentially good use case for
materialized views. Create one table with the videos stored by ID and the
materialized views for each of the queries.
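As a sketch of that layout, the Python below generates one base table keyed by video ID plus one view per single-attribute lookup (queries 1-3). The `videos` table and its `video_id`, `title`, `actor`, `producer` and `music` columns are hypothetical, and each attribute is assumed to be a single-valued text column (multi-valued attributes such as several actors per video would need a different design). Each view's primary key adds exactly one column that is not in the base table's primary key, in line with the restriction tracked in CASSANDRA-9928.

```python
from typing import List

# Hypothetical base table: one row per video.
BASE_TABLE = """CREATE TABLE IF NOT EXISTS videos (
    video_id uuid PRIMARY KEY,
    title text,
    actor text,
    producer text,
    music text
);"""


def view_for(attribute: str) -> str:
    """CREATE MATERIALIZED VIEW statement for looking videos up by one attribute."""
    return (
        f"CREATE MATERIALIZED VIEW IF NOT EXISTS videos_by_{attribute} AS\n"
        f"    SELECT * FROM videos\n"
        f"    WHERE {attribute} IS NOT NULL AND video_id IS NOT NULL\n"
        f"    PRIMARY KEY ({attribute}, video_id);"
    )


def schema(attributes: List[str]) -> List[str]:
    """Base table followed by one view per lookup attribute."""
    return [BASE_TABLE] + [view_for(a) for a in attributes]


if __name__ == "__main__":
    for statement in schema(["actor", "producer", "music"]):
        print(statement, end="\n\n")
```

Query 1 would then read from `videos_by_actor` with `WHERE actor = ?`.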

--
Michael Mior
mm...@apache.org


2017-06-11 22:40 GMT-04:00 @Nandan@ <nandanpriyadarshi...@gmail.com>:

> Hi,
>
> Currently, I am working on data modeling for a video company in which we
> have different types of users as well as different user functionality.
> My current concern is the video search module, which searches on
> different fields.
>
> Query patterns are as below:
> 1) Select videos by actor.
> 2) Select videos by producer.
> 3) Select videos by music.
> 4) Select videos by actor and producer.
> 5) Select videos by actor and music.
>
> Note: In short, we want to build an advanced search module with which
> we can search by any field and get the desired results.
>
> During a search, we also need partial matching: if a user searches for the
> title "Harry", we should return all videos whose title contains "Harry" at
> any position.
>
> My idea is to create separate tables such as video_by_actor,
> video_by_producer, etc. and run Solr queries on all of them. Is there any
> other way to implement this search module effectively?
>
> Please suggest.
>
> Best regards,
>


Re: NoSE: Automated schema design for Cassandra

2017-05-11 Thread Michael Mior
Thanks for the feedback! I did change column families to tables. I agree
the documentation could use some work. If you're interested in seeing what
the input and output look like, here's a sample:

https://michael.mior.ca/projects/nose/rubis

So far no schemas have been used directly in production, although it has
provided some advice on design alternatives. NoSE already contains a
mechanism to execute different schema alternatives, which is what we used
during our evaluation. However, it does not currently provide a mechanism
for synthetic data generation. It would definitely be possible, however, to
add automated generation of test data in the future.

Cheers,
--
Michael Mior
mm...@uwaterloo.ca

2017-05-10 3:55 GMT-04:00 Jacques-Henri Berthemet <
jacques-henri.berthe...@genesys.com>:

> Hi,
>
>
>
> This is interesting. I’d just advise adding full examples and more
> documentation on how to use it (the articles are a bit too detailed).
>
> Also, you should not mention “column families” but just tables.
>
>
>
> Was this used to generate a schema used for production?
>
> Do you think it’s possible to generate test code to validate the workload?
>
>
>
> *--*
>
> *Jacques-Henri Berthemet*
>
>


NoSE: Automated schema design for Cassandra

2017-05-09 Thread Michael Mior
Hi all,

I wanted to share a tool I've been working on that tries to help automate
the schema design process for Cassandra. The short description is that you
provide information on the kind of data you want to store and the queries
and updates you want to issue, and NoSE will perform a cost-based analysis
to suggest an optimal schema.
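To give a feel for what a cost-based choice between schemas means, here is a toy Python sketch. It is not NoSE's algorithm or input format: it assumes each candidate schema already has an estimated cost per statement, weights those costs by a hypothetical workload mix, and picks the cheapest candidate. All statement names, costs and frequencies are made up for illustration.

```python
from typing import Dict

Costs = Dict[str, float]

# Hypothetical estimated cost of each statement under two candidate schemas.
CANDIDATES: Dict[str, Costs] = {
    "normalized": {"get_user": 1.0, "get_items_by_user": 5.0, "add_item": 1.0},
    "denormalized": {"get_user": 1.0, "get_items_by_user": 1.5, "add_item": 3.0},
}

# Hypothetical workloads: relative frequency of each statement.
READ_HEAVY: Dict[str, float] = {"get_user": 0.6, "get_items_by_user": 0.3, "add_item": 0.1}
WRITE_HEAVY: Dict[str, float] = {"get_user": 0.1, "get_items_by_user": 0.1, "add_item": 0.8}


def weighted_cost(costs: Costs, workload: Dict[str, float]) -> float:
    """Expected cost per statement, weighting each statement by its frequency."""
    return sum(freq * costs[stmt] for stmt, freq in workload.items())


def best_schema(candidates: Dict[str, Costs], workload: Dict[str, float]) -> str:
    """Name of the candidate with the lowest weighted cost."""
    return min(candidates, key=lambda name: weighted_cost(candidates[name], workload))


if __name__ == "__main__":
    for label, workload in [("read-heavy", READ_HEAVY), ("write-heavy", WRITE_HEAVY)]:
        print(label, "->", best_schema(CANDIDATES, workload))
```

With these numbers the denormalized candidate wins the read-heavy mix (1.35 vs 2.2) and the normalized one wins the write-heavy mix (1.4 vs 2.65), which is the kind of workload-dependent trade-off a cost-based analysis is meant to surface.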

There's lots of room for improvement and many Cassandra features which are
not currently supported, but hopefully some in the community may still find
it useful as a starting point.

Link to more details and the source code below:

https://michael.mior.ca/projects/nose/

If you're interested in trying it out, don't hesitate to reach out and I'm
happy to help!

Cheers,
--
Michael Mior
mm...@uwaterloo.ca


Re: Doing an upsert into a collection?

2016-10-25 Thread Michael Mior
You could do this with a map instead of a list.

*CREATE TABLE movie (*
* id text,*
* name text,*
* ratings map<text, int>,*
* PRIMARY KEY ( id )*
*);*

*UPDATE movie SET ratings['bob'] = 5 WHERE id = 'terminator 3';*
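The map works because writing one map entry is an upsert on its own: it inserts the key if it is missing and overwrites it otherwise, without reading the collection first. With a list of UDTs, the application has to read the list, find bob's entry and write the list back. A small Python model of both designs, with plain dicts and lists standing in for the CQL collections (an illustration of the semantics, not driver code):

```python
from typing import Dict, List, Tuple


def upsert_map(ratings: Dict[str, int], user: str, rating: int) -> None:
    """Model of UPDATE movie SET ratings[user] = rating: insert or overwrite, no read."""
    ratings[user] = rating


def upsert_list(ratings: List[Tuple[str, int]], user: str,
                rating: int) -> List[Tuple[str, int]]:
    """Model of list<frozen<rating>>: read the whole list, drop the user's old
    entry, append the new one, and write the full list back."""
    updated = [(u, r) for (u, r) in ratings if u != user]
    updated.append((user, rating))
    return updated


if __name__ == "__main__":
    ratings: Dict[str, int] = {}
    upsert_map(ratings, "bob", 5)  # inserts bob
    upsert_map(ratings, "bob", 3)  # overwrites bob
    print(ratings)
```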

--
Michael Mior
michael.m...@gmail.com

2016-10-24 18:16 GMT-04:00 Ali Akhtar <ali.rac...@gmail.com>:

> Say I have this UDT:
>
> *CREATE TYPE rating (*
> * user text,*
> * rating int*
> *);*
>
> And, I have this table:
>
> *CREATE TABLE movie (*
> * id text,*
> * name text,*
> * ratings list<frozen<rating>>,*
> * PRIMARY KEY ( id )*
> *);*
>
> Say a user 'bob' rated a movie as a 5. Is it possible to do something like
> this:
>
> *UPDATE movie set ratings.rating = 5 WHERE ratings.user = 'bob'*
>
> And have that query either update bob's previous rating if he had already
> rated, or have it insert a new Rating into the ratings w/ user = bob,
> rating = 5?
>
> If not, can this be achieved with a map instead of a list?
>
> Thanks.
>


Re: cassandra schema initialization in docker

2016-09-01 Thread Michael Mior
This is really more of a Docker question than a Cassandra question, but if
you include the CQL file in your Docker image, you could just change the
CMD line in your Dockerfile to run the script after starting Cassandra. You
would probably need to add a delay and some retries to ensure the server
has finished starting.
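One way to implement the delay and retries is a small helper that polls the native transport port until it accepts connections and then runs the CQL file with `cqlsh -f`. A minimal Python sketch; the script name, the CQL file path, port 9042 (Cassandra's default native transport port) and the retry settings are assumptions to adjust for your image. It expects Cassandra to already be starting in the background, e.g. from a wrapper used as the image's CMD.

```python
import socket
import subprocess
import sys
import time
from typing import Callable


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for(check: Callable[[], bool], retries: int = 60, delay: float = 5.0) -> bool:
    """Call check() up to `retries` times, sleeping `delay` seconds between attempts."""
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            time.sleep(delay)
    return False


def load_schema(cql_file: str, host: str = "127.0.0.1", port: int = 9042) -> None:
    """Wait for the native transport port, then execute cql_file with cqlsh."""
    if not wait_for(lambda: port_open(host, port)):
        raise SystemExit(f"Cassandra did not accept connections on {host}:{port}")
    # CREATE ... IF NOT EXISTS statements in the file keep re-runs harmless.
    subprocess.run(["cqlsh", "-f", cql_file, host, str(port)], check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python load_schema.py /schema.cql
    load_schema(sys.argv[1])
```

Polling the port first avoids running `cqlsh` in a tight loop while the node is still bootstrapping; a syntax error in the CQL file then fails once instead of being retried.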

--
Michael Mior
michael.m...@gmail.com

2016-09-01 14:45 GMT-04:00 Vova Shelgunov <vvs...@gmail.com>:

> Either case works for me. The second will work because I use CREATE ... IF NOT EXISTS.
>
> 2016-09-01 21:02 GMT+03:00 Michael Mior <mm...@uwaterloo.ca>:
>
>> I'm not sure I understand what you're trying to do. Do you want this to
>> be executed once when the container is built or every time the container is
>> started?
>>
>> --
>> Michael Mior
>> michael.m...@gmail.com
>>
>> 2016-09-01 13:57 GMT-04:00 Vova Shelgunov <vvs...@gmail.com>:
>>
>>> Sorry, I did not specify, that I need to execute cql right
>>> after cassandra container start.
>>>
>>> 2016-09-01 20:52 GMT+03:00 Michael Mior <mm...@uwaterloo.ca>:
>>>
>>>> You should just be able to connect to the Cassandra instance and
>>>> execute CQL as you would against any other Cassandra installation. Any
>>>> applications wishing to use the Cassandra instance inside the container
>>>> will require the port to be exposed somehow anyway.
>>>>
>>>> --
>>>> Michael Mior
>>>> michael.m...@gmail.com
>>>>
>>>> 2016-09-01 13:47 GMT-04:00 Vova Shelgunov <vvs...@gmail.com>:
>>>>
>>>>> Hi,
>>>>>
>>>>> I wonder if anyone can suggest a way to initialize the application
>>>>> schema in Cassandra inside a Docker container (e.g. by executing a CQL file).
>>>>> Is there a way?
>>>>>
>>>>> Thanks,
>>>>> Uladzimir
>>>>>
>>>>
>>>>
>>>
>>
>


Re: MATERIALIZED VIEW difference in 3.0.3 to 3.0.7/3.7

2016-06-21 Thread Michael Mior
It turns out this behaviour was not intended to be allowed and constructing
MVs like this can lead to issues. See
https://issues.apache.org/jira/browse/CASSANDRA-9928
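The reason the newer versions reject this view: its primary key ((k_id, name), value, sub_id, id) contains two columns that are not part of the base table's primary key (id, sub_id, name), namely k_id and value, not just one. A small Python sketch of that counting rule (it only counts columns and is not Cassandra's validation code):

```python
from typing import List, Sequence

BASE_PRIMARY_KEY = ["id", "sub_id", "name"]                   # ks.pa
VIEW_PRIMARY_KEY = ["k_id", "name", "value", "sub_id", "id"]  # ks.mv_pa, flattened


def non_pk_columns_in_view_key(base_pk: Sequence[str], view_pk: Sequence[str]) -> List[str]:
    """Columns of the view's primary key that are regular columns in the base table."""
    return [col for col in view_pk if col not in base_pk]


def view_key_allowed(base_pk: Sequence[str], view_pk: Sequence[str],
                     max_non_pk: int = 1) -> bool:
    """Restriction tracked in CASSANDRA-9928: at most one such column."""
    return len(non_pk_columns_in_view_key(base_pk, view_pk)) <= max_non_pk


if __name__ == "__main__":
    print(non_pk_columns_in_view_key(BASE_PRIMARY_KEY, VIEW_PRIMARY_KEY))  # ['k_id', 'value']
    print(view_key_allowed(BASE_PRIMARY_KEY, VIEW_PRIMARY_KEY))            # False
```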

--
Michael Mior
michael.m...@gmail.com

2016-06-21 7:32 GMT-04:00 Atul Saroha <atul.sar...@snapdeal.com>:

> There is a behavioural difference between 3.0.3 and 3.0.7/3.7 for the
> materialized view in the schema below.
>
>
> CREATE TABLE ks.pa (
>> id bigint,
>> sub_id text,
>> name text,
>> class text,
>> r_id bigint,
>> k_id bigint,
>> created timestamp,
>> priority int,
>> updated timestamp,
>> value text,
>> PRIMARY KEY (id, sub_id, name)
>> );
>>
>> CREATE MATERIALIZED VIEW ks.mv_pa AS
>> SELECT k_id, name, value, sub_id, id, class, r_id
>> FROM ks.pa
>> WHERE k_id IS NOT NULL AND name IS NOT NULL AND value IS NOT NULL AND
>> sub_id IS NOT NULL AND id IS NOT NULL
>> PRIMARY KEY ((k_id, name), value, sub_id, id);
>>
>
> We were able to create the MV below in 3.0.3, but it fails in 3.0.7/3.7 with
> the following error:
>
> InvalidRequest: code=2200 [Invalid query] message="Cannot include more
>> than one non-primary key column 'value' in materialized view partition key"
>>
>
> We are not able to upgrade. Also, "value" is a clustering key and "k_id"
> is in the partition key, so there is only one non-primary-key column from
> the main table in the partition key. Why are we getting this error in
> Cassandra 3.0.7/3.7?
>
> Help will be appreciated.
>
>
>
> -
> Atul Saroha
> *Lead Software Engineer*
> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>


Re: MATERIALIZED VIEW difference in 3.0.3 to 3.0.7/3.7

2016-06-21 Thread Michael Mior
This appears to be a bug introduced in some refactoring of the materialized
view code. I logged CASSANDRA-12044 for this.

https://issues.apache.org/jira/browse/CASSANDRA-12044

--
Michael Mior
michael.m...@gmail.com



Re: Apache Cassandra's license terms

2016-03-20 Thread Michael Mior
Cassandra is under the Apache License 2.0 (
https://www.apache.org/licenses/LICENSE-2.0). IANAL, but I don't believe
you are required to contribute your changes. Of course, that's always
appreciated by the community :)

https://tldrlegal.com/license/apache-license-2.0-(apache-2.0)

--
Michael Mior
michael.m...@gmail.com

2016-03-18 9:07 GMT-04:00 Rakesh Kumar <rakeshkumar46...@gmail.com>:

> What type of open source license does Cassandra follow? If we use
> open source Cassandra for a revenue-generating product, are we
> expected to contribute our code back to the open source project?
>
> thanks
>


Re: Disable writing to debug.log

2016-03-01 Thread Michael Mior
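There are instructions given in /etc/cassandra/logback.xml.

Looking later in the file, you'll see the following:

  <appender name="ASYNCDEBUGLOG" class="ch.qos.logback.classic.AsyncAppender">
    <queueSize>1024</queueSize>
    <discardingThreshold>0</discardingThreshold>
    <includeCallerData>true</includeCallerData>
    <appender-ref ref="DEBUGLOG" />
  </appender>

Commenting out this section will disable writing to debug.log.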

--
Michael Mior
mm...@uwaterloo.ca

2016-03-01 10:43 GMT-05:00 Rakesh Kumar <dcrunch...@aim.com>:

> Version: Cassandra 3.3
>
> Can anyone tell me how to disable writing to debug.log?
>
> thanks.
>


Cassandra Calcite integration

2016-02-22 Thread Michael Mior
Hi all,

For those not familiar, Apache Calcite is a data management framework that
enables storage-agnostic SQL query processing. The practical implication
is that, with a relatively small amount of adapter code, Calcite can execute
a large subset of SQL queries against different backend databases.

Over the past couple of weeks I wrote a Cassandra adapter for Calcite. By just
pointing Calcite at a Cassandra installation, you can execute SQL queries
over the data stored in your Cassandra tables (including joins).
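In Calcite, pointing at a data source is done with a JSON model file. The Python below writes a model for one keyspace; the factory class and operand names follow the Calcite Cassandra adapter documentation as I understand it, so check them against the Calcite version you use. The keyspace name `twissandra` and the host are placeholders.

```python
import json

CASSANDRA_SCHEMA_FACTORY = "org.apache.calcite.adapter.cassandra.CassandraSchemaFactory"


def calcite_cassandra_model(keyspace: str, host: str = "localhost") -> str:
    """Return a Calcite JSON model exposing one Cassandra keyspace as a schema."""
    model = {
        "version": "1.0",
        "defaultSchema": keyspace,
        "schemas": [
            {
                "name": keyspace,
                "type": "custom",
                "factory": CASSANDRA_SCHEMA_FACTORY,
                "operand": {"host": host, "keyspace": keyspace},
            }
        ],
    }
    return json.dumps(model, indent=2)


if __name__ == "__main__":
    # Placeholder keyspace; save this output as model.json.
    print(calcite_cassandra_model("twissandra"))
```

With the output saved as `model.json`, Calcite's sqlline shell can connect with `!connect jdbc:calcite:model=model.json admin admin`, after which the keyspace's tables are queryable with SQL.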

These queries will not necessarily be efficient, since efficiency depends entirely
on how your data is modelled in the underlying CQL tables. There's a lot of
work to be done, but I'm hoping this will be helpful to those who want to
do a bit of exploration of their data without writing any code.

I wrote a blog post here that provides more details:
http://michael.mior.ca/blog/calcite-cassandra-adapter/

Cheers,
--
Michael Mior
mm...@uwaterloo.ca


Re: Usage volume of older versions of Cassandra

2015-12-15 Thread Michael Mior
I assume you mean Cassandra 0.7, 0.8, and 1.0? I think most users are on
2.x now, but I don't have any stats.

--
Michael Mior
michael.m...@gmail.com

2015-12-15 9:28 GMT-05:00 Andy Kruth <krut...@gmail.com>:

> We are trying to decide how to proceed with development and support of
> YCSB bindings for older versions of Cassandra, namely Cassandra 7, 8, and
> 10.
>
> We would like to continue dev and support on these if the use of those
> versions of Cassandra is still prevalent. If not, then a deprecation cycle
> may be the best idea, in favor of CQL based bindings for modern versions of
> Cassandra.
>
> Any input from the Cassandra community on how common the usage of
> Cassandra 7, 8, and 10 are would be very helpful, thanks a lot.
>


CollationController not using collectTimeOrderedData

2015-02-24 Thread Michael Mior
Hi all,

I'd appreciate some help with a Cassandra 2.1.2 issue I'm experiencing. I'm
running a query which looks like this:

CREATE TABLE single_row_fetch (id uuid PRIMARY KEY, data text);
SELECT data FROM single_row_fetch WHERE id = ?;

When writing test data into this table, I disabled compaction. I then wrote
data, performed a flush, overwrote the data, flushed again, and so on. I
varied the number of times the data was overwritten and flushed, which
controls the number of SSTables. However, given that the table only has one
non-key column, only a single SSTable will ever have the most recent data
for this row. I confirmed that the expected number of SSTables were
generated and that the timestamps are as expected.

However, when I run the query with tracing, I see that Cassandra still
reads from ALL of the SSTables via collectAllData in CollationController.
Given that this query only fetches a single column, I would expect this
query to take the collectTimeOrderedData code path and then only examine
the first SSTable after seeing that it contains the relevant data.
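The expected behaviour can be stated as a simplified Python model. This is not CollationController's code: it assumes each SSTable records the maximum timestamp it contains and ignores tombstones and the other conditions Cassandra uses to choose between the two paths. The time-ordered read visits SSTables newest first and stops once the value found is newer than anything the remaining SSTables could hold; the collect-all read touches every SSTable.

```python
from typing import Dict, List, Optional, Tuple

# Model of one SSTable: (max timestamp it contains, {column: (timestamp, value)}).
SSTable = Tuple[int, Dict[str, Tuple[int, str]]]


def collect_time_ordered(sstables: List[SSTable], column: str) -> Tuple[Optional[str], int]:
    """Read SSTables newest-first, stopping early. Returns (value, SSTables read)."""
    best: Optional[Tuple[int, str]] = None
    read = 0
    for max_ts, cells in sorted(sstables, key=lambda s: s[0], reverse=True):
        if best is not None and best[0] > max_ts:
            break  # no remaining SSTable can contain a newer cell
        read += 1
        cell = cells.get(column)
        if cell is not None and (best is None or cell[0] > best[0]):
            best = cell
    return (best[1] if best is not None else None), read


def collect_all(sstables: List[SSTable], column: str) -> Tuple[Optional[str], int]:
    """Read and merge every SSTable. Returns (value, SSTables read)."""
    cells = [s[1][column] for s in sstables if column in s[1]]
    newest = max(cells, key=lambda c: c[0], default=None)
    return (newest[1] if newest is not None else None), len(sstables)


if __name__ == "__main__":
    # Five flushes of the same cell, each overwrite with a higher timestamp.
    tables = [(ts, {"data": (ts, f"v{ts}")}) for ts in range(1, 6)]
    print(collect_time_ordered(tables, "data"))  # ('v5', 1)
    print(collect_all(tables, "data"))           # ('v5', 5)
```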

Any insights on why this is the case and in what situations I would get the
expected behaviour would be incredibly helpful!

Cheers,
--
Michael Mior
michael.m...@gmail.com


Re: CollationController not using collectTimeOrderedData

2015-02-24 Thread Michael Mior
Thanks Robert!

https://issues.apache.org/jira/browse/CASSANDRA-8859

2015-02-24 14:06 GMT-05:00 Robert Coli <rc...@eventbrite.com>:

> On Tue, Feb 24, 2015 at 10:58 AM, Michael Mior <michael.m...@gmail.com>
> wrote:
>
>> I'd appreciate some help with a Cassandra 2.1.2 issue I'm experiencing.
>> I'm running a query which looks like this:
>> ...
>>
>> However, when I run the query with tracing, I see that Cassandra still
>> reads from ALL of the SSTables via collectAllData in CollationController.
>> Given that this query only fetches a single column, I would expect this
>> query to take the collectTimeOrderedData code path and then only examine
>> the first SSTable after seeing that it contains the relevant data.
>
> I'd probably file this specific code issue as a JIRA ticket, or ask about
> it on the cassandra-dev mailing list. While devs do participate in this
> list, your question is a specific one about an internal implementation
> detail.
>
> If you file a JIRA, please let the list know its URL! :D
>
> =Rob