DTCS Question

2016-03-19 Thread Anubhav Kale
I am using Cassandra 2.1.13 which has all the latest DTCS fixes (it does STCS 
within the DTCS windows). It also introduced a field called MAX_WINDOW_SIZE 
which defaults to one day.

So in my data folders, I may see SS Tables that span beyond a day (generated 
from old data via repairs or commit logs), but whenever I see a message 
in the logs saying "Compacted Foo" (meaning the SS Table in question was definitely a 
result of compaction), the "Foo" SS Table should never have data beyond a day. 
Is this understanding accurate ?

If we have issues with repairs pulling in old data, should MAX_WINDOW_SIZE 
instead be set to a larger value so that we don't run the risk of too many SS 
Tables lying around and never getting compacted ?


Apache Cassandra's license terms

2016-03-19 Thread Rakesh Kumar
What type of open source license does Cassandra use? If we use
open source Cassandra for a revenue-generating product, are we
expected to contribute our code back to the open source project?

thanks


Re: Modeling Audit Trail on Cassandra

2016-03-19 Thread Tom van den Berge
>
> Is text the most appropriate data type to store JSON that contain couple
> of dozen lines ?
>

It sure is the simplest way to store JSON.

The query requirement  is  "where executedby = ?”.
>

Since executedby is a timeuuid, I guess you don't want to query a single
record, since that would require you to know the exact timeuuid. Do you
mean that you would like to query all changes in a certain time frame, e.g.
today? In that case, you would have to group your rows in time buckets,
e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any
other period that suits your situation. Retrieving all changes in a
specific time frame is done by retrieving all relevant periods.
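
A minimal sketch of that bucketed layout with the DataStax Python driver (the
keyspace, table and column names here are illustrative, not from the original
schema, and day-sized buckets are just an assumption):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('audit')  # assumed keyspace

# Partition on a coarse time bucket; auditid (timeuuid) clusters rows in time order.
session.execute("""
    CREATE TABLE IF NOT EXISTS changes_by_period (
        period      text,       -- e.g. '2016-03-16' for day-sized buckets
        auditid     timeuuid,
        executedby  timeuuid,
        objectafter text,
        PRIMARY KEY ((period), auditid)
    )
""")

# "All changes today" is a single-partition read; a longer time frame is a
# small, known set of partitions (one query per period).
rows = session.execute(
    "SELECT * FROM changes_by_period WHERE period = %s", ('2016-03-16',))
for row in rows:
    print(row.auditid, row.executedby)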

Tom


Re: Modeling Audit Trail on Cassandra

2016-03-19 Thread Jack Krupansky
executedby is the ID assigned to an employee.

I'm presuming that JSON is to be used for objectbefore/after. This suggests
no ability to query by individual object fields. I didn't sense any other
columns that would be JSON.



-- Jack Krupansky

On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge 
wrote:

> Is text the most appropriate data type to store JSON that contain couple
>> of dozen lines ?
>>
>
> It sure is the simplest way to store JSON.
>
> The query requirement  is  "where executedby = ?”.
>>
>
> Since executedby is a timeuuid, I guess you don't want to query a single
> record, since that would require you to know the exact timeuuid. Do you
> mean that you would like to query all changes in a certain time frame, e.g.
> today? In that case, you would have to group your rows in time buckets,
> e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any
> other period that suits your situation. Retrieving all changes in a
> specific time frame is done by retrieving all relevant periods.
>
> Tom
>


cqlsh problem

2016-03-19 Thread joseph gao
hi, all
cassandra version 2.1.7
When I use cqlsh to connect cassandra, something is wrong

Connection error: ('Unable to connect to any servers', {'127.0.0.1':
OperationTimedOut('errors=None, last_host=None',)})

This happens lots of times, but sometime it works just fine. Anybody knows
why?

-- 
--
Joseph Gao
PhoneNum:15210513582
QQ: 409343351


Re: Modeling Audit Trail on Cassandra

2016-03-19 Thread Clint Martin
I would arrange your primary key by how you intend to query.

Primary key ((executedby), auditid)

This allows you to query for who did it, and optionally on a time range for
when it occurred, retrieving results in chronological order.

You could do it with your proposed schema and Lucene, but for the requirements
you have stated, Lucene is not necessary.

This proposed structure could result in wide partitions, depending on how
busy individuals are, and if so introducing a granular bucket to the
primary key can combat this. The overhead of doing so is relatively minor.

You can of course use Lucene with this model as well for filtering on other
fields. (Or add more fields to your index as appropriate)
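
A sketch of that layout with a time bucket folded into the partition key (the
bucket granularity, keyspace and column names are illustrative, not something
from the original schema):

import datetime
import uuid

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('audit')  # assumed keyspace

# One partition per employee per month; auditid (timeuuid) keeps rows in
# chronological order and the month bucket bounds busy employees' partitions.
session.execute("""
    CREATE TABLE IF NOT EXISTS audit_by_executedby (
        executedby   uuid,
        month        text,      -- e.g. '2016-03'
        auditid      timeuuid,
        actiontype   text,
        objecttype   text,
        objectbefore text,
        objectafter  text,
        PRIMARY KEY ((executedby, month), auditid)
    )
""")

# Who did it, restricted to a time range, newest first.
employee_id = uuid.uuid4()                 # illustrative employee id
start = datetime.datetime(2016, 3, 1)
end = datetime.datetime(2016, 3, 17)
rows = session.execute(
    "SELECT * FROM audit_by_executedby "
    "WHERE executedby = %s AND month = %s "
    "AND auditid > maxTimeuuid(%s) AND auditid < minTimeuuid(%s) "
    "ORDER BY auditid DESC",
    (employee_id, '2016-03', start, end))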

Clint
On Mar 16, 2016 4:52 PM, "Jack Krupansky"  wrote:

> Stratio (or DSE Search) should be good for ad hoc or complex queries, but
> if there are some fixed/common query patterns you might be better off
> implementing query tables or using materialized views. The latter allows
> you to include a non-PK data column in the PK of the MV so that you can
> directly access the indexed row without the complexity of Lucene/DSE. This
> also lets you effectively cluster data that will be commonly accessed
> together on a single node/partition, and to do it automatically without any
> application logic to manually duplicate/update data.
>
> (3.x still has the restriction that an MV PK can only include one non-PK
> data column - CASSANDRA-9928
> .)
>
> -- Jack Krupansky
>
> On Wed, Mar 16, 2016 at 4:40 PM, I PVP  wrote:
>
>> Jack/Tom
>> Thanks for answering.
>>
>> Here is the table definition so far:
>>
>> CREATE TABLE audit_trail (
>> auditid timeuuid,
>> actiontype text,
>> objecttype text,
>> executedby uuid ( or timeuuid?),
>> executedat timestamp,
>> objectbefore text,
>> objectafter text,
>> clientipaddr text,
>> serveripaddr text,
>> servername text,
>> channel text,
>> PRIMARY KEY (auditid)
>> );
>>
>> objectbefore/after are the only ones that will have JSON content. Querying
>> based on the contents of these two columns is not a requirement.
>>
>> At this moment the queries are going to be mainly on executedby ( the
>> employee id).
>> Stratio’s Cassandra Lucene Index will be used to allow querying/filtering
>> on executedat (timestamp) ,objecttype(order, customer, ticket,
>> message,account, paymenttransaction,refund etc.)  and actiontype(create,
>> retrieve, update, delete, approve, activate, unlock, lock etc.) .
>>
>> I am considering relying exclusively on Stratio’s Cassandra Lucene
>> filtering and avoiding adding “period” columns like month(int), year(int),
>> day (int).
>>
>> Thanks
>>
>> --
>> IPVP
>>
>>
>> From: Jack Krupansky 
>> 
>> Reply: user@cassandra.apache.org >
>> 
>> Date: March 16, 2016 at 5:22:36 PM
>> To: user@cassandra.apache.org >
>> 
>> Subject:  Re: Modeling Audit Trail on Cassandra
>>
>> executedby is the ID assigned to an employee.
>>
>> I'm presuming that JSON is to be used for objectbefore/after. This
>> suggests no ability to query by individual object fields. I didn't sense
>> any other columns that would be JSON.
>>
>>
>>
>> -- Jack Krupansky
>>
>> On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge 
>> wrote:
>>
>>> Is text the most appropriate data type to store JSON that contain couple
 of dozen lines ?

>>>
>>> It sure is the simplest way to store JSON.
>>>
>>> The query requirement  is  "where executedby = ?”.

>>>
>>> Since executedby is a timeuuid, I guess you don't want to query a single
>>> record, since that would require you to know the exact timeuuid. Do you
>>> mean that you would like to query all changes in a certain time frame, e.g.
>>> today? In that case, you would have to group your rows in time buckets,
>>> e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any
>>> other period that suits your situation. Retrieving all changes in a
>>> specific time frame is done by retrieving all relevant periods.
>>>
>>> Tom
>>>
>>
>>
>


Re: Single node Solr FTs not working

2016-03-19 Thread Jack Krupansky
Have you verified that the documented reference example functions as
expected on your system? If so, then incrementally morph it towards your
own code to discover exactly at which stage the problem occurs. Or just
having the reference example side by side with your own code/schema/table
will help highlight what the difference is that causes the problem.

Doc:
http://docs.datastax.com/en/latest-dse/datastax_enterprise/srch/srchTrnsFrm.html

-- Jack Krupansky

On Fri, Mar 18, 2016 at 4:30 AM, Joseph Tech  wrote:

> Hi,
>
> I had set up a single-node DSE 4.8.x to start in Search mode to explore
> some aspects of Solr search with field transformers (FT). The
> configuration seems fine, the Solr admin shows the indexed data, and
> searches on the actual fields (stored=true) work fine, but the FTs are not
> being invoked during indexing, and searches using fields managed by
> the FT don't work, i.e. the evaluate(), addFieldToDocument() etc. are not
> invoked. There are no ERRORs or similar indications in system.log, and
> solrvalidation.log has no entries either.
>
> The only warnings are during node startup for the non-stored fields like
> xyz
>
> WARN  [SolrSecondaryIndex checkout.cart index initializer.] 2016-03-16
> 17:24:57,956  CassandraIndexSchema.java:537 - No Cassandra column found for
> field: xyz
>
> The FT configuration was verified by changing the FT's class name in
> solrconfig.xml, which threw a ClassNotFoundException that didn't appear
> when the right classname was given.
>
> The data is being inserted and retrieved from the same node. Please
> suggest any pointers to debug this.
>
> Thanks,
> Joseph
>


RE: DTCS bucketing Question

2016-03-19 Thread Anubhav Kale
CIL

From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
Sent: Thursday, March 17, 2016 11:01 AM
To: user@cassandra.apache.org
Subject: Re: DTCS bucketing Question

> I am trying to concretely understand how DTCS makes buckets and I am looking 
> at the DateTieredCompactionStrategyTest.testGetBuckets method and played with 
> some of the parameters to GetBuckets method call (Cassandra 2.1.12). I don’t 
> think I fully understand something there.

Don’t feel bad, you’re not alone.

> In this case, the buckets should look like [0-4000] [4000-]. Is this correct 
> ? The buckets that I get back are different (“a” lives in its bucket and 
> everyone else in another). What I am missing here ?

The latest/newest window never gets combined, it’s ALWAYS the base size. Only 
subsequent windows get merged. First window will always be 0-1000. 
https://spotifylabscom.files.wordpress.com/2014/12/dtcs3.png
[Anubhav Kale] This doesn’t seem correct. In the original test (look at the 
comments), the first window is pretty big, and in many cases it is.

> Note, that if I keep the base to original (100L) or increase it and play with 
> min_threshold the results are exactly what I would expect.

Because the original base is lower than the lowest timestamp, which means 
you’re never looking in the first window (0-base).

> I am afraid that the math in Target class is somewhat hard to follow so I am 
> thinking about it this way.

The Target class is too clever for its own good. I couldn’t follow it. You’re 
having trouble following it.  Other smart people I’ve talked to couldn’t follow 
it. Last June I proposed an alternative (CASSANDRA-9666 / 
https://github.com/jeffjirsa/twcs ). It was never taken upstream, but it does 
get a fair bit of use by people with large time series clusters (we use it on 
one of our petabyte-scale clusters here). Significantly easier to reason about.

- Jeff


From: Anubhav Kale
Reply-To: "user@cassandra.apache.org"
Date: Thursday, March 17, 2016 at 10:24 AM
To: "user@cassandra.apache.org"
Subject: DTCS bucketing Question



Hello,

I am trying to concretely understand how DTCS makes buckets and I am looking at 
the DateTieredCompactionStrategyTest.testGetBuckets method and played with some 
of the parameters to GetBuckets method call (Cassandra 2.1.12).

I don’t think I fully understand something there. Let me try to explain.

Consider the second test there. I changed the pairs a bit for easier 
explanation and changed base (initial window size)=1000L and Min_Threshold=2

pairs = Lists.newArrayList(
Pair.create("a", 200L),
Pair.create("b", 2000L),
Pair.create("c", 3600L),
Pair.create("d", 3899L),
Pair.create("e", 3900L),
Pair.create("f", 3950L),
Pair.create("too new", 4125L)
);
buckets = getBuckets(pairs, 1000L, 2, 4050L, Long.MAX_VALUE);

In this case, the buckets should look like [0-4000] [4000-]. Is this correct ? 
The buckets that I get back are different (“a” lives in its bucket and everyone 
else in another). What I am missing here ?

Another case,

pairs = Lists.newArrayList(
Pair.create("a", 200L),
Pair.create("b", 2000L),
Pair.create("c", 3600L),
Pair.create("d", 3899L),
Pair.create("e", 3900L),
Pair.create("f", 3950L),
Pair.create("too new", 4125L)
);
buckets = getBuckets(pairs, 50L, 4, 4050L, Long.MAX_VALUE);

Here, the buckets should be [0-3200] [3200-4000] [4000-4050] [4050-]. Is this 
correct ? Again, the buckets that come back are quite different.

Note, that if I keep the base to original (100L) or increase it and play with 
min_threshold the results are exactly what I would expect.

The way I think about DTCS is: try to make buckets of the maximum possible sizes 
starting from 0, and once you can’t do that, make smaller buckets (similar to what 
the comment suggests). Is this mental model wrong? I am afraid the math 
in the Target class is somewhat hard to follow, so I am thinking about it this way.

Thanks a lot in advance.

-Anubhav


Re: DTCS Question

2016-03-19 Thread Marcus Eriksson
On Wed, Mar 16, 2016 at 6:49 PM, Anubhav Kale 
wrote:

> I am using Cassandra 2.1.13 which has all the latest DTCS fixes (it does
> STCS within the DTCS windows). It also introduced a field called
> MAX_WINDOW_SIZE which defaults to one day.
>
>
>
> So in my data folders, I may see SS Tables that span beyond a day
> (generated through old data through repairs or commit logs), but whenever I
> see a message in logs “Compacted Foo” (meaning the SS Table under question
> was definitely a result of compaction), the “Foo” SS Table should never
> have data beyond a day. Is this understanding accurate ?
>
No - not until https://issues.apache.org/jira/browse/CASSANDRA-10496 (read
for explanation)


>
>
> If we have issues with repairs pulling in old data, should MAX_WINDOW_SIZE
> instead be set to a larger value so that we don’t run the risk of too many
> SS Tables lying around and never getting compacted ?
>
No, with CASSANDRA-10280 that old data will get compacted if needed
(assuming you have default settings). If the remote node is correctly date
tiered, the streamed sstable will also be correctly date tiered. Then that
streamed sstable will be put in a time window and if there are enough
sstables in that old window, we do a compaction.

/Marcus


Re: Experiencing strange disconnect issue

2016-03-19 Thread Steve Robenalt
Hi Bo,

I would suggest adding:

.withReconnectionPolicy(new ExponentialReconnectionPolicy(1000,3))

or something similar to your cluster builder.

Steve


On Wed, Mar 16, 2016 at 11:18 AM, Bo Finnerup Madsen  wrote:

> Hi Sean,
>
> Thank you for taking the time to answer :)
> We are using a very vanilla connection, without any sorts of tuning
> policies. The cluster/session is constructed as follows:
> final Cluster cluster = Cluster.builder()
>
> .addContactPoints(key.getContactPoints())
> .build();
> final Session session =
> cluster.connect(key.getKeyspace());
> Perhaps it is too vanilla and we are missing something?
>
> Since I posted the question, I have tried downgrading to cassandra v2.1.13
> and java driver 2.1.9. But got the same error. So I suspect it is something
> we are doing wrong.
>
>
>
> On Wed, Mar 16, 2016 at 18:59,  wrote:
>
>> Are you using any of the Tuning Policies (
>> https://docs.datastax.com/en/developer/java-driver/2.0/common/drivers/reference/tuningPolicies_c.html)?
>> It could be that you are hitting some peak load and the driver is not
>> retrying hosts once they are marked “down.”
>>
>>
>>
>>
>>
>> Sean Durity – Lead Cassandra Admin
>>
>> Big DATA Team
>>
>> For support, create a JIRA
>> 
>>
>>
>>
>> *From:* Bo Finnerup Madsen [mailto:bo.gunder...@gmail.com]
>> *Sent:* Tuesday, March 15, 2016 5:24 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Experiencing strange disconnect issue
>>
>>
>>
>> Hi,
>>
>>
>>
>> We are currently trying to convert an existing java web application to
>> use cassandra, and while most of it works great :) we have a "small" issue.
>>
>>
>>
>> After some time, we all connectivity seems to be lost and we get the
>> following errors:
>>
>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>> tried for query failed (tried: /10.61.70.107:9042
>> (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.107]
>> Connection has been closed), /10.61.70.108:9042
>> (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.108]
>> Connection has been closed))
>>
>>
>>
>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>> tried for query failed (tried: /10.61.70.107:9042
>> (com.datastax.driver.core.exceptions.DriverException: Timeout while trying
>> to acquire available connection (you may want to increase the driver number
>> of per-host connections)), /10.61.70.108:9042
>> (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.108]
>> Connection has been closed), /10.61.70.110:9042
>> (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.110]
>> Connection has been closed))
>>
>>
>>
>> The errors persists, and the application needs to be restarted to recover.
>>
>>
>>
>> At application startup we create a cluster and a session which we reuse
>> through out the application as pr. the documentation. We don't specify any
>> other options when connecting than the IP's of the three servers. We are
>> running cassandra 3.0.3 tar ball in EC2 in a cluster of three machines. The
>> connections are made using v3.0.0 java driver.
>>
>>
>>
>> I have uploaded the configuration and logs from our cassandra cluster
>> here: https://gist.github.com/anonymous/452e736b401317b5b38d
>>
>> The issue happend at 00:44:46.
>>
>>
>>
>> I would greatly appreciate any ideas as to what we are doing wrong to
>> experience this? :)
>>
>>
>>
>> Thank you in advance!
>>
>>
>>
>> Yours sincerely,
>>
>>   Bo Madsen
>>
>> --
>>
>>
>


-- 
Steve Robenalt
Software Architect
sroben...@highwire.org 
(office/cell): 916-505-1785

HighWire Press, Inc.
425 Broadway St, Redwood City, 

Re: Deploy latest cassandra on top of datastax-ddc ?

2016-03-19 Thread Mohamed Lrhazi
because I have no clue... :)

So, after doing an ant build from the latest source... how would one
"install" or deploy cassandra?  Could not find a document on the install
from source part... any pointers?  All I find makes use of yum or apt
repo's, or deploy from binary tarball...

Thanks a lot,
Mohamed.


On Fri, Mar 18, 2016 at 4:50 PM, Robert Coli  wrote:

> On Thu, Mar 17, 2016 at 10:38 PM, Mohamed Lrhazi <
> mohamed.lrh...@georgetown.edu> wrote:
>
>> Would simply overriding this one jar file do it? else could you please
>> share a procedure?
>>
>
> This seems like an odd thing to want to do. Why do you believe it is
> likely to work?
>
> =Rob
>
>


Re: Strategies for avoiding corrupted duplicate data?

2016-03-19 Thread Clint Martin
Light weight transactions are going to be somewhat key to this. As are
batches.

The interesting thing about these views is that changing an email address
is not the same operation on all of them.

For the users_by_email view you have to delete the existing row and
insert a new one.

For the others, an update using an LWT to ensure the existing email is what
you think it is will be sufficient.

The batch is necessary because you inherently have a race condition when
updating all of these tables. Two, really: one as you update each table, and
one where two updates with different values occur at the same time. These two
cases are related and interact poorly.
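
A hedged sketch of those two pieces with the Python driver, against the users
tables from the original question below (it treats "users" as the canonical
table; the keyspace name and sample values are made up, and note that the
conditional update itself cannot be part of a cross-partition batch):

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('ks')  # assumed keyspace

username, phone = 'maxc', '555-0100'
old_email, new_email = 'old@example.com', 'new@example.com'

# 1) LWT on the canonical row: only wins if nobody changed the email under us.
rows = session.execute(
    "UPDATE users SET email = %s WHERE username = %s IF email = %s",
    (new_email, username, old_email))
applied = rows[0][0]   # first column of an LWT result is the "[applied]" flag

if applied:
    # 2) Logged batch so the derived tables change together. users_by_email is
    #    keyed by email, so the old row is deleted and a new one inserted.
    batch = BatchStatement()
    batch.add("DELETE FROM users_by_email WHERE email = %s", (old_email,))
    batch.add("INSERT INTO users_by_email (email, username, phone) VALUES (%s, %s, %s)",
              (new_email, username, phone))
    batch.add("UPDATE users_by_phone SET email = %s WHERE phone = %s AND username = %s",
              (new_email, phone, username))
    session.execute(batch)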

If you are using Cassandra 3.x materialized views would be a good solution
for this.

Clint





On Mar 17, 2016 2:22 AM, "Max C"  wrote:

> Hello,
>
> What are your best practices for avoiding collisions when updating
> duplicate (derived) data?  For example if I have tables like this:
>
> users — (username, email, phone) — PK (username)
> users_by_email — (username, email, phone) — PK (email)
> users_by_phone — (username, email, phone) — PK ((phone), username)
>
> … and I want to change the user’s email address ...
>
> What is your strategy for ensuring that two writers don’t try to update
> the same records at the same time?
>
> Some sort of lock?
>
> 1) Do you have a global “locks” table in Cassandra, and use a LWT to lock
> it?
>
> insert into locks (what, locked_by) values ('users:maxc',
> 'server1_pid1234') if not exists;
> # Verify insert succeeded
> # Now that I have the lock, grab the latest
> select * from users where username=‘maxc’;
> begin batch
> # … apply changes ...
> apply batch;
> # release lock
> delete from locks where what=‘users:maxc’;
>
> 2) Do you add a “locked_by” column to the master (“users”) table, and then
> use a LWT to lock it?
>
> update users set locked_by='server1_pid1234' where username='maxc' if
> locked_by = null;
> # Verify update succeeded
> # Now that I have the lock, grab the latest
> select * from users where username=‘maxc’;
> begin batch
> # … apply changes ...
> apply batch;
> # release lock
> update users set locked_by=null where username=‘maxc’;
>
> 3) Do you use something outside of Cassandra to manage the locks?  Zoo
> keeper?
>
> ## Acquire external lock ##
> # Now that I have the lock, grab the latest
> select * from users where username=‘maxc’;
> begin batch
> # … apply changes ...
> apply batch;
> ## release external lock ##
>
> Or is there some other way to do this that I’m totally missing??
> Materialized views in 3.x, I suppose.  Other ideas?
>
> Thanks!
>
> - Max
>


Re: Questions about Datastax support

2016-03-19 Thread Jack Krupansky
Maybe the question is what the fourth component of the release number
actually means. The key point is simply that they have included additional
fixes beyond the base Apache version - fixes that show up in future Apache
releases that hadn't been released as of when they tested their DSE release.

-- Jack Krupansky

On Thu, Mar 17, 2016 at 10:39 AM, Rakesh Kumar 
wrote:

> > 1. They have a published support policy:
> > http://www.datastax.com/support-policy/supported-software
>
> Why is the version number so different from the cassandra community
> edition.
>
> Take a look at this:
> 4.8.2 | Release Notes | Nov 11, 2015 | Mar 23, 2016 | Sep 23, 2017
>
> What is version 4.8.2
>


Python to type field

2016-03-19 Thread Rakesh Kumar
Hi

I have a type defined as follows

CREATE TYPE etag (
ttype int,
tvalue text
);

And this is used in a col of a table as follows

 evetag list<frozen<etag>>

I have the following value in a file
[{ttype: 3 , tvalue: '90A1'}]

This gets inserted via COPY command with no issues.

However, when I try to insert the same via a Python program which I am
writing, where I prepare and then bind, I get this error while executing:

TypeError: Received an argument of invalid type for column "evetag".
Expected: , Got: ; (Received a string for a type that
expects a sequence)

I tried casting the variable in Python to a list or a tuple, but got the same error.
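
For what it's worth, that error usually means the bound value is still the raw
string read from the file rather than a Python sequence. A minimal sketch of
the usual approach with the DataStax Python driver, registering the UDT and
binding a list of instances (the keyspace, table and column names other than
the type itself are assumptions):

import uuid

from cassandra.cluster import Cluster


class Etag(object):
    """Plain Python class mapped onto the CQL user-defined type 'etag'."""
    def __init__(self, ttype, tvalue):
        self.ttype = ttype
        self.tvalue = tvalue


cluster = Cluster(['127.0.0.1'])
session = cluster.connect('ks')          # assumed keyspace

# Tell the driver how to map the UDT; the bound value for the column must then
# be a Python list of Etag instances, not the string from the file.
cluster.register_user_type('ks', 'etag', Etag)

prepared = session.prepare(
    "INSERT INTO events (id, evetag) VALUES (?, ?)")   # assumed table/columns
session.execute(prepared, (uuid.uuid4(), [Etag(3, '90A1')]))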


Deploy latest cassandra on top of datastax-ddc ?

2016-03-19 Thread Mohamed Lrhazi
Would simply overriding this one jar file do it? else could you please
share a procedure?

[root@avesterra-prod-1 ~]# rpm -qa| grep stax

datastax-ddc-tools-3.2.1-1.noarch
datastax-ddc-3.2.1-1.noarch

[root@avesterra-prod-1 ~]# cp /tmp/apache-cassandra-3.6-SNAPSHOT.jar
/usr/share/cassandra/apache-cassandra-3.2.1.jar

[root@avesterra-prod-1 ~]# systemctl restart cassandra

[root@avesterra-prod-1 ~]# cassandra -v
3.6-SNAPSHOT
[root@avesterra-prod-1 ~]#



Thanks a lot,
Mohamed.


Re: Read consistency

2016-03-19 Thread Alain RODRIGUEZ
Hi Arko,

Never used that consistency level so far, but here is some information:
http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_tunable_consistency_c.html

Cassandra 2.0 uses the Paxos consensus protocol, which resembles 2-phase
> commit, to support linearizable consistency. All operations are
> *quorum-based* and updates will incur a performance hit, effectively a
> degradation to one-third of normal. For in-depth information about this new
> consistency level, see the article,*Lightweight transactions in Cassandra*
> 
> .


 So it appears that a read using quorum should be fine (remember to use
local consistency level on a multi-DC environment if this is what you want
to do).

Never checked that on my own, just read that, fwiw.
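
As a rough illustration of where those levels go from the driver side (a sketch
only; the table and values are made up, and the Python driver is used here just
to show the knobs):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('ks')  # assumed keyspace

# Conditional (Paxos) write: the Paxos round runs at the serial consistency
# level, the commit at the statement's regular consistency level.
write = SimpleStatement(
    "INSERT INTO accounts (id, balance) VALUES (%s, %s) IF NOT EXISTS",
    consistency_level=ConsistencyLevel.QUORUM,
    serial_consistency_level=ConsistencyLevel.SERIAL)
session.execute(write, (1, 100))

# Quorum read of the same row (LOCAL_QUORUM / LOCAL_SERIAL in a multi-DC setup).
read = SimpleStatement(
    "SELECT balance FROM accounts WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM)
row = session.execute(read, (1,))[0]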

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-16 2:43 GMT+01:00 Arko Provo Mukherjee :

> Hello,
>
> I am designing a system where for a situation, I need to have SERIAL
> consistency during writes.
>
> I understand that if the write was with QUORUM, a Read with QUORUM would
> have fetched me the correct (most recent) data.
>
> My question is what is the minimum consistency level required for read,
> when my write consistency is SERIAL.
>
> Any pointers would be much appreciated.
>
> Thanks & regards
> Arko
>
>


Re: Multi DC setup for analytics

2016-03-19 Thread Reddy Raja
Yes. Here are the steps.
You will have to change the DC Names first.
DC1 and DC2 would be independent clusters.

Create a new DC, DC3 and include these two DC's on DC3.

This should work well.


On Thu, Mar 17, 2016 at 11:03 PM, Clint Martin <
clintlmar...@coolfiretechnologies.com> wrote:

> When you say you have two logical DC both with the same name are you
> saying that you have two clusters of servers both with the same DC name,
> nether of which currently talk to each other? IE they are two separate
> rings?
>
> Or do you mean that you have two keyspaces in one cluster?
>
> Or?
>
> Clint
> On Mar 14, 2016 2:11 AM, "Anishek Agarwal"  wrote:
>
>> Hello,
>>
>> We are using cassandra 2.0.17 and have two logical DC having different
>> Keyspaces but both having same logical name DC1.
>>
>> we want to setup another cassandra cluster for analytics which should get
>> data from both the above DC.
>>
>> if we setup the new DC with name DC2 and follow the steps
>> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
>> will it work ?
>>
>> I would think we would have to first change the names of existing
>> clusters to have to different names and then go with adding another dc
>> getting data from these?
>>
>> Also as soon as we add the node the data starts moving... this will all
>> be only real time changes done to the cluster right ? we still have to do
>> the rebuild to get the data for tokens for node in new cluster ?
>>
>> Thanks
>> Anishek
>>
>


-- 
"In this world, you either have an excuse or a story. I preferred to have a
story"


discrepancy in up nodes from different nodes

2016-03-19 Thread Surbhi Gupta
Hi,

I have changed endpoint_snitch from Simple to GossipingPropertyFileSnitch.
And changed the cassandra-rackdc.properties file to reflect the correct DC
and RACK.

However, when I did a rolling restart, one node is showing 15 nodes up,
another node is showing 10 nodes up, etc.

I have done the rolling restart multiple times; the numbers change but are
never consistent.

Is there anything I missed?

Thanks
Surbhi


Understanding SELECT * paging/ordering

2016-03-19 Thread Dan Checkoway
Say I have a table with 50M rows in a keyspace with RF=3 in a cluster of 15
nodes (single local data center).  When I do "SELECT * FROM table" and page
through those results (with a fetch size of say 1000), I'd like to
understand better how that paging works.

Specifically, what determines the order in which rows are returned?
And what's happening under the hood...i.e. is the coordinator fetching
pages of 1000 from each node, passing some sort of paging state to each
node, and the coordinator merges the per-node sorted result sets?

I thought maybe the results would be sorted by partition key, but that
doesn't seem to be the case (I'm not 100% sure about this).

I'm also curious how consistency level comes into play.  i.e. if I use ONE
vs. QUORUM vs. ALL, how that impacts where the results come from and how
they're ordered, merged, and who knows what else I don't know...  :-)

Very curious how this works.  Thanks in advance!


Re: What does FileCacheService's log message (invalidating cache) mean?

2016-03-19 Thread Satoshi Hikida
Sorry, there is a mistake in my previous post. I would like to correct it.

In Q3, I mentioned there are a lot of invalidating messages in the
debug.log. That is true, but the Cassandra configuration I listed was wrong. In
that case, the cassandra.yaml settings were as follows:

- cassandra.yaml
- compaction_throughput_mb_per_sec: 0 (not 8 or default)
- concurrent_compactors: 1
- sstable_preemptive_open_interval_in_mb: 0  (not 8 or default)
- memtable_flush_writers: 1

More precisely, in that case, Cassandra kept outputting invalidating
messages for a while (a few hours). However, CPU usage was almost 0.0% in the
top command, like below.

$ top -bu cassandra -n 1
...
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2631 cassand+  20   0  0.250t 1.969g 703916 S   0.0 57.0   8459:35 java

I want to know what was actually happening at that time.


Regards,
Satoshi


On Thu, Mar 17, 2016 at 3:56 PM, Satoshi Hikida  wrote:

> Thank you for your very useful advice!
>
>
> Definitely, I'm using Cassandra v2.2.5, not 3.x. And basically I've
> understood what these logs mean. But I have a few more questions, so I
> would very much appreciate some explanations about them.
>
> * Q1.
> In my understanding, when an SSTable is opened, a number of RandomAccessReaders (RARs)
> are created, and the number of RARs is equal to the number of segments of the SSTable.
> Is the number of segments (= RARs) equal to the following?
>
> number of segments = size of SSTable / segment size
>
> * Q2.
> What happens if Cassandra opens an SSTable file which is bigger than the JVM
> heap (or memory)?
>
> * Q3.
> In my case, there are a lot of invalidating messages for the same SSTable
> file (e.g. at least 11 records for tmplink-la-8348-big-Data.db in my
> previous post). In some cases, there are more than 600 invalidating
> messages for the same file, and these messages are logged over a few hours.
> Would closing a big SSTable be the cause?
>
> * Q4.
> I saw "tmplink-xxx" or "tmp-xxx" files in the logs and also in the data
> directories. Are these temporary files from the compaction process?
>
>
> Here is my experimental configurations.
>
> - Cassandra node: An aws EC2 instance(t2.medium. 4GBRAM, 2vCPU)
> - Cassandra version: 2.2.5
> - inserted data size: about 100GB
> - cassandra-env.sh: default
> - cassandra.yaml
> - compaction_throughput_mb_per_sec: 8 (or default)
> - concurrent_compactors: 1
> - sstable_preemptive_open_interval_in_mb: 25 (or default)
> - memtable_flush_writers: 1
>
>
> Regards,
> Satoshi
>
>
> On Wed, Mar 16, 2016 at 5:47 PM, Stefania Alborghetti <
> stefania.alborghe...@datastax.com> wrote:
>
>> Each sstable has one or more random access readers (one per segment for
>> example) and FileCacheService is a cache for such readers. When an sstable
>> is closed, the cache is invalidated. If no single reader of an sstable is
>> used for at least 512 milliseconds, all readers are evicted. If the sstable
>> is opened again, new reader(s) will be created and added to the cache again.
>>
>> FileCacheService was removed in cassandra 3.0 in favour of a pool of
>> page-aligned buffers, and sharing the NIO file channels amongst the readers
>> of an sstable, refer to CASSANDRA-8897
>>  and CASSANDRA-8893
>>  for more details.
>>
>> On Wed, Mar 16, 2016 at 3:30 PM, satoshi hikida 
>> wrote:
>>
>>> Hi,
>>>
>>> I have been working on some experiments for Cassandra and found some log
>>> messages as follows in debug.log.
>>> I am not sure what it exactly is, so I would appreciate if someone gives
>>> me some explanations about it.
>>>
>>> In my verification, a Cassandra node runs as a stand-alone server on an
>>> Amazon EC2 instance (t2.medium). I insert 1 billion records (about 100GB
>>> of data) into a table from a client application (which runs on another
>>> instance, separate from the Cassandra node). After insertion, Cassandra
>>> continues its I/O activity for (probably) compaction and keeps logging
>>> messages as follows:
>>>
>>> ---
>>> ...
>>> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:25,170
>>> FileCacheService.java:102 - Evicting cold readers for
>>> /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/la-6-big-Data.db
>>> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:31,780
>>> FileCacheService.java:177 - Invalidating cache for
>>> /var/lib/cassandra/data/test/user-3d988520e9e011e59d830f00df8833fa/tmplink-la-8348-big-Data.db
>>> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:36,899
>>> FileCacheService.java:177 - Invalidating cache for
>>> /var/lib/cassandra/data/test/user-3d988520e9e011e59d830f00df8833fa/tmplink-la-8348-big-Data.db
>>> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:42,187
>>> FileCacheService.java:177 - Invalidating cache for
>>> /var/lib/cassandra/data/test/user-3d988520e9e011e59d830f00df8833fa/tmplink-la-8348-big-Data.db
>>> DEBUG 

Re: DTCS bucketing Question

2016-03-19 Thread Jeff Jirsa
> I am trying to concretely understand how DTCS makes buckets and I am looking 
> at the DateTieredCompactionStrategyTest.testGetBuckets method and played with 
> some of the parameters to GetBuckets method call (Cassandra 2.1.12). I don’t 
> think I fully understand something there.



Don’t feel bad, you’re not alone. 



> In this case, the buckets should look like [0-4000] [4000-]. Is this correct 
> ? The buckets that I get back are different (“a” lives in its bucket and 
> everyone else in another). What I am missing here ?


The latest/newest window never gets combined, it’s ALWAYS the base size. Only 
subsequent windows get merged. First window will always be 0-1000. 
https://spotifylabscom.files.wordpress.com/2014/12/dtcs3.png

> Note, that if I keep the base to original (100L) or increase it and play with 
> min_threshold the results are exactly what I would expect.

Because the original base is lower than the lowest timestamp, which means 
you’re never looking in the first window (0-base).

> I am afraid that the math in Target class is somewhat hard to follow so I am 
> thinking about it this way.

The Target class is too clever for its own good. I couldn’t follow it. You’re 
having trouble following it.  Other smart people I’ve talked to couldn’t follow 
it. Last June I proposed an alternative (CASSANDRA-9666 / 
https://github.com/jeffjirsa/twcs ). It was never taken upstream, but it does 
get a fair bit of use by people with large time series clusters (we use it on 
one of our petabyte-scale clusters here). Significantly easier to reason about. 
Jeff
From:  Anubhav Kale
Reply-To:  "user@cassandra.apache.org"
Date:  Thursday, March 17, 2016 at 10:24 AM
To:  "user@cassandra.apache.org"
Subject:  DTCS bucketing Question



 

Hello,

 

I am trying to concretely understand how DTCS makes buckets and I am looking at 
the DateTieredCompactionStrategyTest.testGetBuckets method and played with some 
of the parameters to GetBuckets method call (Cassandra 2.1.12). 

 

I don’t think I fully understand something there. Let me try to explain.

 

Consider the second test there. I changed the pairs a bit for easier 
explanation and changed base (initial window size)=1000L and Min_Threshold=2

 

pairs = Lists.newArrayList(
Pair.create("a", 200L),
Pair.create("b", 2000L),
Pair.create("c", 3600L),
Pair.create("d", 3899L),
Pair.create("e", 3900L),
Pair.create("f", 3950L),
Pair.create("too new", 4125L)
);
buckets = getBuckets(pairs, 1000L, 2, 4050L, Long.MAX_VALUE);

 

In this case, the buckets should look like [0-4000] [4000-]. Is this correct ? 
The buckets that I get back are different (“a” lives in its bucket and everyone 
else in another). What I am missing here ?

 

Another case, 

 

pairs = Lists.newArrayList(
Pair.create("a", 200L),
Pair.create("b", 2000L),
Pair.create("c", 3600L),
Pair.create("d", 3899L),
Pair.create("e", 3900L),
Pair.create("f", 3950L),
Pair.create("too new", 4125L)
);
buckets = getBuckets(pairs, 50L, 4, 4050L, Long.MAX_VALUE);

 

Here, the buckets should be [0-3200] [3200-4000] [4000-4050] [4050-]. Is this 
correct ? Again, the buckets that come back are quite different. 

 

Note, that if I keep the base to original (100L) or increase it and play with 
min_threshold the results are exactly what I would expect. 

 

The way I think about DTCS is: try to make buckets of the maximum possible sizes 
starting from 0, and once you can’t do that, make smaller buckets (similar to what 
the comment suggests). Is this mental model wrong? I am afraid the math 
in the Target class is somewhat hard to follow, so I am thinking about it this way.

 

Thanks a lot in advance.

 

-Anubhav





Re: cqlsh problem

2016-03-19 Thread Alain RODRIGUEZ
Hi, did you try with the address of the node rather than 127.0.0.1

Is the transport protocol used by cqlsh (not sure if it is thrift or binary
- native in 2.1)  active ? What is the "nodetool info" output ?

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-17 14:26 GMT+01:00 joseph gao :

> hi, all
> cassandra version 2.1.7
> When I use cqlsh to connect cassandra, something is wrong
>
> Connection error: ( Unable to connect to any servers', {'127.0.0.1':
> OperationTimedOut('errors=None, last_host=None,)})
>
> This happens lots of times, but sometime it works just fine. Anybody knows
> why?
>
> --
> --
> Joseph Gao
> PhoneNum:15210513582
> QQ: 409343351
>


Re: cqlsh problem

2016-03-19 Thread Alain RODRIGUEZ
Is the node fully healthy or rejecting some requests ?

What are the outputs for "grep -i "ERROR" /var/log/cassandra/system.log"
and "nodetool tpstats"?

Any error? Any pending / blocked or dropped messages?

Also did you try using distinct ports (9160 for thrift, 9042 for native) -
out of curiosity, not sure this will help.

What is your version of cqlsh "cqlsh --version" ?

doesn't work most times. But some time it just work fine
>

Do you feel like this is due to a timeout (query being too big, cluster
being too busy)? Try setting these higher:

--connect-timeout=CONNECT_TIMEOUT
        Specify the connection timeout in seconds (default: 5 seconds).

--request-timeout=REQUEST_TIMEOUT
        Specify the default request timeout in seconds (default: 10 seconds).

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-18 4:49 GMT+01:00 joseph gao :

> Of course yes.
>
> 2016-03-17 22:35 GMT+08:00 Vishwas Gupta :
>
>> Have you started the Cassandra service?
>>
>> sh cassandra
>> On 17-Mar-2016 7:59 pm, "Alain RODRIGUEZ"  wrote:
>>
>>> Hi, did you try with the address of the node rather than 127.0.0.1
>>>
>>> Is the transport protocol used by cqlsh (not sure if it is thrift or
>>> binary - native in 2.1)  active ? What is the "nodetool info" output ?
>>>
>>> C*heers,
>>> ---
>>> Alain Rodriguez - al...@thelastpickle.com
>>> France
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> 2016-03-17 14:26 GMT+01:00 joseph gao :
>>>
 hi, all
 cassandra version 2.1.7
 When I use cqlsh to connect cassandra, something is wrong

 Connection error: ( Unable to connect to any servers', {'127.0.0.1':
 OperationTimedOut('errors=None, last_host=None,)})

 This happens lots of times, but sometime it works just fine. Anybody
 knows why?

 --
 --
 Joseph Gao
 PhoneNum:15210513582
 QQ: 409343351

>>>
>>>
>
>
> --
> --
> Joseph Gao
> PhoneNum:15210513582
> QQ: 409343351
>


Re: Experiencing strange disconnect issue

2016-03-19 Thread Bo Finnerup Madsen
Hi Sean,

Thank you for taking the time to answer :)
We are using a very vanilla connection, without any sorts of tuning
policies. The cluster/session is constructed as follows:
final Cluster cluster = Cluster.builder()

.addContactPoints(key.getContactPoints())
.build();
final Session session =
cluster.connect(key.getKeyspace());
Perhaps it is too vanilla and we are missing something?

Since I posted the question, I have tried downgrading to cassandra v2.1.13
and java driver 2.1.9. But got the same error. So I suspect it is something
we are doing wrong.



On Wed, Mar 16, 2016 at 18:59,  wrote:

> Are you using any of the Tuning Policies (
> https://docs.datastax.com/en/developer/java-driver/2.0/common/drivers/reference/tuningPolicies_c.html)?
> It could be that you are hitting some peak load and the driver is not
> retrying hosts once they are marked “down.”
>
>
>
>
>
> Sean Durity – Lead Cassandra Admin
>
> Big DATA Team
>
> For support, create a JIRA
> 
>
>
>
> *From:* Bo Finnerup Madsen [mailto:bo.gunder...@gmail.com]
> *Sent:* Tuesday, March 15, 2016 5:24 AM
> *To:* user@cassandra.apache.org
> *Subject:* Experiencing strange disconnect issue
>
>
>
> Hi,
>
>
>
> We are currently trying to convert an existing java web application to use
> cassandra, and while most of it works great :) we have a "small" issue.
>
>
>
> After some time, we all connectivity seems to be lost and we get the
> following errors:
>
> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
> tried for query failed (tried: /10.61.70.107:9042
> (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.107]
> Connection has been closed), /10.61.70.108:9042
> (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.108]
> Connection has been closed))
>
>
>
> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
> tried for query failed (tried: /10.61.70.107:9042
> (com.datastax.driver.core.exceptions.DriverException: Timeout while trying
> to acquire available connection (you may want to increase the driver number
> of per-host connections)), /10.61.70.108:9042
> (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.108]
> Connection has been closed), /10.61.70.110:9042
> (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.110]
> Connection has been closed))
>
>
>
> The errors persists, and the application needs to be restarted to recover.
>
>
>
> At application startup we create a cluster and a session which we reuse
> through out the application as pr. the documentation. We don't specify any
> other options when connecting than the IP's of the three servers. We are
> running cassandra 3.0.3 tar ball in EC2 in a cluster of three machines. The
> connections are made using v3.0.0 java driver.
>
>
>
> I have uploaded the configuration and logs from our cassandra cluster
> here: https://gist.github.com/anonymous/452e736b401317b5b38d
>
> The issue happend at 00:44:46.
>
>
>
> I would greatly appreciate any ideas as to what we are doing wrong to
> experience this? :)
>
>
>
> Thank you in advance!
>
>
>
> Yours sincerely,
>
>   Bo Madsen
>
> --
>
>


Re: How can I make Cassandra stable in a 2GB RAM node environment ?

2016-03-19 Thread Alain RODRIGUEZ
Hi, I am not sure I understood your message correctly but I will try to
answer it.

but, I think, in Cassandra case, it seems a matter of how much data we use
> with how much memory we have.


If you are saying you can use poor commodity servers (scaling vertically
poorly) and just add nodes (horizontal scaling) when the cluster is not
powerful enough, you need to know that a minimum of vertical scaling is
needed for good performance and stability. Still, with tuning,
you can probably reach a stable state with t2.mediums if there are enough of
them to handle the load.

with default configuration except for leveledCompactionStrategy


LeveledCompactionStrategy is heavier to maintain than STCS. In such an
environment, read latency is probably not your main concern, and using
STCS could give better results as it is much lighter in terms of compaction
(depends on your use case though).

I also used a 4GB RAM machine (t2.medium)
>

With 4GB of RAM you probably want to use 1 GB of heap. What version of
cassandra are you using ?
You might also need to tune bloomfilters, index_interval, memtables size
and type, and a few other things to reduce the memory footprint.

About compaction, use only half of the cores as concurrent compactors (one
core) and see if this improves stability while compaction can still keep up.
Or keep 2 and reduce its speed by lowering the compaction throughput.

Use nodetool {tpstats, compactionstats, cfstats, cfhistograms} to monitor
things and see what to tune.

As told earlier, using this low spec machines is fine if you know how to
tune Cassandra and can afford some research / tuning time...

Alain
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-12 6:58 GMT+01:00 Hiroyuki Yamada :

> Thank you all to respond and discuss my question.
>
> I agree with you all basically,
> but, I think, in Cassandra case, it seems a matter of how much data we use
> with how much memory we have.
>
> As per Jack's (and DataStax's) suggestion,
> I also used a 4GB RAM machine (t2.medium) with 1 billion records (about
> 100GB in size) with the default configuration except for
> LeveledCompactionStrategy,
> but after the insertion from the application program completed, compaction
> probably kept working,
> and again, Cassandra was later killed by the OOM killer.
>
> Insertion from the application side is finished, so the issue is maybe from
> compaction happening in the background.
> Is there any recommended compaction configuration to make Cassandra
> stable with a large dataset (more than 100GB) in a fairly low-memory (4GB)
> environment?
>
> I think it would be the same thing if I try the experiment with 8GB memory
> and larger data set (maybe more than 2 billion records).
> (If it is not correct, please explain why.)
>
>
> Best regards,
> Hiro
>
> On Fri, Mar 11, 2016 at 4:19 AM, Robert Coli  wrote:
>
>> On Thu, Mar 10, 2016 at 3:27 AM, Alain RODRIGUEZ 
>> wrote:
>>
>>> So, like Jack, I globally really not recommend it unless you know what
>>> you are doing and don't care about facing those issues.
>>>
>>
>> Certainly a spectrum of views here, but everyone (including OP) seems to
>> agree with the above. :D
>>
>> =Rob
>>
>>
>
>


Parallel bootstraps in two DCs with NetworkTopologyStrategy

2016-03-19 Thread Gabriel Wicke
Hi,

we are in the process of expanding a multi-DC cluster, and are
wondering if it is safe to bootstrap one node per DC in parallel. My
intuition would be that this should not lead to any token range
overlaps (similar to bootstrapping multiple nodes in a rack), so
*should* be safe. We are using vnodes.

Could anybody shed some light on this?

Thanks,

Gabriel


Re: Data modelling, including cleanup

2016-03-19 Thread Hannu Kröger
Hi,

That's how I have done it on many occasions. Nowadays there is the possibility to 
use Cassandra 3.0 and materialised views so that you don't need to keep two 
tables up to date manually:
http://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views
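
A sketch of that approach against the example tables from the question below
(Cassandra 3.0 syntax; the keyspace name is an assumption):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('ks')  # assumed keyspace

# Only the base table is written to; Cassandra maintains the by-email view, so
# deleting a user is a single DELETE against users_by_username.
session.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS users_by_email AS
        SELECT email, username, age
        FROM users_by_username
        WHERE email IS NOT NULL AND username IS NOT NULL
        PRIMARY KEY (email, username)
""")

session.execute(
    "DELETE FROM users_by_username WHERE username = %s", ('jdoe',))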


Hannu

> On 17 Mar 2016, at 12:05, Bo Finnerup Madsen  wrote:
> 
> Hi,
> 
> We are pretty new to data modelling in cassandra, and are having a bit of a 
> challenge creating a model that caters both for queries and updates.
> 
> Let me try to explain it using the users example from 
> http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling 
> 
> 
> They define two tables used for reading users, one by username and one by 
> email.
> -
> CREATE TABLE users_by_username (
> username text PRIMARY KEY,
> email text,
> age int
> )
>  
> CREATE TABLE users_by_email (
> email text PRIMARY KEY,
> username text,
> age int
> )
> -
> 
> Now lets pretend that we need to delete a user, and we are given a username 
> as a key. Would the correct procedure be:
> 1) Read the email from users_by_username using the username as a key
> 2) Delete from users_by_username using the username as a key
> 3) Delete from users_by_email using the email as a key
> 
> Or is there a smarter way of doing this?
> 
> Yours sincerely,
>   Bo Madsen
> 



Re: Experiencing strange disconnect issue

2016-03-19 Thread Bo Finnerup Madsen
Hi,

I ran another test with the following client setup:
final Cluster cluster = Cluster.builder()
  .addContactPoints(key.getContactPoints())
  .withSocketOptions(new SocketOptions().setKeepAlive(true))
  .withReconnectionPolicy(Policies.defaultReconnectionPolicy())
  .withRetryPolicy(Policies.defaultRetryPolicy())
  .build();

But unfortunately the same issue happened.
I have uploaded the cassandra driver debug log here:
https://gist.github.com/anonymous/db6fa061298018c46954

It all goes south at about line 433 (time 19:46:43) where the client throws
a:
io.netty.handler.codec.DecoderException:
com.datastax.driver.core.exceptions.DriverInternalError: Adjusted frame
length exceeds 268435456: 462591744 - discarded

Any ideas?

On Wed, Mar 16, 2016 at 19:23, Steve Robenalt  wrote:

> Hi Bo,
>
> I would suggest adding:
>
> .withReconnectionPolicy(new ExponentialReconnectionPolicy(1000,3))
>
> or something similar to your cluster builder.
>
> Steve
>
>
> On Wed, Mar 16, 2016 at 11:18 AM, Bo Finnerup Madsen <
> bo.gunder...@gmail.com> wrote:
>
>> Hi Sean,
>>
>> Thank you for taking the time to answer :)
>> We are using a very vanilla connection, without any sorts of tuning
>> policies. The cluster/session is constructed as follows:
>> final Cluster cluster = Cluster.builder()
>>
>> .addContactPoints(key.getContactPoints())
>> .build();
>> final Session session =
>> cluster.connect(key.getKeyspace());
>> Perhaps it is too vanilla and we are missing something?
>>
>> Since I posted the question, I have tried downgrading to cassandra
>> v2.1.13 and java driver 2.1.9. But got the same error. So I suspect it is
>> something we are doing wrong.
>>
>>
>>
>>> On Wed, Mar 16, 2016 at 18:59,  wrote:
>>
>>> Are you using any of the Tuning Policies (
>>> https://docs.datastax.com/en/developer/java-driver/2.0/common/drivers/reference/tuningPolicies_c.html)?
>>> It could be that you are hitting some peak load and the driver is not
>>> retrying hosts once they are marked “down.”
>>>
>>>
>>>
>>>
>>>
>>> Sean Durity – Lead Cassandra Admin
>>>
>>> Big DATA Team
>>>
>>> For support, create a JIRA
>>> 
>>>
>>>
>>>
>>> *From:* Bo Finnerup Madsen [mailto:bo.gunder...@gmail.com]
>>> *Sent:* Tuesday, March 15, 2016 5:24 AM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Experiencing strange disconnect issue
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> We are currently trying to convert an existing java web application to
>>> use cassandra, and while most of it works great :) we have a "small" issue.
>>>
>>>
>>>
>>> After some time, we all connectivity seems to be lost and we get the
>>> following errors:
>>>
>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All
>>> host(s) tried for query failed (tried: /10.61.70.107:9042
>>> (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.107]
>>> Connection has been closed), /10.61.70.108:9042
>>> (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.108]
>>> Connection has been closed))
>>>
>>>
>>>
>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All
>>> host(s) tried for query failed (tried: /10.61.70.107:9042
>>> (com.datastax.driver.core.exceptions.DriverException: Timeout while trying
>>> to acquire available connection (you may want to increase the driver number
>>> of per-host connections)), /10.61.70.108:9042
>>> (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.108]
>>> Connection has been closed), /10.61.70.110:9042
>>> (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.110]
>>> Connection has been closed))
>>>
>>>
>>>
>>> The errors persists, and the application needs to be restarted to
>>> recover.
>>>
>>>
>>>
>>> At application startup we create a cluster and a session which we reuse
>>> through out the application as pr. the documentation. We don't specify any
>>> other options when connecting than the IP's of the three servers. We are
>>> running cassandra 3.0.3 tar ball in EC2 in a cluster of three machines. The
>>> connections are made using v3.0.0 java driver.
>>>
>>>
>>>
>>> I have uploaded the configuration and logs from our cassandra cluster
>>> here: https://gist.github.com/anonymous/452e736b401317b5b38d
>>>
>>> The issue happend at 00:44:46.
>>>
>>>
>>>
>>> I would greatly appreciate any ideas as to what we are doing wrong to
>>> experience this? :)
>>>
>>>
>>>
>>> Thank you in advance!
>>>
>>>
>>>
>>> Yours sincerely,
>>>
>>>   Bo Madsen
>>>
>>> --
>>>

Re: Modeling Audit Trail on Cassandra

2016-03-19 Thread Jack Krupansky
Stratio (or DSE Search) should be good for ad hoc or complex queries, but
if there are some fixed/common query patterns you might be better off
implementing query tables or using materialized views. The latter allows
you to include a non-PK data column in the PK of the MV so that you can
directly access the indexed row without the complexity of Lucene/DSE. This
also lets you effectively cluster data that will be commonly accessed
together on a single node/partition, and to do it automatically without any
application logic to manually duplicate/update data.

(3.x still has the restriction that an MV PK can only include one non-PK
data column - CASSANDRA-9928
.)

-- Jack Krupansky

On Wed, Mar 16, 2016 at 4:40 PM, I PVP  wrote:

> Jack/Tom
> Thanks for answering.
>
> Here is the table definition so far:
>
> CREATE TABLE audit_trail (
> auditid timeuuid,
> actiontype text,
> objecttype text,
> executedby uuid ( or timeuuid?),
> executedat timestamp,
> objectbefore text,
> objectafter text,
> clientipaddr text,
> serveripaddr text,
> servername text,
> channel text,
> PRIMARY KEY (auditid)
> );
>
> objectbefore/after are the only ones that will have JSON content. Querying
> based on the contents of these two columns is not a requirement.
>
> At this moment the queries are going to be mainly on executedby ( the
> employee id).
> Stratio’s Cassandra Lucene Index will be used to allow querying/filtering
> on executedat (timestamp) ,objecttype(order, customer, ticket,
> message,account, paymenttransaction,refund etc.)  and actiontype(create,
> retrieve, update, delete, approve, activate, unlock, lock etc.) .
>
> I am considering relying exclusively on Stratio’s Cassandra Lucene
> filtering and avoiding adding “period” columns like month(int), year(int),
> day (int).
>
> Thanks
>
> --
> IPVP
>
>
> From: Jack Krupansky  
> Reply: user@cassandra.apache.org >
> 
> Date: March 16, 2016 at 5:22:36 PM
> To: user@cassandra.apache.org >
> 
> Subject:  Re: Modeling Audit Trail on Cassandra
>
> executedby is the ID assigned to an employee.
>
> I'm presuming that JSON is to be used for objectbefore/after. This
> suggests no ability to query by individual object fields. I didn't sense
> any other columns that would be JSON.
>
>
>
> -- Jack Krupansky
>
> On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge 
> wrote:
>
>> Is text the most appropriate data type to store JSON that contain couple
>>> of dozen lines ?
>>>
>>
>> It sure is the simplest way to store JSON.
>>
>> The query requirement  is  "where executedby = ?”.
>>>
>>
>> Since executedby is a timeuuid, I guess you don't want to query a single
>> record, since that would require you to know the exact timeuuid. Do you
>> mean that you would like to query all changes in a certain time frame, e.g.
>> today? In that case, you would have to group your rows in time buckets,
>> e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any
>> other period that suits your situation. Retrieving all changes in a
>> specific time frame is done by retrieving all relevant periods.
>>
>> Tom
>>
>
>


RE: Experiencing strange disconnect issue

2016-03-19 Thread SEAN_R_DURITY
Are you using any of the Tuning Policies 
(https://docs.datastax.com/en/developer/java-driver/2.0/common/drivers/reference/tuningPolicies_c.html)?
 It could be that you are hitting some peak load and the driver is not retrying 
hosts once they are marked “down.”


Sean Durity – Lead Cassandra Admin
Big DATA Team
For support, create a 
JIRA

From: Bo Finnerup Madsen [mailto:bo.gunder...@gmail.com]
Sent: Tuesday, March 15, 2016 5:24 AM
To: user@cassandra.apache.org
Subject: Experiencing strange disconnect issue

Hi,

We are currently trying to convert an existing java web application to use 
cassandra, and while most of it works great :) we have a "small" issue.

After some time, all connectivity seems to be lost and we get the following
errors:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried 
for query failed (tried: /10.61.70.107:9042 
(com.datastax.driver.core.exceptions.TransportException: 
[/10.61.70.107] Connection has been closed), 
/10.61.70.108:9042 
(com.datastax.driver.core.exceptions.TransportException: 
[/10.61.70.108] Connection has been closed))

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried 
for query failed (tried: /10.61.70.107:9042 
(com.datastax.driver.core.exceptions.DriverException: Timeout while trying to 
acquire available connection (you may want to increase the driver number of 
per-host connections)), /10.61.70.108:9042 
(com.datastax.driver.core.exceptions.TransportException: 
[/10.61.70.108] Connection has been closed), 
/10.61.70.110:9042 
(com.datastax.driver.core.exceptions.TransportException: 
[/10.61.70.110] Connection has been closed))

The errors persist, and the application needs to be restarted to recover.

At application startup we create a cluster and a session which we reuse
throughout the application, as per the documentation. We don't specify any other
options when connecting than the IPs of the three servers. We are running
cassandra 3.0.3 tar ball in EC2 in a cluster of three machines. The connections 
are made using v3.0.0 java driver.

I have uploaded the configuration and logs from our cassandra cluster here: 
https://gist.github.com/anonymous/452e736b401317b5b38d
The issue happened at 00:44:46.

I would greatly appreciate any ideas as to what we are doing wrong to 
experience this? :)

Thank you in advance!

Yours sincerely,
  Bo Madsen





Re: Getting Issue while setting up Cassandra in Windows 8.1

2016-03-19 Thread Paulo Motta
I don't see any apparent problem there, your cassandra process is running
and ready to receive requests, but it's apparently running in the
foreground.

If you want to run C* as a service, you might be better off installing the
DataStax Distribution of Apache Cassandra for Windows, which installs
cassandra as a background windows service. You can find more info here:
http://docs.datastax.com/en/cassandra_win/3.x/cassandra/install/installWin.html

2016-03-18 11:41 GMT-03:00 Bhupendra Baraiya <
bhupendra.bara...@continuum.net>:

> Hi ,
>
>
>
>   I tried to install Cassandra using the attached steps
>
> But then I got the error below
>
>
>
>
>
>
>
>
>
> When I sent this to 'd...@cassandra.apache.org' I got the reply below
>
>
>
> You should use the user@cassandra.apache.org list for cassandra-related
> questions, not this (d...@cassandra.apache.org) which is exclusive for
> internal Cassandra development. You can register to the user list by
> sending an e-mail to: user-subscr...@cassandra.apache.org
>
>
>
> Answering your question, it seems your %PATH% variable is broken, for some
> reason you removed the default
> "C:\Windows\System32\WindowsPowerShell\v1.0\" entry, so powershell is not
> found to run properly configured version of Cassandra, so you must fix
> that. If you still want to run with legacy settings (not recommended), you
> must adjust your heap settings (-Xms, -Xmx) on the cassandra.bat script to
> fit your system's memory.
>
>
>
> So I went to cassandra.bat and changed Xms and Xmn to 1G and it worked, but
> for the last 30 minutes it has been showing the progress below. Is that
> normal, or has something gone wrong?
>
>
>
>
>
>
>
>
>
> Thanks and regards,
>
>
>
> *Bhupendra Baraiya*
>
> Continuum Managed Services, LLC.
>
> p: 902-933-0019
>
> e: bhupendra.bara...@continuum.net
>
> w: continuum.net
>
>
>
>


Re: Questions about Datastax support

2016-03-19 Thread Jack Krupansky
1. They have a published support policy:
http://www.datastax.com/support-policy/supported-software


-- Jack Krupansky

On Thu, Mar 17, 2016 at 10:09 AM, Rakesh Kumar 
wrote:

> Few questions:
>
> 1 - Has there been an announcement as to when Datastax will stop
>   supporting 2.x version. I am aware that the community will stop
>   supporting 2.x in Nov 2016. What about support to
>   paid customers of Datastax. Will it go beyond Nov.
> 2 -  Are there any plans by Datastax to start supporting 3.x.
> 3 -  Is version 3.x recommended for production use.
> 4 -  What about compatibility of 3.x with the Datastax Development and
>monitoring tools. Like currently opcenter does not work with 3.x.
>
> thanks
>


Questions about Datastax support

2016-03-19 Thread Rakesh Kumar
Few questions:

1 - Has there been an announcement as to when Datastax will stop
  supporting 2.x version. I am aware that the community will stop
  supporting 2.x in Nov 2016. What about support to
  paid customers of Datastax. Will it go beyond Nov.
2 -  Are there any plans by Datastax to start supporting 3.x.
3 -  Is version 3.x recommended for production use.
4 -  What about compatibility of 3.x with the DataStax development and
   monitoring tools? For example, OpsCenter currently does not work with 3.x.

thanks


DTCS bucketing Question

2016-03-19 Thread Anubhav Kale


Hello,

I am trying to concretely understand how DTCS makes buckets. I am looking at 
the DateTieredCompactionStrategyTest.testGetBuckets method and have played with 
some of the parameters to the getBuckets method call (Cassandra 2.1.12).

I don't think I fully understand something there. Let me try to explain.

Consider the second test there. I changed the pairs a bit for easier 
explanation and changed base (initial window size)=1000L and Min_Threshold=2

pairs = Lists.newArrayList(
Pair.create("a", 200L),
Pair.create("b", 2000L),
Pair.create("c", 3600L),
Pair.create("d", 3899L),
Pair.create("e", 3900L),
Pair.create("f", 3950L),
Pair.create("too new", 4125L)
);
buckets = getBuckets(pairs, 1000L, 2, 4050L, Long.MAX_VALUE);

In this case, the buckets should look like [0-4000] [4000-]. Is this correct? 
The buckets that I get back are different ("a" lives in its own bucket and 
everyone else in another). What am I missing here?

Another case,

pairs = Lists.newArrayList(
Pair.create("a", 200L),
Pair.create("b", 2000L),
Pair.create("c", 3600L),
Pair.create("d", 3899L),
Pair.create("e", 3900L),
Pair.create("f", 3950L),
Pair.create("too new", 4125L)
);
buckets = getBuckets(pairs, 50L, 4, 4050L, Long.MAX_VALUE);

Here, the buckets should be [0-3200] [3200-4000] [4000-4050] [4050-]. Is this 
correct ? Again, the buckets that come back are quite different.

Note that if I keep the base at the original (100L), or increase it, and play 
with min_threshold, the results are exactly what I would expect.

The way I think about DTCS is: try to make buckets of the maximum possible 
sizes from 0, and once you can't do that, make smaller buckets (similar to what 
the comment suggests). Is this mental model wrong? I am afraid the math in the 
Target class is somewhat hard to follow, so I am thinking about it this way.

Thanks a lot in advance.

-Anubhav


Re: Experiencing strange disconnect issue

2016-03-19 Thread Steve Robenalt
Hi Bo,

You might try sending the same question to the java driver mailing list. I
haven't seen your particular error in several years of running Cassandra on
AWS. The closest I saw in the past was due to a protocol error in the
driver during the 2.0 beta timeframe.

Steve

On Wed, Mar 16, 2016 at 1:30 PM, Bo Finnerup Madsen 
wrote:

> Hi,
>
> I ran another test with the following client setup:
> final Cluster cluster = Cluster.builder()
>   .addContactPoints(key.getContactPoints())
>   .withSocketOptions(new SocketOptions().setKeepAlive(true))
>   .withReconnectionPolicy(Policies.defaultReconnectionPolicy())
>   .withRetryPolicy(Policies.defaultRetryPolicy())
>   .build();
>
> But unfortunately the same issue happened.
> I have uploaded the cassandra driver debug log here:
> https://gist.github.com/anonymous/db6fa061298018c46954
>
> It all goes south at about line 433 (time 19:46:43) where the client
> throws a:
> io.netty.handler.codec.DecoderException:
> com.datastax.driver.core.exceptions.DriverInternalError: Adjusted frame
> length exceeds 268435456: 462591744 - discarded
>
> Any ideas?
>
> ons. 16. mar. 2016 kl. 19.23 skrev Steve Robenalt  >:
>
>> Hi Bo,
>>
>> I would suggest adding:
>>
>> .withReconnectionPolicy(new ExponentialReconnectionPolicy(1000,3))
>>
>> or something similar to your cluster builder.
>>
>> Steve
>>
>>
>> On Wed, Mar 16, 2016 at 11:18 AM, Bo Finnerup Madsen <
>> bo.gunder...@gmail.com> wrote:
>>
>>> Hi Sean,
>>>
>>> Thank you for taking the time to answer :)
>>> We are using a very vanilla connection, without any sorts of tuning
>>> policies. The cluster/session is constructed as follows:
>>> final Cluster cluster = Cluster.builder()
>>>
>>> .addContactPoints(key.getContactPoints())
>>> .build();
>>> final Session session =
>>> cluster.connect(key.getKeyspace());
>>> Perhaps it is too vanilla and we are missing something?
>>>
>>> Since I posted the question, I have tried downgrading to cassandra
>>> v2.1.13 and java driver 2.1.9. But got the same error. So I suspect it is
>>> something we are doing wrong.
>>>
>>>
>>>
>>> ons. 16. mar. 2016 kl. 18.59 skrev :
>>>
 Are you using any of the Tuning Policies (
 https://docs.datastax.com/en/developer/java-driver/2.0/common/drivers/reference/tuningPolicies_c.html)?
 It could be that you are hitting some peak load and the driver is not
 retrying hosts once they are marked “down.”





 Sean Durity – Lead Cassandra Admin

 Big DATA Team

 For support, create a JIRA
 



 *From:* Bo Finnerup Madsen [mailto:bo.gunder...@gmail.com]
 *Sent:* Tuesday, March 15, 2016 5:24 AM
 *To:* user@cassandra.apache.org
 *Subject:* Experiencing strange disconnect issue



 Hi,



 We are currently trying to convert an existing java web application to
 use cassandra, and while most of it works great :) we have a "small" issue.



 After some time, all connectivity seems to be lost and we get the
 following errors:

 com.datastax.driver.core.exceptions.NoHostAvailableException: All
 host(s) tried for query failed (tried: /10.61.70.107:9042
 (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.107]
 Connection has been closed), /10.61.70.108:9042
 (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.108]
 Connection has been closed))



 com.datastax.driver.core.exceptions.NoHostAvailableException: All
 host(s) tried for query failed (tried: /10.61.70.107:9042
 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying
 to acquire available connection (you may want to increase the driver number
 of per-host connections)), /10.61.70.108:9042
 (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.108]
 Connection has been closed), /10.61.70.110:9042
 (com.datastax.driver.core.exceptions.TransportException: [/10.61.70.110]
 Connection has been closed))



 The errors persist, and the application needs to be restarted to
 recover.



 At application startup we create a cluster and a session which we reuse
 throughout the application, as per the documentation. We don't specify any
 other options when connecting than the IP's of the three servers. We are
 running cassandra 3.0.3 tar ball in EC2 in a cluster of three machines. The
 connections are made using v3.0.0 java driver.



 I have uploaded the configuration and logs from our cassandra cluster
 here: https://gist.github.com/anonymous/452e736b401317b5b38d

 The issue happened at 

Re: discrepancy in up nodes from different nodes

2016-03-19 Thread Alain RODRIGUEZ
Hi Surbhi.

No ideas come to mind directly... Could you provide the Cassandra
version, a view of your keyspaces' replication, and some example of your
cassandra-rackdc.properties? Also, we could use some "nodetool status ks"
outputs.

Did you make sure GPFS was configured exactly like the SimpleSnitch (1 DC /
1 rack), cluster-wide, before activating it? If not, you are
probably missing some data around.
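
For instance, the SimpleSnitch reports every node as "datacenter1" / "rack1",
so to keep replica placement identical each node's cassandra-rackdc.properties
would need to look like this before the switch (a sketch):

dc=datacenter1
rack=rack1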

Please provide more information, I will keep an eye and try to help you
with that.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-16 19:24 GMT+01:00 Surbhi Gupta :

> Hi,
>
> I have changed endpoint_snitch from Simple to GossipingPropertyFileSnitch.
> And changed the cassandra-rackdc.properties file to reflect the correct DC
> and RACK.
>
> However, when I did a rolling restart, one node is showing 15 nodes up,
> another node is showing 10 nodes up, etc.
>
> I have done the rolling restart multiple times; the numbers change, but they
> are not consistent.
>
> Did I miss anything?
>
> Thanks
> Surbhi
>


Re: What does FileCacheService's log message (invalidating cache) mean?

2016-03-19 Thread Satoshi Hikida
Thank you for your very useful advice!


Definitely, I'm using Cassandra v2.2.5, not 3.x. And basically I've
understood what these logs mean. But I have a few more questions, so I would
very much appreciate some explanations about them.

* Q1.
In my understanding, when an SSTable is opened, a number of
RandomAccessReaders (RARs) are created, one per segment of the SSTable. Is
the number of segments (= RARs) equal to the following?

number of segments = size of SSTable / size of a segment

* Q2.
What happens if Cassandra opens an SSTable file that is bigger than the JVM
heap (or memory)?

* Q3.
In my case, there are a lot of invalidating messages for the same SSTable
file (e.g. at least 11 records for tmplink-la-8348-big-Data.db in my
previous post). In some cases, there are more than 600 invalidating
messages for the same file, and these messages are logged over a few hours.
Would closing a big SSTable be the cause?

* Q4.
I saw "tmplink-xxx" or "tmp-xxx" files in the logs and also in the data
directories. Are these temporary files used in the compaction process?


Here is my experimental configurations.

- Cassandra node: An aws EC2 instance(t2.medium. 4GBRAM, 2vCPU)
- Cassandra version: 2.2.5
- inserted data size: about 100GB
- cassandra-env.sh: default
- cassandra.yaml
- compaction_throughput_mb_per_sec: 8 (or default)
- concurrent_compactors: 1
- sstable_preemptive_open_interval_in_mb: 25 (or default)
- memtable_flush_writers: 1


Regards,
Satoshi


On Wed, Mar 16, 2016 at 5:47 PM, Stefania Alborghetti <
stefania.alborghe...@datastax.com> wrote:

> Each sstable has one or more random access readers (one per segment for
> example) and FileCacheService is a cache for such readers. When an sstable
> is closed, the cache is invalidated. If no single reader of an sstable is
> used for at least 512 milliseconds, all readers are evicted. If the sstable
> is opened again, new reader(s) will be created and added to the cache again.
>
> FileCacheService was removed in cassandra 3.0 in favour of a pool of
> page-aligned buffers, and sharing the NIO file channels amongst the readers
> of an sstable, refer to CASSANDRA-8897
>  and CASSANDRA-8893
>  for more details.
>
> On Wed, Mar 16, 2016 at 3:30 PM, satoshi hikida 
> wrote:
>
>> Hi,
>>
>> I have been working on some experiments for Cassandra and found some log
>> messages as follows in debug.log.
>> I am not sure what it exactly is, so I would appreciate if someone gives
>> me some explanations about it.
>>
>> In my verification, a Cassandra node runs as a stand-alone server on
>> Amazon EC2 instance(t2.medium). And I insert 1 Billion records (about 100GB
>> data size) to a table from a client application (which runs on another
>> instance separated from Cassandra node). After insertion, Cassandra
>> continues its I/O activities for (probably) compaction and keeps logging
>> the messages as follows:
>>
>> ---
>> ...
>> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:25,170
>> FileCacheService.java:102 - Evicting cold readers for
>> /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/la-6-big-Data.db
>> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:31,780
>> FileCacheService.java:177 - Invalidating cache for
>> /var/lib/cassandra/data/test/user-3d988520e9e011e59d830f00df8833fa/tmplink-la-8348-big-Data.db
>> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:36,899
>> FileCacheService.java:177 - Invalidating cache for
>> /var/lib/cassandra/data/test/user-3d988520e9e011e59d830f00df8833fa/tmplink-la-8348-big-Data.db
>> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:42,187
>> FileCacheService.java:177 - Invalidating cache for
>> /var/lib/cassandra/data/test/user-3d988520e9e011e59d830f00df8833fa/tmplink-la-8348-big-Data.db
>> DEBUG [NonPeriodicTasks:1] 2016-03-16 09:59:47,308
>> FileCacheService.java:177 - Invalidating cache for
>> /var/lib/cassandra/data/test/user-3d988520e9e011e59d830f00df8833fa/tmplink-la-8348-big-Data.db
>> ...
>> ---
>>
>> I guess these messages are related to the compaction process and
>> FileCacheService was invalidating cache which associated with a SSTable
>> file. But I'm not sure what it does actually mean. When the cache is
>> invalidated? And What happens is after cache invalidation?
>>
>>
>> Regards,
>> Satoshi
>>
>
>
>
> --
>
>
>
> Stefania Alborghetti
>
> Apache Cassandra Software Engineer
>
> |+852 6114 9265| stefania.alborghe...@datastax.com
>
>
>
>


Re: Compaction Filter in Cassandra

2016-03-19 Thread Dikang Gu
Fyi, this is the jira, https://issues.apache.org/jira/browse/CASSANDRA-11348
.

We can move the discussion to the jira if you want.

On Thu, Mar 17, 2016 at 11:46 AM, Dikang Gu  wrote:

> Hi Eric,
>
> Thanks for sharing the information!
>
> We also mainly want to use it for trimming data, either by the time or the
> number of columns in a row. We haven't started the work yet, do you mind to
> share some patches? We'd love to try it and test it in our environment.
>
> Thanks.
>
> On Tue, Mar 15, 2016 at 9:36 PM, Eric Stevens  wrote:
>
>> We have been working on filtering compaction for a month or so (though we
>> call it deleting compaction, its implementation is as a filtering
>> compaction strategy).  The feature is nearing completion, and we have used
>> it successfully in a limited production capacity against DSE 4.8 series.
>>
>> Our use case is that our records are written anywhere between a month, up
>> to several years before they are scheduled for deletion.  Tombstones are
>> too expensive, as we have tables with hundreds of billions of rows.  In
>> addition, traditional TTLs don't work for us because our customers are
>> permitted to change their retention policy such that already-written
>> records should not be deleted if they increase their retention after the
>> record was written (or vice versa).
>>
>> We can clean up data more cheaply and more quickly with filtered
>> compaction than with tombstones and traditional compaction.  Our
>> implementation is a wrapper compaction strategy for another underlying
>> strategy, so that you can have the characteristics of whichever strategy
>> makes sense in terms of managing your SSTables, while interceding and
>> removing records during compaction (including cleaning up secondary
>> indexes) that otherwise would have survived into the new SSTable.
>>
>> We are hoping to contribute it back to the community, so if you'd be
>> interested in helping test it out, I'd love to hear from you.
>>
>> On Sat, Mar 12, 2016 at 5:12 AM Marcus Eriksson 
>> wrote:
>>
>>> We don't have anything like that, do you have a specific use case in
>>> mind?
>>>
>>> Could you create a JIRA ticket and we can discuss there?
>>>
>>> /Marcus
>>>
>>> On Sat, Mar 12, 2016 at 7:05 AM, Dikang Gu  wrote:
>>>
 Hello there,

 RocksDB has the feature called "Compaction Filter" to allow application
 to modify/delete a key-value during the background compaction.
 https://github.com/facebook/rocksdb/blob/v4.1/include/rocksdb/options.h#L201-L226

 I'm wondering is there a plan/value to add this into C* as well? Or is
 there already a similar thing in C*?

 Thanks

 --
 Dikang


>>>
>
>
> --
> Dikang
>
>


-- 
Dikang


Single node Solr FTs not working

2016-03-19 Thread Joseph Tech
Hi,

I had set up a single-node DSE 4.8.x to start in Search mode to explore some
aspects of Solr search with field transformers (FTs). The configuration seems
fine, the Solr admin shows the indexed data, and searches on the actual fields
(stored=true) work fine, but the FTs are not being invoked during indexing and
searches using fields managed by the FT don't work, i.e. evaluate(),
addFieldToDocument(), etc. are not invoked. There are no ERRORs or similar
indications in system.log, and solrvalidation.log has no entries either.

The only warnings are during node startup for the non-stored fields like xyz

WARN  [SolrSecondaryIndex checkout.cart index initializer.] 2016-03-16
17:24:57,956  CassandraIndexSchema.java:537 - No Cassandra column found for
field: xyz

The FT configuration was verified by changing the FT's class name in
solrconfig.xml, which threw a ClassNotFoundException; the exception didn't
appear when the right class name was given.

The data is being inserted and retrieved from the same node. Please suggest
any pointers to debug this.

Thanks,
Joseph


Re: Compaction Filter in Cassandra

2016-03-19 Thread Clint Martin
I would definitely be interested in this.

Clint
On Mar 15, 2016 9:36 PM, "Eric Stevens"  wrote:

> We have been working on filtering compaction for a month or so (though we
> call it deleting compaction, its implementation is as a filtering
> compaction strategy).  The feature is nearing completion, and we have used
> it successfully in a limited production capacity against DSE 4.8 series.
>
> Our use case is that our records are written anywhere between a month, up
> to several years before they are scheduled for deletion.  Tombstones are
> too expensive, as we have tables with hundreds of billions of rows.  In
> addition, traditional TTLs don't work for us because our customers are
> permitted to change their retention policy such that already-written
> records should not be deleted if they increase their retention after the
> record was written (or vice versa).
>
> We can clean up data more cheaply and more quickly with filtered
> compaction than with tombstones and traditional compaction.  Our
> implementation is a wrapper compaction strategy for another underlying
> strategy, so that you can have the characteristics of whichever strategy
> makes sense in terms of managing your SSTables, while interceding and
> removing records during compaction (including cleaning up secondary
> indexes) that otherwise would have survived into the new SSTable.
>
> We are hoping to contribute it back to the community, so if you'd be
> interested in helping test it out, I'd love to hear from you.
>
> On Sat, Mar 12, 2016 at 5:12 AM Marcus Eriksson  wrote:
>
>> We don't have anything like that, do you have a specific use case in mind?
>>
>> Could you create a JIRA ticket and we can discuss there?
>>
>> /Marcus
>>
>> On Sat, Mar 12, 2016 at 7:05 AM, Dikang Gu  wrote:
>>
>>> Hello there,
>>>
>>> RocksDB has the feature called "Compaction Filter" to allow application
>>> to modify/delete a key-value during the background compaction.
>>> https://github.com/facebook/rocksdb/blob/v4.1/include/rocksdb/options.h#L201-L226
>>>
>>> I'm wondering is there a plan/value to add this into C* as well? Or is
>>> there already a similar thing in C*?
>>>
>>> Thanks
>>>
>>> --
>>> Dikang
>>>
>>>
>>


Re: Read consistency

2016-03-19 Thread Robert Coli
On Tue, Mar 15, 2016 at 6:43 PM, Arko Provo Mukherjee <
arkoprovomukher...@gmail.com> wrote:

> I am designing a system where for a situation, I need to have SERIAL
> consistency during writes.
>

Be sure to understand the implications of :

https://issues.apache.org/jira/browse/CASSANDRA-9328
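
For reference, SERIAL consistency applies to conditional (lightweight
transaction) writes; a minimal sketch, with a hypothetical table:

-- the Paxos phase of these conditional statements honours the serial consistency level
INSERT INTO accounts (username, email)
VALUES ('alice', 'alice@example.com')
IF NOT EXISTS;

UPDATE accounts SET email = 'new@example.com'
WHERE username = 'alice'
IF email = 'alice@example.com';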

=Rob


Data modelling, including cleanup

2016-03-19 Thread Bo Finnerup Madsen
Hi,

We are pretty new to data modelling in Cassandra, and are having a bit of a
challenge creating a model that caters for both queries and updates.

Let me try to explain it using the users example from
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling

They define two tables used for reading users, one by username and one by
email.
-
CREATE TABLE users_by_username (
username text PRIMARY KEY,
email text,
age int
)

CREATE TABLE users_by_email (
email text PRIMARY KEY,
username text,
age int
)
-

Now let's pretend that we need to delete a user, and we are given a username
as a key. Would the correct procedure be the following (sketched in CQL below):
1) Read the email from users_by_username using the username as a key
2) Delete from users_by_username using the username as a key
3) Delete from users_by_email using the email as a key
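
In CQL, a minimal sketch of those steps might look like this (the username
value is hypothetical; the two deletes could also be grouped in a logged
batch so they are applied together):

SELECT email FROM users_by_username WHERE username = 'jdoe';

BEGIN BATCH
  DELETE FROM users_by_username WHERE username = 'jdoe';
  DELETE FROM users_by_email WHERE email = 'jdoe@example.com';
APPLY BATCH;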

Or is there a smarter way of doing this?

Yours sincerely,
  Bo Madsen


Re: Compaction Filter in Cassandra

2016-03-19 Thread Dikang Gu
Hi Eric,

Thanks for sharing the information!

We also mainly want to use it for trimming data, either by time or by the
number of columns in a row. We haven't started the work yet; do you mind
sharing some patches? We'd love to try it and test it in our environment.

Thanks.

On Tue, Mar 15, 2016 at 9:36 PM, Eric Stevens  wrote:

> We have been working on filtering compaction for a month or so (though we
> call it deleting compaction, its implementation is as a filtering
> compaction strategy).  The feature is nearing completion, and we have used
> it successfully in a limited production capacity against DSE 4.8 series.
>
> Our use case is that our records are written anywhere between a month, up
> to several years before they are scheduled for deletion.  Tombstones are
> too expensive, as we have tables with hundreds of billions of rows.  In
> addition, traditional TTLs don't work for us because our customers are
> permitted to change their retention policy such that already-written
> records should not be deleted if they increase their retention after the
> record was written (or vice versa).
>
> We can clean up data more cheaply and more quickly with filtered
> compaction than with tombstones and traditional compaction.  Our
> implementation is a wrapper compaction strategy for another underlying
> strategy, so that you can have the characteristics of whichever strategy
> makes sense in terms of managing your SSTables, while interceding and
> removing records during compaction (including cleaning up secondary
> indexes) that otherwise would have survived into the new SSTable.
>
> We are hoping to contribute it back to the community, so if you'd be
> interested in helping test it out, I'd love to hear from you.
>
> On Sat, Mar 12, 2016 at 5:12 AM Marcus Eriksson  wrote:
>
>> We don't have anything like that, do you have a specific use case in mind?
>>
>> Could you create a JIRA ticket and we can discuss there?
>>
>> /Marcus
>>
>> On Sat, Mar 12, 2016 at 7:05 AM, Dikang Gu  wrote:
>>
>>> Hello there,
>>>
>>> RocksDB has the feature called "Compaction Filter" to allow application
>>> to modify/delete a key-value during the background compaction.
>>> https://github.com/facebook/rocksdb/blob/v4.1/include/rocksdb/options.h#L201-L226
>>>
>>> I'm wondering is there a plan/value to add this into C* as well? Or is
>>> there already a similar thing in C*?
>>>
>>> Thanks
>>>
>>> --
>>> Dikang
>>>
>>>
>>


-- 
Dikang


Re: Question about SELECT command

2016-03-19 Thread Jack Krupansky
Yes, gossip is how Cassandra knows which nodes are alive in the cluster.
But... that has nothing to do with SELECT. It's still not clear what you
are really getting at. I mean, if you have gone through the (free) online
training and (free) doc on Cassandra architecture, what is it you are still
trying to understand?

See:
https://docs.datastax.com/en/cassandra/3.x/cassandra/architecture/archIntro.html

Generally, your SELECT should be restricted to the data on a single node,
such as by specifying a specific partition key or a token range. The
partition key can be hashed to get a token value which can directly be
mapped to a node (or multiple nodes with replication.) Ad hoc, complex, and
expensive queries are anti-patterns in Cassandra (strongly discouraged, if not
outright unsupported).
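
For example (a sketch, with a hypothetical users table partitioned by username):

-- single-partition query: routed straight to the replicas owning that partition
SELECT * FROM users WHERE username = 'alice';

-- token-range query: reads a contiguous slice of the ring, the pattern used
-- for bulk scans node by node
SELECT * FROM users WHERE token(username) > -9223372036854775808
                      AND token(username) <= -4611686018427387904;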

-- Jack Krupansky

On Thu, Mar 17, 2016 at 12:25 PM, Thouraya TH  wrote:

> Yes, I have tested that, but I'd like to understand the architecture
> behind the SELECT command.
> How does it work? Does it use the gossip protocol to get live nodes?
>
> Thank you for explanations.
> Kind regards.
>
>
>
> 2016-03-17 17:17 GMT+01:00 Carlos Alonso :
>
>> Yes, they could.
>>
>> Carlos Alonso | Software Engineer | @calonso
>> 
>>
>> On 17 March 2016 at 16:10, Thouraya TH  wrote:
>>
>>> Hi all;
>>>
>>> Please, i have a question about the architecure behind SELECT command.
>>> Given this table:
>>>
>>>   c1   c2  c3
>>> value1 value2   value3
>>> ...
>>> 
>>> etc...
>>>
>>> lines of this table are distributed over nodes that's it ?
>>>
>>>
>>> Thank you so much for answers.
>>> Kind regards.
>>>
>>
>>
>


Question about SELECT command

2016-03-19 Thread Thouraya TH
Hi all;

Please, I have a question about the architecture behind the SELECT command.
Given this table:

  c1   c2  c3
value1 value2   value3
...

etc...

The lines of this table are distributed over the nodes, is that right?


Thank you so much for answers.
Kind regards.


Re: Python to type field

2016-03-19 Thread Tyler Hobbs
This should be useful:
http://datastax.github.io/python-driver/user_defined_types.html

On Wed, Mar 16, 2016 at 1:18 PM, Rakesh Kumar 
wrote:

> Hi
>
> I have a type defined as follows
>
> CREATE TYPE etag (
> ttype int,
> tvalue text
> );
>
> And this is used in a col of a table as follows
>
>  evetag list<frozen<etag>>
>
> I have the following value in a file
> [{ttype: 3 , tvalue: '90A1'}]
>
> This gets inserted via COPY command with no issues.
>
> However, when I try to insert the same via a Python program which I am
> writing, where I prepare and then bind, I get this error while executing:
>
> TypeError: Received an argument of invalid type for column "evetag".
> Expected:  VarcharType))'>, Got: ; (Received a string for a type that
> expects a sequence)
>
> I tried casting the variable in Python to a list and a tuple, but got the same error.
>
>
>


-- 
Tyler Hobbs
DataStax 


Re: What does FileCacheService's log message (invalidating cache) mean?

2016-03-19 Thread Stefania Alborghetti
Q1. Readers are created as needed, there is no fixed number. For example,
we may have 2 threads scanning sstables at the same time due to 2 different
CQL SELECT statements.

Q2. There is no correlation between sstable size and JVM HEAP size. We
don't load entire sstables in memory.

Q3. It's difficult to say what caused the invalidation messages, basically
anything that removed sstables from memory, such as dropping the table,
snapshots, compactions, streaming, there may me other operations I'm not
familiar with.

Q4. Correct, these are temporary files. Once again, in 3.0 things are
different and the temporary files have been replaced by transaction logs
(CASSANDRA-7066).


On Thu, Mar 17, 2016 at 3:40 PM, Satoshi Hikida 
wrote:

> Sorry, there is a mistake in my previous post. I will correct it here.
>
> In Q3, I mentioned there are a lot of invalidating messages in the
> debug.log. That is true, but the Cassandra configuration I listed was wrong.
> In that case, the cassandra.yaml settings are as follows:
>
> - cassandra.yaml
> - compaction_throughput_mb_per_sec: 0 (not 8 or default)
> - concurrent_compactors: 1
> - sstable_preemptive_open_interval_in_mb: 0  (not 8 or default)
> - memtable_flush_writers: 1
>
> More precisely, in that case, Cassandra kept outputting invalidating
> messages for a while (a few hours), while CPU usage was almost 0.0% in the
> top command, as shown below.
>
> $ top -bu cassandra -n 1
> ...
> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> COMMAND
> 2631 cassand+  20   0  0.250t 1.969g 703916 S   0.0 57.0   8459:35 java
>
> I want to know what was actually happening at that time.
>
>
> Regards,
> Satoshi
>
>
> On Thu, Mar 17, 2016 at 3:56 PM, Satoshi Hikida 
> wrote:
>
>> Thank you for your very useful advice!
>>
>>
>> Definitely, I'm using Cassandra V2.2.5 not 3.x. And basically I've
>> understood what does these logs mean. But I have more a few questions. So I
>> would very much appreciate If I get some explanations about these questions.
>>
>> * Q1.
>> In my understand, when open a SSTable, a lot of RandomAccessReaders(RARs)
>> are created. A number of RARs is equal to a number of segments of SSTable.
>> Is a number of segments(=RARs) equal to follows?
>>
>> a number of segments = size of SSTable / size of segments
>>
>> * Q2.
>> What is happen if the Cassandra open a SSTable file which bigger than JVM
>> heap (or memory)?
>>
>> * Q3.
>> In my case, there are a lot of invalidating messages for the same SSTable
>> file (e.g. at least 11 records for tmplink-la-8348-big-Data.db in my
>> previous post). In some cases, there are more than 600 invalidating
>> messages for the same file and these messages logged for a few hours. Would
>> that closing a big SSTable is the cause?
>>
>> * Q4.
>> I saw "tmplink-xxx" or "tmp-xxx" files in the logs and also data
>> directories. Are these files temporary in compaction process?
>>
>>
>> Here is my experimental configurations.
>>
>> - Cassandra node: An aws EC2 instance(t2.medium. 4GBRAM, 2vCPU)
>> - Cassandra version: 2.2.5
>> - inserted data size: about 100GB
>> - cassandra-env.sh: default
>> - cassandra.yaml
>> - compaction_throughput_mb_per_sec: 8 (or default)
>> - concurrent_compactors: 1
>> - sstable_preemptive_open_interval_in_mb: 25 (or default)
>> - memtable_flush_writers: 1
>>
>>
>> Regards,
>> Satoshi
>>
>>
>> On Wed, Mar 16, 2016 at 5:47 PM, Stefania Alborghetti <
>> stefania.alborghe...@datastax.com> wrote:
>>
>>> Each sstable has one or more random access readers (one per segment for
>>> example) and FileCacheService is a cache for such readers. When an sstable
>>> is closed, the cache is invalidated. If no single reader of an sstable is
>>> used for at least 512 milliseconds, all readers are evicted. If the sstable
>>> is opened again, new reader(s) will be created and added to the cache again.
>>>
>>> FileCacheService was removed in cassandra 3.0 in favour of a pool of
>>> page-aligned buffers, and sharing the NIO file channels amongst the readers
>>> of an sstable, refer to CASSANDRA-8897
>>>  and
>>> CASSANDRA-8893 
>>> for more details.
>>>
>>> On Wed, Mar 16, 2016 at 3:30 PM, satoshi hikida 
>>> wrote:
>>>
 Hi,

 I have been working on some experiments for Cassandra and found some
 log messages as follows in debug.log.
 I am not sure what it exactly is, so I would appreciate if someone
 gives me some explanations about it.

 In my verification, a Cassandra node runs as a stand-alone server on
 Amazon EC2 instance(t2.medium). And I insert 1 Billion records (about 100GB
 data size) to a table from a client application (which runs on another
 instance separated from Cassandra node). After insertion, Cassandra
 continues it's I/O activities for (probably) compaction and keep logging

Re: cqlsh problem

2016-03-19 Thread Vishwas Gupta
Have you started the Cassandra service?

sh cassandra
On 17-Mar-2016 7:59 pm, "Alain RODRIGUEZ"  wrote:

> Hi, did you try with the address of the node rather than 127.0.0.1?
>
> Is the transport protocol used by cqlsh (not sure if it is Thrift or
> binary/native in 2.1) active? What is the "nodetool info" output?
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-03-17 14:26 GMT+01:00 joseph gao :
>
>> hi, all
>> cassandra version 2.1.7
>> When I use cqlsh to connect cassandra, something is wrong
>>
>> Connection error: ( Unable to connect to any servers', {'127.0.0.1':
>> OperationTimedOut('errors=None, last_host=None,)})
>>
>> This happens lots of times, but sometime it works just fine. Anybody
>> knows why?
>>
>> --
>> --
>> Joseph Gao
>> PhoneNum:15210513582
>> QQ: 409343351
>>
>
>