Large temporary files generated during cleaning up

2017-06-19 Thread wxn...@zjqunshuo.com
Hi,
Cleanup is generating temporary files which occupy a lot of disk space.
I noticed that for every source SSTable it generates four temporary files,
and two of them are almost as large as the source SSTable. If there are two
concurrent cleanup tasks running, I have to leave free disk space at least
twice as large as the combined size of the two SSTables being cleaned up.
Is this expected? And do I have a choice to run cleanup with less disk space?

Below are the temporary files generated during cleanup:
-rw-r--r-- 2 root root 798M Jun 20 13:34 tmplink-lb-59516-big-Index.db
-rw-r--r-- 2 root root 798M Jun 20 13:34 tmp-lb-59516-big-Index.db
-rw-r--r-- 2 root root 219G Jun 20 13:34 tmplink-lb-59516-big-Data.db
-rw-r--r-- 2 root root 219G Jun 20 13:34 tmp-lb-59516-big-Data.db
-rw-r--r-- 2 root root 978M Jun 20 13:33 tmplink-lb-59517-big-Index.db
-rw-r--r-- 2 root root 978M Jun 20 13:33 tmp-lb-59517-big-Index.db
-rw-r--r-- 2 root root 245G Jun 20 13:34 tmplink-lb-59517-big-Data.db
-rw-r--r-- 2 root root 245G Jun 20 13:34 tmp-lb-59517-big-Data.db

Cheers,
-Simon


Re: Question: Behavior of inserting a list multiple times with same timestamp

2017-06-19 Thread Subroto Barua
Here is the response from DataStax support/dev:


In a list, each item is its own cell. Append adds a new cell sorted at basically 
"current server time uuid"; prepend adds at "-current server time uuid". User-supplied 
timestamps are used for the cell timestamp when specified.

Inserting the entire list deletes the existing list and then inserts the new one.

Reading reads out the entire list.
Positional access reads the entire list and gets/puts at the spot specified.

Basically, lists are not idempotent
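
A minimal cqlsh sketch of that last point (the table and values here are made up
for illustration): a retried append lands twice, because each append creates a new
cell keyed off the server time rather than overwriting an existing one.

CREATE TABLE demo.events (id int PRIMARY KEY, vals list<int>);

UPDATE demo.events SET vals = vals + [42] WHERE id = 1;
UPDATE demo.events SET vals = vals + [42] WHERE id = 1;  -- e.g. a client retry after a timeout

SELECT vals FROM demo.events WHERE id = 1;               -- returns [42, 42], not [42]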


On Monday, June 19, 2017, 6:55:40 AM PDT, Thakrar, Jayesh 
 wrote:

Subroto,

Cassandra docs say otherwise.

Writing list data is accomplished with a JSON-style syntax. To write a record 
using INSERT, specify the entire list as a JSON array. Note: An INSERT will 
always replace the entire list.

Maybe you can elaborate/shed some more light?

Thanks,
Jayesh


Lists

A list is a typed collection of non-unique values where elements are ordered by 
their position in the list. To create a column of type list, use the list 
keyword suffixed with the value type enclosed in angle brackets. For example:

CREATE TABLE plays (
    id text PRIMARY KEY,
    game text,
    players int,
    scores list<int>
)
Do note that as explained below, lists have some limitations and performance 
considerations to take into account, and it is advised to prefer sets over 
lists when this is possible.

Writing list data is accomplished with a JSON-style syntax. To write a record 
using INSERT, specify the entire list as a JSON array. Note: An INSERT will 
always replace the entire list.

INSERT INTO plays (id, game, players, scores)
          VALUES ('123-afde', 'quake', 3, [17, 4, 2]);
Adding (appending or prepending) values to a list can be accomplished by adding 
a new JSON-style array to an existing list column.

UPDATE plays SET players = 5, scores = scores + [ 14, 21 ] WHERE id = 
'123-afde';
UPDATE plays SET players = 5, scores = [ 12 ] + scores WHERE id = '123-afde';
It should be noted that append and prepend are not idempotent operations. This 
means that if an append or a prepend operation times out, it is not always safe 
to retry the operation (as this could result in the record being appended 
or prepended twice).

Lists also provide the following operations: setting an element by its position 
in the list, removing an element by its position in the list, and removing all 
occurrences of a given value in the list. However, and contrarily to all the 
other collection operations, these three operations induce an internal read 
before the update, and will thus typically have slower performance 
characteristics. Those operations have the following syntax:

UPDATE plays SET scores[1] = 7 WHERE id = '123-afde';                // sets the 2nd element of scores to 7 (raises an error if scores has fewer than 2 elements)
DELETE scores[1] FROM plays WHERE id = '123-afde';                   // deletes the 2nd element of scores (raises an error if scores has fewer than 2 elements)
UPDATE plays SET scores = scores - [ 12, 21 ] WHERE id = '123-afde'; // removes all occurrences of 12 and 21 from scores
As with maps, TTLs if used only apply to the newly inserted/updated values.



On 6/19/17, 1:12 AM, "Subroto Barua"  wrote:

    This is an expected behavior.
    
    We learned this issue/feature at the current site (we use Dse 5.08)
    
    Subroto 
    
    > On Jun 18, 2017, at 10:29 PM, Zhongxiang Zheng  
wrote:
    > 
    > Hi all,
    > 
    > I have a question about the behavior when inserting a list with a specified timestamp.
    > 
    > It is documented that "An INSERT will always replace the entire list."
    > https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile#lists
    > 
    > However, when a list is inserted multiple times using the same timestamp,
    > it is not replaced but appended to, as follows.
    > 
    > cqlsh> CREATE TABLE test.test (k int PRIMARY KEY , v list<int>);
    > cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) USING TIMESTAMP 1000 ;
    > cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) USING TIMESTAMP 1000 ;
    > cqlsh> SELECT * FROM test.test ;
    > 
    > k | v
    > ---+
    > 1 | [1, 1]
    > 
    > I confirmed this behavior is reproduced in 3.0.13 and 3.10.
    > I'd like to ask whether this behavior is an expected behavior or a bug?
    > 
    > In our use case, CQL statements with the same values and timestamp will be issued multiple times
    > to retry inserting under the assumption that the insert is idempotent.
    > So, I expect that the entire list will be replaced even if a list is inserted multiple times with the same timestamp.

Re: Cassandra is always CP or AP in terms of CAP theorem

2017-06-19 Thread Justin Cameron
This is achieved through a combination of replication factor (RF) and
consistency level (CL):

Replication factor is tied to your schema (more specifically, it is
configured at the keyspace level) and specifies how many copies of each
piece of data are kept.

Consistency level is associated (either explicitly or implicitly) with
queries (both reads and writes). It determines the number of replicas a
query should check (or wait for) when you execute a query on a given piece
of data.

In the simplest terms, a low consistency level will give you better
availability at the cost of potential data inconsistency, as fewer replicas
need to be online and available in order to satisfy the query, but if those
replicas are offline they may not receive the write, or may have a
different value that is not read.
Conversely, a high consistency level will give you better consistency at
the cost of potentially reduced availability.
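
As a rough sketch of the two knobs (the keyspace, table and DC names here are
made up for illustration), replication factor is part of the schema while the
consistency level travels with each query or session:

CREATE KEYSPACE shop
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

-- In cqlsh the consistency level is set per session:
CONSISTENCY QUORUM;
SELECT * FROM shop.orders WHERE order_id = 42;   -- waits for 2 of the 3 replicas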

There is a much more in-depth description of this in the docs; I suggest you
read it:
http://cassandra.apache.org/doc/latest/architecture/dynamo.html?highlight=quorum#tunable-consistency

Cheers,
Justin

On Tue, 20 Jun 2017 at 12:47 Kaushal Shriyan 
wrote:

> Hi,
>
> I am reading about the CAP theorem, and Cassandra satisfies either CP or AP. I am
> not sure how we take care of the Availability property or the Consistency
> property. Are there any examples to understand it better?
>
> Please help me understand if I am completely wrong.
>
> Thanks in Advance.
>
> Regards,
>
> Kaushal
>
-- 


Justin Cameron | Senior Software Engineer





This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Cassandra is always CP or AP in terms of CAP theorem

2017-06-19 Thread Kaushal Shriyan
Hi,

I am reading about the CAP theorem, and Cassandra satisfies either CP or AP. I am
not sure how we take care of the Availability property or the Consistency
property. Are there any examples to understand it better?

Please help me understand if I am completely wrong.

Thanks in Advance.

Regards,

Kaushal


Re: CQL: fails to COPY FROM with null values

2017-06-19 Thread Stefania Alborghetti
It doesn't work because of the whitespace. By default the NULL value is an
empty string, and extra whitespace is not trimmed automatically.

This should work:

ce98d62a-3666-4d3a-ae2f-df315ad448aa,Jonsson,Malcom,,2001-01-19
17:55:17+

You can change the string representing missing values with the NULL option
if you cannot remove spaces from your data.
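
For example (reusing the table and file names from this thread; the N/A token is
just an illustrative choice, and the column list should match the fields in your CSV):

COPY playground.individual (id, lastname, firstname, address_id, dateofbirth)
FROM 'testfile.csv' WITH NULL = 'N/A';

With that option, any field containing exactly N/A is loaded as null instead of
being parsed as a UUID.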

On Mon, Jun 19, 2017 at 10:10 PM, Tobias Eriksson <
tobias.eriks...@qvantel.com> wrote:

> Hi
>
>  I am trying to copy a file of CSV data into a table
>
> But I get an error since sometimes one of the columns (which is a UUID) is
> empty
>
> Is this a bug or what am I missing?
>
>
>
> Here is how it looks like
>
> Table
>
> id uuid,
>
> lastname text,
>
> firstname text,
>
> address_id uuid,
>
> dateofbirth timestamp,
>
>
>
> PRIMARY KEY (id, lastname, firstname)
>
>
>
> COPY playground.individual(id,lastname,firstname,address_id) FROM
> ‘testfile.csv’;
>
>
>
> Where the testfile.csv is like this
>
>
>
> This works !!!
>
> ce98d62a-3666-4d3a-ae2f-df315ad448aa,Jonsson,Malcom
> ,c9dc8b60-d27f-430c-b960-782d854df3a5,2001-01-19 17:55:17+
>
>
>
> This does NOT work !!!
>
> ce98d62a-3666-4d3a-ae2f-df315ad448aa,Jonsson,Malcom , ,2001-01-19
> 17:55:17+
>
>
>
> Cause then I get the following error
>
> *Failed to import 1 rows: ParseError - badly formed hexadecimal UUID
> string,  given up without retries*
>
>
>
> So, how do I import my CSV file and set the columns which does not have a
> UUID to null ?
>
>
>
> -Tobias
>



-- 



STEFANIA ALBORGHETTI

Software engineer | +852 6114 9265 | stefania.alborghe...@datastax.com





Re: Don't print Ping caused error logs

2017-06-19 Thread Eric Plowe
The driver has load balancing policies built in. Behind a load balancer
you'd lose the benefit of things like the TokenAwarePolicy.
On Mon, Jun 19, 2017 at 3:49 PM Jonathan Haddad  wrote:

> The driver grabs all the cluster information from the nodes you provide
> the driver and connects automatically to the rest.  You don't need (and
> shouldn't use) a load balancer.
>
> Jon
>
> On Mon, Jun 19, 2017 at 12:28 PM Daniel Hölbling-Inzko <
> daniel.hoelbling-in...@bitmovin.com> wrote:
>
>> Just out of curiosity how to you then make sure all nodes get the same
>> amount of traffic from clients without having to maintain a manual contact
>> points list of all cassandra nodes in the client applications?
>> Especially with big C* deployments this sounds like a lot of work
>> whenever adding/removing nodes. Putting them behind a lb that can Auto
>> discover nodes (or manually adding them to the LB rotation etc) sounds like
>> a much easier way.
>> I am thinking mostly about cloud lb systems like AWS ELB or GCP LB
>>
>> Or can the client libraries discover nodes and use other contact points
>> for subsequent requests? Having a bunch of seed nodes would be easier I
>> guess.
>>
>> Greetings Daniel
>> Akhil Mehra  schrieb am Mo. 19. Juni 2017 um 11:44:
>>
>>> Just in case you are not aware, using a load balancer is an anti-pattern.
>>> Please refer to (
>>> http://docs.datastax.com/en/landing_page/doc/landing_page/planning/planningAntiPatterns.html#planningAntiPatterns__AntiPatLoadBal
>>> )
>>>
>>> You can turn off logging for a particular class using the nodetool
>>> setlogginglevel (
>>> http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSetLogLev.html
>>> ).
>>>
>>> In your case you can try setting the log level for
>>> org.apache.cassandra.transport.Message to warn using the following command
>>>
>>> nodetool setlogginglevel org.apache.cassandra.transport.Message WARN
>>>
>>> Obviously this will suppress all info level logging in the message
>>> class.
>>>
>>> I hope that helps.
>>>
>>> Cheers,
>>> Akhil
>>>
>>>
>>>
>>>
>>> On 19/06/2017, at 9:13 PM, wxn...@zjqunshuo.com wrote:
>>>
>>> Hi,
>>> Our cluster nodes are behind a SLB(Service Load Balancer) with a VIP and
>>> the Cassandra client access the cluster by the VIP.
>>> System.log print the below IOException every several seconds. I guess
>>> it's the SLB service which Ping the port 9042 of the Cassandra node
>>> periodically and caused the exceptions print.
>>> Is there any method to prevent the ping-caused exceptions from being printed?
>>>
>>>
>>> INFO  [SharedPool-Worker-1] 2017-06-19 16:54:15,997 Message.java:605 - 
>>> Unexpected exception during request; channel = [id: 0x332c09b7, /
>>> 10.253.106.210:9042]
>>> java.io.IOException: Error while read(...): Connection reset by peer
>>>
>>> at io.netty.channel.epoll.Native.readAddress(Native Method) 
>>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>>>
>>> at 
>>> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675)
>>>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>>>
>>> at 
>>> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714)
>>>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>>>
>>> at 
>>> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) 
>>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>>>
>>> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) 
>>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>>>
>>> at 
>>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>>>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>>>
>>> at 
>>> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>>>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>>> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85]
>>>
>>> Cheer,
>>> -Simon
>>>
>>>
>>>


RE: Adding nodes and cleanup

2017-06-19 Thread ZAIDI, ASAD A
I think the token ranges that are already cleaned/completed and potentially streamed down 
to the additional node won't be cleaned again, so you'll potentially need to run 
cleanup once more.
You could stop the cleanup, add the additional node, and then start cleanup over again so 
you get the nodes clean in a single pass.


From: Mark Furlong [mailto:mfurl...@ancestry.com]
Sent: Monday, June 19, 2017 2:28 PM
To: user@cassandra.apache.org
Subject: Adding nodes and cleanup

I have added a few nodes and now am running some cleanups. Can I add an 
additional node while these cleanups are running? What are the ramifications of 
doing this?

Mark Furlong

Sr. Database Administrator

mfurl...@ancestry.com
M: 801-859-7427
O: 801-705-7115
1300 W Traverse Pkwy
Lehi, UT 84043










Re: Don't print Ping caused error logs

2017-06-19 Thread Jonathan Haddad
The driver grabs all the cluster information from the nodes you provide the
driver and connects automatically to the rest.  You don't need (and
shouldn't use) a load balancer.

Jon

On Mon, Jun 19, 2017 at 12:28 PM Daniel Hölbling-Inzko <
daniel.hoelbling-in...@bitmovin.com> wrote:

> Just out of curiosity how to you then make sure all nodes get the same
> amount of traffic from clients without having to maintain a manual contact
> points list of all cassandra nodes in the client applications?
> Especially with big C* deployments this sounds like a lot of work whenever
> adding/removing nodes. Putting them behind a lb that can Auto discover
> nodes (or manually adding them to the LB rotation etc) sounds like a much
> easier way.
> I am thinking mostly about cloud lb systems like AWS ELB or GCP LB
>
> Or can the client libraries discover nodes and use other contact points
> for subsequent requests? Having a bunch of seed nodes would be easier I
> guess.
>
> Greetings Daniel
> Akhil Mehra  schrieb am Mo. 19. Juni 2017 um 11:44:
>
>> Just in case you are not aware, using a load balancer is an anti-pattern.
>> Please refer to (
>> http://docs.datastax.com/en/landing_page/doc/landing_page/planning/planningAntiPatterns.html#planningAntiPatterns__AntiPatLoadBal
>> )
>>
>> You can turn off logging for a particular class using the nodetool
>> setlogginglevel (
>> http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSetLogLev.html
>> ).
>>
>> In your case you can try setting the log level for
>> org.apache.cassandra.transport.Message to warn using the following command
>>
>> nodetool setlogginglevel org.apache.cassandra.transport.Message WARN
>>
>> Obviously this will suppress all info level logging in the message class.
>>
>> I hope that helps.
>>
>> Cheers,
>> Akhil
>>
>>
>>
>>
>> On 19/06/2017, at 9:13 PM, wxn...@zjqunshuo.com wrote:
>>
>> Hi,
>> Our cluster nodes are behind a SLB(Service Load Balancer) with a VIP and
>> the Cassandra client access the cluster by the VIP.
>> System.log print the below IOException every several seconds. I guess
>> it's the SLB service which Ping the port 9042 of the Cassandra node
>> periodically and caused the exceptions print.
>> Is there any method to prevent the ping-caused exceptions from being printed?
>>
>>
>> INFO  [SharedPool-Worker-1] 2017-06-19 16:54:15,997 Message.java:605 - 
>> Unexpected exception during request; channel = [id: 0x332c09b7, /
>> 10.253.106.210:9042]
>> java.io.IOException: Error while read(...): Connection reset by peer
>>
>> at io.netty.channel.epoll.Native.readAddress(Native Method) 
>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>>
>> at 
>> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675)
>>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>>
>> at 
>> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714)
>>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>>
>> at 
>> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) 
>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>>
>> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) 
>> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>>
>> at 
>> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>>
>> at 
>> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85]
>>
>> Cheer,
>> -Simon
>>
>>
>>


Re: Don't print Ping caused error logs

2017-06-19 Thread Daniel Hölbling-Inzko
Just out of curiosity, how do you then make sure all nodes get the same
amount of traffic from clients without having to maintain a manual contact
points list of all Cassandra nodes in the client applications?
Especially with big C* deployments this sounds like a lot of work whenever
adding/removing nodes. Putting them behind a LB that can auto-discover
nodes (or manually adding them to the LB rotation etc.) sounds like a much
easier way.
I am thinking mostly about cloud LB systems like AWS ELB or GCP LB.

Or can the client libraries discover nodes and use other contact points for
subsequent requests? Having a bunch of seed nodes would be easier I guess.

Greetings Daniel
Akhil Mehra  schrieb am Mo. 19. Juni 2017 um 11:44:

> Just in case you are not aware, using a load balancer is an anti-pattern.
> Please refer to (
> http://docs.datastax.com/en/landing_page/doc/landing_page/planning/planningAntiPatterns.html#planningAntiPatterns__AntiPatLoadBal
> )
>
> You can turn off logging for a particular class using the nodetool
> setlogginglevel (
> http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSetLogLev.html
> ).
>
> In your case you can try setting the log level for
> org.apache.cassandra.transport.Message to warn using the following command
>
> nodetool setlogginglevel org.apache.cassandra.transport.Message WARN
>
> Obviously this will suppress all info level logging in the message class.
>
> I hope that helps.
>
> Cheers,
> Akhil
>
>
>
>
> On 19/06/2017, at 9:13 PM, wxn...@zjqunshuo.com wrote:
>
> Hi,
> Our cluster nodes are behind a SLB(Service Load Balancer) with a VIP and
> the Cassandra client access the cluster by the VIP.
> System.log print the below IOException every several seconds. I guess it's
> the SLB service which Ping the port 9042 of the Cassandra node periodically
> and caused the exceptions print.
> Is there any method to prevent the ping-caused exceptions from being printed?
>
>
> INFO  [SharedPool-Worker-1] 2017-06-19 16:54:15,997 Message.java:605 - 
> Unexpected exception during request; channel = [id: 0x332c09b7, /
> 10.253.106.210:9042]
> java.io.IOException: Error while read(...): Connection reset by peer
>
> at io.netty.channel.epoll.Native.readAddress(Native Method) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>
> at 
> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>
> at 
> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>
> at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>
> at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85]
>
> Cheer,
> -Simon
>
>
>


Adding nodes and cleanup

2017-06-19 Thread Mark Furlong
I have added a few nodes and now am running some cleanups. Can I add an 
additional node while these cleanups are running? What are the ramifications of 
doing this?

Mark Furlong

Sr. Database Administrator

mfurl...@ancestry.com
M: 801-859-7427
O: 801-705-7115
1300 W Traverse Pkwy
Lehi, UT 84043










Re: SASI index on datetime column does not filter on minutes

2017-06-19 Thread Tobias Eriksson
Thanx guys, it was the timezone thingi …
Adding +0000 did the trick:

select lastname,firstname,dateofbirth from playground.individual where 
dateofbirth < '2001-01-01T10:00:00' and dateofbirth > '2000-11-18 
17:55:17+0000';

-Tobias



From: DuyHai Doan 
Date: Monday, 19 June 2017 at 17:44
To: Hannu Kröger 
Cc: "user@cassandra.apache.org" , Tobias Eriksson 

Subject: Re: SASI index on datetime column does not filter on minutes

The +0000 in the date format is necessary to specify the timezone

On Mon, Jun 19, 2017 at 5:38 PM, Hannu Kröger 
> wrote:
Hello,

I tried the same thing with 3.10 which I happened to have at hand and that 
seems to work.

cqlsh:test> select lastname,firstname,dateofbirth from individuals where 
dateofbirth < '2001-01-01T10:00:00' and dateofbirth > '2000-11-18 17:59:18';

 lastname | firstname | dateofbirth
--+---+-
  Jimmie2 |Lundin | 2000-12-19 17:55:17.00+
  Jimmie3 |Lundin | 2000-11-18 17:55:18.00+
   Jimmie |Lundin | 2000-11-18 17:55:17.00+

(3 rows)
cqlsh:test> select lastname,firstname,dateofbirth from individuals where 
dateofbirth < '2001-01-01T10:00:00+' and dateofbirth > 
'2000-11-18T17:59:18+';

 lastname | firstname | dateofbirth
--+---+-
  Jimmie2 |Lundin | 2000-12-19 17:55:17.00+

(1 rows)
cqlsh:test>

Maybe you have timezone issue?

Best Regards,
Hannu

On 19 June 2017 at 17:09:10, Tobias Eriksson 
(tobias.eriks...@qvantel.com) wrote:
Hi
I have a table like this (Cassandra 3.5)
Table
id uuid,
lastname text,
firstname text,
address_id uuid,
dateofbirth timestamp,

PRIMARY KEY (id, lastname, firstname)

And a SASI index like this
create custom index indv_birth ON playground.individual(dateofbirth) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'SPARSE'};

The data

lastname | firstname | dateofbirth
--+---+-
   Lundin |Jimmie | 2000-11-18 17:55:17.00+
  Jansson |   Karolin | 2000-12-19 17:55:17.00+
Öberg |Louisa | 2000-11-18 17:55:18.00+


Now if I do this
select lastname,firstname,dateofbirth from playground.individual where 
dateofbirth < '2001-01-01T10:00:00' and dateofbirth > '2000-11-18 17:59:18';

I should only get ONE row, right
lastname | firstname | dateofbirth
--+---+-
Jansson |   Karolin | 2000-12-19 17:55:17.00+


But instead I get all 3 rows !!!

Why is that ?

-Tobias





RE: Partition range incremental repairs

2017-06-19 Thread ZAIDI, ASAD A
A few options that you can consider to improve repair time (a quick sketch of the
corresponding commands follows below):

- Un-throttle streamthroughput and interdcstreamthroughput, at least for the
  duration of the repair.
- Increase the number of job threads, i.e. use the -j option.
- Use the subrange repair option.
- Implement jumbo frames on the inter-node communication network cards of your
  C* host machines.
- If possible, reduce the number of vnodes!
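
For reference, a hedged sketch of what the first three options look like on the
command line (nodetool option names as in 2.2; the tokens and keyspace name are
placeholders you would substitute for your own):

# Un-throttle streaming for the duration of the repair (0 = unlimited):
nodetool setstreamthroughput 0
nodetool setinterdcstreamthroughput 0

# Repair with more job threads (-j) on an explicit subrange (-st/-et taken
# from nodetool ring / describering output):
nodetool repair -j 4 -st <start_token> -et <end_token> my_keyspace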

From: Chris Stokesmore [mailto:chris.elsm...@demandlogic.co]
Sent: Monday, June 19, 2017 4:50 AM
To: anujw_2...@yahoo.co.in
Cc: user@cassandra.apache.org
Subject: Re: Partition range incremental repairs

Anyone have anymore thoughts on this at all? Struggling to understand it..


On 9 Jun 2017, at 11:32, Chris Stokesmore 
> wrote:

Hi Anuj,

Thanks for the reply.

1). We are using Cassandra 2.2.8, and our repair commands we are comparing are
"nodetool repair --in-local-dc --partitioner-range” and
"nodetool repair --in-local-dc”
Since 2.2 I believe inc repairs are the default - that seems to be confirmed in 
the logs that list the repair details when a repair starts.

2) From looking at a few runs, on average:
with -pr repairs, each node is approx 6.5 - 8 hours, so a total over the 7 
nodes of 53 hours
With just inc repairs, each node ~26 - 29 hours, so a total of 193

3) we currently have two DCs in total, the ‘production’ ring with 7 nodes and 
RF=3, and a testing ring with one single node and RF=1 for our single keyspace 
we currently use.

4) Yeah that number came from the Cassandra repair logs from an inc repair, I 
can share the number reports when using a pr repair later this evening when the 
currently running repair has completed.


Many thanks for the reply again,

Chris


On 6 Jun 2017, at 17:50, Anuj Wadehra 
> wrote:

Hi Chris,

Can your share following info:

1. Exact repair commands you use for inc repair and pr repair

2. Repair time should be measured at cluster level for inc repair. So, whats 
the total time it takes to run repair on all nodes for incremental vs pr 
repairs?

3. You are repairing one dc DC3. How many DCs are there in total and whats the 
RF for keyspaces? Running pr on a specific dc would not repair entire data.

4. 885 ranges? From where did you get this number? Logs? Can you share the 
number ranges printed in logs for both inc and pr case?


Thanks
Anuj

Sent from Yahoo Mail on 
Android

On Tue, Jun 6, 2017 at 9:33 PM, Chris Stokesmore
> wrote:
Thank you for the excellent and clear description of the different versions of 
repair Anuj, that has cleared up what I expect to be happening.

The problem now is in our cluster, we are running repairs with options 
(parallelism: parallel, primary range: false, incremental: true, job threads: 
1, ColumnFamilies: [], dataCenters: [DC3], hosts: [], # of ranges: 885) and 
when we do our repairs are taking over a day to complete when previously when 
running with the partition range option they were taking more like 8-9 hours.

As I understand it, using incremental should have sped this process up as all 
three sets of data on each repair job should be marked as repaired however this 
does not seem to be the case. Any ideas?

Chris

On 6 Jun 2017, at 16:08, Anuj Wadehra 
> wrote:

Hi Chris,

Using pr with incremental repairs does not make sense. Primary range repair is 
an optimization over full repair. If you run full repair on a n node cluster 
with RF=3, you would be repairing each data thrice.
E.g. in a 5 node cluster with RF=3, a range may exist on node A,B and C . When 
full repair is run on node A, the entire data in that range gets synced with 
replicas on node B and C. Now, when you run full repair on nodes B and C, you 
are wasting resources on repairing data which is already repaired.

Primary range repair ensures that when you run repair on a node, it ONLY 
repairs the data which is owned by the node. Thus, no node repairs data which 
is not owned by it and must be repaired by other node. Redundant work is 
eliminated.

Even with pr, each time you run pr on all nodes you repair 100% of the data. Why repair 
the complete data set in each cycle, even data which has not changed since the last 
repair cycle?

This is where incremental repair comes in as an improvement. Once repaired, data is 
marked as repaired so that the next repair cycle can focus on just the delta. Now, let's 
go back to the example of the 5 node cluster with RF=3. This time we run incremental 
repair on all nodes. When you repair the entire data on node A, all 3 replicas are 
marked as repaired.
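
For reference, the two invocations being compared in this thread look roughly like
this (exactly as Chris quoted them, on 2.2 where incremental repair is the default):

# Primary-range repair, restricted to the local DC:
nodetool repair --in-local-dc --partitioner-range

# Repair of all ranges held by the node (incremental is the 2.2 default), local DC only:
nodetool repair --in-local-dc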

RE: Secondary Index

2017-06-19 Thread ZAIDI, ASAD A
If you're only creating the index so that your query works, think again!  You'll be 
storing the secondary index on each node, and queries involving the index could create 
issues (slowness!!) down the road when the index on multiple nodes is involved and 
not maintained. Tables with a lot of inserts/deletes can easily ruin index performance.

You can get around the potential problem by leveraging a composite key, if that 
is possible for you (a sketch follows below).
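
A hedged CQL sketch of that idea (table and column names are hypothetical): instead of
adding a secondary index on a regular column, denormalize into a second table whose
partition key is the column you want to query by.

-- Instead of: CREATE INDEX ON users (city);
CREATE TABLE users_by_city (
    city text,
    user_id uuid,
    name text,
    PRIMARY KEY (city, user_id)
);

-- The lookup that would have needed the index becomes an ordinary partition read:
SELECT user_id, name FROM users_by_city WHERE city = 'Lisbon';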


From: techpyaasa . [mailto:techpya...@gmail.com]
Sent: Monday, June 19, 2017 1:01 PM
To: user@cassandra.apache.org
Subject: Secondary Index

Hi,

I want to create Index on already existing table which has more than 3 GB/node.
We are using c*-2.1.17 with 2 DCs , each DC with 3 groups and each group has 7 
nodes.(Total 42 nodes in cluster)

So is it ok to create Index on this table now or will it have any problem?
If its ok , how much time it would take for this process?


Thanks in advance,
TechPyaasa


Secondary Index

2017-06-19 Thread techpyaasa .
Hi,

I want to create an index on an already existing table which has more than 3
GB/node.
We are using C* 2.1.17 with 2 DCs, each DC with 3 groups, and each group
has 7 nodes (42 nodes in the cluster in total).

So is it OK to create the index on this table now, or will it cause any problems?
If it's OK, how much time would this process take?


Thanks in advance,
TechPyaasa


Re: SASI index on datetime column does not filter on minutes

2017-06-19 Thread DuyHai Doan
The +0000 in the date format is necessary to specify the timezone

On Mon, Jun 19, 2017 at 5:38 PM, Hannu Kröger  wrote:

> Hello,
>
> I tried the same thing with 3.10 which I happened to have at hand and that
> seems to work.
>
> cqlsh:test> select lastname,firstname,dateofbirth from individuals where
> dateofbirth < '2001-01-01T10:00:00' and dateofbirth > '2000-11-18 17:59:18';
>
>  lastname | firstname | dateofbirth
> --+---+-
>   Jimmie2 |Lundin | 2000-12-19 17:55:17.00+
>   Jimmie3 |Lundin | 2000-11-18 17:55:18.00+
>Jimmie |Lundin | 2000-11-18 17:55:17.00+
>
> (3 rows)
> cqlsh:test> select lastname,firstname,dateofbirth from individuals where
> dateofbirth < '2001-01-01T10:00:00+' and dateofbirth >
> '2000-11-18T17:59:18+';
>
>  lastname | firstname | dateofbirth
> --+---+-
>   Jimmie2 |Lundin | 2000-12-19 17:55:17.00+
>
> (1 rows)
> cqlsh:test>
>
> Maybe you have timezone issue?
>
> Best Regards,
> Hannu
>
> On 19 June 2017 at 17:09:10, Tobias Eriksson (tobias.eriks...@qvantel.com)
> wrote:
>
> Hi
>
> I have a table like this (Cassandra 3.5)
>
> Table
>
> id uuid,
>
> lastname text,
>
> firstname text,
>
> address_id uuid,
>
> dateofbirth timestamp,
>
>
>
> PRIMARY KEY (id, lastname, firstname)
>
>
>
> And a SASI index like this
>
> create custom index indv_birth ON playground.individual(dateofbirth)
> USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
> 'SPARSE'};
>
>
>
> The data
>
>
>
> lastname | firstname | dateofbirth
>
> --+---+-
>
>Lundin |Jimmie | 2000-11-18 17:55:17.00+
>
>   Jansson |   Karolin | 2000-12-19 17:55:17.00+
>
> Öberg |Louisa | 2000-11-18 17:55:18.00+
>
>
>
>
>
> Now if I do this
>
> select lastname,firstname,dateofbirth from playground.individual where
> dateofbirth < '2001-01-01T10:00:00' and dateofbirth > '2000-11-18
> 17:59:18';
>
>
>
> I should only get ONE row, right
>
> lastname | firstname | dateofbirth
>
> --+---+-
>
> Jansson |   Karolin | 2000-12-19 17:55:17.00+
>
>
>
>
>
> But instead I get all 3 rows !!!
>
>
>
> Why is that ?
>
>
>
> -Tobias
>
>
>
>
>
>


Re: SASI index on datetime column does not filter on minutes

2017-06-19 Thread Hannu Kröger
Hello,

I tried the same thing with 3.10 which I happened to have at hand and that
seems to work.

cqlsh:test> select lastname,firstname,dateofbirth from individuals where
dateofbirth < '2001-01-01T10:00:00' and dateofbirth > '2000-11-18 17:59:18';

 lastname | firstname | dateofbirth
--+---+-
  Jimmie2 |Lundin | 2000-12-19 17:55:17.00+
  Jimmie3 |Lundin | 2000-11-18 17:55:18.00+
   Jimmie |Lundin | 2000-11-18 17:55:17.00+

(3 rows)
cqlsh:test> select lastname,firstname,dateofbirth from individuals where
dateofbirth < '2001-01-01T10:00:00+' and dateofbirth >
'2000-11-18T17:59:18+';

 lastname | firstname | dateofbirth
--+---+-
  Jimmie2 |Lundin | 2000-12-19 17:55:17.00+

(1 rows)
cqlsh:test>

Maybe you have timezone issue?

Best Regards,
Hannu

On 19 June 2017 at 17:09:10, Tobias Eriksson (tobias.eriks...@qvantel.com)
wrote:

Hi

I have a table like this (Cassandra 3.5)

Table

id uuid,

lastname text,

firstname text,

address_id uuid,

dateofbirth timestamp,



PRIMARY KEY (id, lastname, firstname)



And a SASI index like this

create custom index indv_birth ON playground.individual(dateofbirth) USING
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
'SPARSE'};



The data



lastname | firstname | dateofbirth

--+---+-

   Lundin |Jimmie | 2000-11-18 17:55:17.00+

  Jansson |   Karolin | 2000-12-19 17:55:17.00+

Öberg |Louisa | 2000-11-18 17:55:18.00+





Now if I do this

select lastname,firstname,dateofbirth from playground.individual where
dateofbirth < '2001-01-01T10:00:00' and dateofbirth > '2000-11-18 17:59:18';



I should only get ONE row, right

lastname | firstname | dateofbirth

--+---+-

Jansson |   Karolin | 2000-12-19 17:55:17.00+





But instead I get all 3 rows !!!



Why is that ?



-Tobias


SASI index on datetime column does not filter on minutes

2017-06-19 Thread Tobias Eriksson
Hi
I have a table like this (Cassandra 3.5)
Table
id uuid,
lastname text,
firstname text,
address_id uuid,
dateofbirth timestamp,

PRIMARY KEY (id, lastname, firstname)

And a SASI index like this
create custom index indv_birth ON playground.individual(dateofbirth) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'SPARSE'};

The data

lastname | firstname | dateofbirth
--+---+-
   Lundin |Jimmie | 2000-11-18 17:55:17.00+
  Jansson |   Karolin | 2000-12-19 17:55:17.00+
Öberg |Louisa | 2000-11-18 17:55:18.00+


Now if I do this
select lastname,firstname,dateofbirth from playground.individual where 
dateofbirth < '2001-01-01T10:00:00' and dateofbirth > '2000-11-18 17:59:18';

I should only get ONE row, right
lastname | firstname | dateofbirth
--+---+-
Jansson |   Karolin | 2000-12-19 17:55:17.00+


But instead I get all 3 rows !!!

Why is that ?

-Tobias




CQL: fails to COPY FROM with null values

2017-06-19 Thread Tobias Eriksson
Hi
 I am trying to copy a file of CSV data into a table
But I get an error since sometimes one of the columns (which is a UUID) is empty
Is this a bug or what am I missing?

Here is what it looks like:
Table
id uuid,
lastname text,
firstname text,
address_id uuid,
dateofbirth timestamp,

PRIMARY KEY (id, lastname, firstname)

COPY playground.individual(id,lastname,firstname,address_id) FROM 
‘testfile.csv’;

Where the testfile.csv is like this

This works !!!
ce98d62a-3666-4d3a-ae2f-df315ad448aa,Jonsson,Malcom 
,c9dc8b60-d27f-430c-b960-782d854df3a5,2001-01-19 17:55:17+

This does NOT work !!!
ce98d62a-3666-4d3a-ae2f-df315ad448aa,Jonsson,Malcom , ,2001-01-19 17:55:17+

Cause then I get the following error
Failed to import 1 rows: ParseError - badly formed hexadecimal UUID string,  
given up without retries

So, how do I import my CSV file and set the columns which do not have a UUID 
to null?

-Tobias


Re: Question: Behavior of inserting a list multiple times with same timestamp

2017-06-19 Thread Thakrar, Jayesh
Subroto,

Cassandra docs say otherwise.

Writing list data is accomplished with a JSON-style syntax. To write a record 
using INSERT, specify the entire list as a JSON array. Note: An INSERT will 
always replace the entire list.

Maybe you can elaborate/shed some more light?

Thanks,
Jayesh


Lists

A list is a typed collection of non-unique values where elements are ordered by 
their position in the list. To create a column of type list, use the list 
keyword suffixed with the value type enclosed in angle brackets. For example:

CREATE TABLE plays (
id text PRIMARY KEY,
game text,
players int,
scores list<int>
)
Do note that as explained below, lists have some limitations and performance 
considerations to take into account, and it is advised to prefer sets over 
lists when this is possible.

Writing list data is accomplished with a JSON-style syntax. To write a record 
using INSERT, specify the entire list as a JSON array. Note: An INSERT will 
always replace the entire list.

INSERT INTO plays (id, game, players, scores)
   VALUES ('123-afde', 'quake', 3, [17, 4, 2]);
Adding (appending or prepending) values to a list can be accomplished by adding 
a new JSON-style array to an existing list column.

UPDATE plays SET players = 5, scores = scores + [ 14, 21 ] WHERE id = 
'123-afde';
UPDATE plays SET players = 5, scores = [ 12 ] + scores WHERE id = '123-afde';
It should be noted that append and prepend are not idempotent operations. This 
means that if an append or a prepend operation times out, it is not always safe 
to retry the operation (as this could result in the record being appended 
or prepended twice).

Lists also provide the following operations: setting an element by its position 
in the list, removing an element by its position in the list, and removing all 
occurrences of a given value in the list. However, and contrarily to all the 
other collection operations, these three operations induce an internal read 
before the update, and will thus typically have slower performance 
characteristics. Those operations have the following syntax:

UPDATE plays SET scores[1] = 7 WHERE id = '123-afde';                // sets the 2nd element of scores to 7 (raises an error if scores has fewer than 2 elements)
DELETE scores[1] FROM plays WHERE id = '123-afde';                   // deletes the 2nd element of scores (raises an error if scores has fewer than 2 elements)
UPDATE plays SET scores = scores - [ 12, 21 ] WHERE id = '123-afde'; // removes all occurrences of 12 and 21 from scores
As with maps, TTLs if used only apply to the newly inserted/updated values.



On 6/19/17, 1:12 AM, "Subroto Barua"  wrote:

This is an expected behavior.

We learned this issue/feature at the current site (we use Dse 5.08)

Subroto 

> On Jun 18, 2017, at 10:29 PM, Zhongxiang Zheng  
wrote:
> 
> Hi all,
> 
> I have a question about the behavior when inserting a list with a specified timestamp.
> 
> It is documented that "An INSERT will always replace the entire list."
> https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile#lists
> 
> However, when a list is inserted multiple times using the same timestamp,
> it is not replaced but appended to, as follows.
> 
> cqlsh> CREATE TABLE test.test (k int PRIMARY KEY , v list<int>);
> cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) USING TIMESTAMP 1000 ;
> cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) USING TIMESTAMP 1000 ;
> cqlsh> SELECT * FROM test.test ;
> 
> k | v
> ---+
> 1 | [1, 1]
> 
> I confirmed this behavior is reproduced in 3.0.13 and 3.10.
> I'd like to ask whether this behavior is an expected behavior or a bug?
> 
> In our use case, CQL statements with the same values and timestamp will be issued multiple times
> to retry inserting under the assumption that the insert is idempotent.
> So, I expect that the entire list will be replaced even if a list is inserted multiple times with the same timestamp.
> 
> Thanks,
> 
> Zhongxiang
> 
> 
> 






Re: Partition range incremental repairs

2017-06-19 Thread Chris Stokesmore
Anyone have any more thoughts on this at all? Struggling to understand it.


> On 9 Jun 2017, at 11:32, Chris Stokesmore  
> wrote:
> 
> Hi Anuj,
> 
> Thanks for the reply.
> 
> 1). We are using Cassandra 2.2.8, and our repair commands we are comparing 
> are 
> "nodetool repair --in-local-dc --partitioner-range” and 
> "nodetool repair --in-local-dc”
> Since 2.2 I believe inc repairs are the default - that seems to be confirmed 
> in the logs that list the repair details when a repair starts.
> 
> 2) From looking at a few runs, on average:
> with -pr repairs, each node is approx 6.5 - 8 hours, so a total over the 7 
> nodes of 53 hours
> With just inc repairs, each node ~26 - 29 hours, so a total of 193
> 
> 3) we currently have two DCs in total, the ‘production’ ring with 7 nodes and 
> RF=3, and a testing ring with one single node and RF=1 for our single 
> keyspace we currently use.
> 
> 4) Yeah that number came from the Cassandra repair logs from an inc repair, I 
> can share the number reports when using a pr repair later this evening when 
> the currently running repair has completed.
> 
> 
> Many thanks for the reply again,
> 
> Chris
> 
> 
>> On 6 Jun 2017, at 17:50, Anuj Wadehra > > wrote:
>> 
>> Hi Chris,
>> 
>> Can your share following info:
>> 
>> 1. Exact repair commands you use for inc repair and pr repair
>> 
>> 2. Repair time should be measured at cluster level for inc repair. So, whats 
>> the total time it takes to run repair on all nodes for incremental vs pr 
>> repairs?
>> 
>> 3. You are repairing one dc DC3. How many DCs are there in total and whats 
>> the RF for keyspaces? Running pr on a specific dc would not repair entire 
>> data.
>> 
>> 4. 885 ranges? From where did you get this number? Logs? Can you share the 
>> number ranges printed in logs for both inc and pr case?
>> 
>> 
>> Thanks
>> Anuj
>> 
>> 
>> Sent from Yahoo Mail on Android 
>> 
>> On Tue, Jun 6, 2017 at 9:33 PM, Chris Stokesmore
>> > wrote:
>> Thank you for the excellent and clear description of the different versions 
>> of repair Anuj, that has cleared up what I expect to be happening.
>> 
>> The problem now is in our cluster, we are running repairs with options 
>> (parallelism: parallel, primary range: false, incremental: true, job 
>> threads: 1, ColumnFamilies: [], dataCenters: [DC3], hosts: [], # of ranges: 
>> 885) and when we do our repairs are taking over a day to complete when 
>> previously when running with the partition range option they were taking 
>> more like 8-9 hours.
>> 
>> As I understand it, using incremental should have sped this process up as 
>> all three sets of data on each repair job should be marked as repaired 
>> however this does not seem to be the case. Any ideas?
>> 
>> Chris
>> 
>>> On 6 Jun 2017, at 16:08, Anuj Wadehra >> > wrote:
>>> 
>>> Hi Chris,
>>> 
>>> Using pr with incremental repairs does not make sense. Primary range repair 
>>> is an optimization over full repair. If you run full repair on a n node 
>>> cluster with RF=3, you would be repairing each data thrice. 
>>> E.g. in a 5 node cluster with RF=3, a range may exist on node A,B and C . 
>>> When full repair is run on node A, the entire data in that range gets 
>>> synced with replicas on node B and C. Now, when you run full repair on 
>>> nodes B and C, you are wasting resources on repairing data which is already 
>>> repaired. 
>>> 
>>> Primary range repair ensures that when you run repair on a node, it ONLY 
>>> repairs the data which is owned by the node. Thus, no node repairs data 
>>> which is not owned by it and must be repaired by other node. Redundant work 
>>> is eliminated. 
>>> 
>>> Even in pr, each time you run pr on all nodes, you repair 100% of data. Why 
>>> to repair complete data in each cycle?? ..even data which has not even 
>>> changed since the last repair cycle?
>>> 
>>> This is where Incremental repair comes as an improvement. Once repaired, a 
>>> data would be marked repaired so that the next repair cycle could just 
>>> focus on repairing the delta. Now, lets go back to the example of 5 node 
>>> cluster with RF =3.This time we run incremental repair on all nodes. When 
>>> you repair entire data on node A, all 3 replicas are marked as repaired. 
>>> Even if you run inc repair on all ranges on the second node, you would not 
>>> re-repair the already repaired data. Thus, there is no advantage of 
>>> repairing only the data owned by the node (primary range of the node). You 
>>> can run inc repair on all the data present on a node and Cassandra would 
>>> make sure that when you repair data on other nodes, you only repair 
>>> unrepaired data.
>>> 
>>> Thanks
>>> Anuj
>>> 
>>> 
>>> 
>>> Sent from Yahoo Mail on 

Re: Don't print Ping caused error logs

2017-06-19 Thread Akhil Mehra
Just in case you are not aware, using a load balancer is an anti-pattern. Please 
refer to:
http://docs.datastax.com/en/landing_page/doc/landing_page/planning/planningAntiPatterns.html#planningAntiPatterns__AntiPatLoadBal

You can turn off logging for a particular class using nodetool 
setlogginglevel:
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSetLogLev.html

In your case you can try setting the log level for 
org.apache.cassandra.transport.Message to warn using the following command

nodetool setlogginglevel org.apache.cassandra.transport.Message WARN

Obviously this will suppress all info level logging in the message class. 

I hope that helps.

Cheers,
Akhil




> On 19/06/2017, at 9:13 PM, wxn...@zjqunshuo.com wrote:
> 
> Hi,
> Our cluster nodes are behind a SLB(Service Load Balancer) with a VIP and the 
> Cassandra client access the cluster by the VIP. 
> System.log print the below IOException every several seconds. I guess it's 
> the SLB service which Ping the port 9042 of the Cassandra node periodically 
> and caused the exceptions print.
> Is there any method to prevent the ping-caused exceptions from being printed?
> 
> INFO  [SharedPool-Worker-1] 2017-06-19 16:54:15,997 Message.java:605 - 
> Unexpected exception during request; channel = [id: 0x332c09b7, 
> /10.253.106.210:9042]
> java.io.IOException: Error while read(...): Connection reset by peer
>   at io.netty.channel.epoll.Native.readAddress(Native Method) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85]
> 
> Cheer,
> -Simon



Re: Cleaning up related issue

2017-06-19 Thread Akhil Mehra
The nodetool cleanup docs explain this increase in disk space usage.

"Running the nodetool cleanupcommand causes a temporary increase in disk space 
usage proportional to the size of your largest SSTable. Disk I/O occurs when 
running this command."

http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCleanup.html 
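
If the temporary space is a problem, one way to bound it (a hedged sketch; the
keyspace/table names are placeholders, and the -j flag exists only in newer
nodetool releases) is to run cleanup on one table at a time, so the rewrite is
proportional to that table's largest SSTable rather than to everything on the node:

# Clean up a single table:
nodetool cleanup my_keyspace my_table

# If your nodetool version supports it, also limit parallel cleanup jobs:
nodetool cleanup -j 1 my_keyspace my_table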


Cheers,
Akhil


> On 19/06/2017, at 7:47 PM, wxn...@zjqunshuo.com wrote:
> 
> Akhil, I agree with you that the node still has unwanted data, but why it has 
> more data than before cleaning up?
> 
> More background:
> Before cleaning up, the node has 790GB data. After cleaning up, I assume it 
> should has less data. But in fact it has 1000GB data which is larger than I 
> expected.
> Cassandra daemon crashed and left the files with the name with "tmp-" prefix 
> in the data directory which indicate the cleaning up task was not complete.
> 
> Cheers,
> -Simon
>  
> From: Akhil Mehra 
> Date: 2017-06-19 15:17
> To: wxn...@zjqunshuo.com 
> CC: user 
> Subject: Re: Cleaning up related issue
> When you add a new node into the cluster data is streamed for all the old 
> nodes into the new node added. The new node is now responsible for data 
> previously stored in the old node.
>  
> The clean up process removes unwanted data after adding a new node to the 
> cluster.
>  
> In your case clean up failed on this node.
>  
> I think this node still has unwanted data that has not been cleaned up.
>  
> Cheers,
> Akhil
>  
>  
>  
>  
> > On 19/06/2017, at 7:00 PM, wxn...@zjqunshuo.com wrote:
> >
> > Thanks for the quick response. It's the existing node where the cleanup 
> > failed. It has a larger volume than other nodes.
> >  
> > From: Akhil Mehra
> > Date: 2017-06-19 14:56
> > To: wxn002
> > CC: user
> > Subject: Re: Cleaning up related issue
> > Is the node with the large volume a new node or an existing node. If it is 
> > an existing node is this the one where the node tool cleanup failed.
> >
> > Cheers,
> > Akhil
> >
> >> On 19/06/2017, at 6:40 PM, wxn...@zjqunshuo.com wrote:
> >>
> >> Hi,
> >> After adding a new node, I started cleaning up task to remove the old data 
> >> on the other 4 nodes. All went well except one node. The cleanup takes 
> >> hours and the Cassandra daemon crashed in the third node. I checked the 
> >> node and found the crash was because of OOM. The Cassandra data volume has 
> >> zero space left. I removed the temporary files which I believe created 
> >> during the cleaning up process and started Cassanndra.
> >>
> >> The node joined the cluster successfully, but one thing I found. From the 
> >> "nodetool status" output, the node takes much data than other nodes. 
> >> Nomally the load should be 700GB. But actually it's 1000GB. Why it is 
> >> larger? Please see the output below.
> >>
> >> UN  10.253.44.149   705.98 GB  256  40.4% 
> >> 9180b7c9-fa0b-4bbe-bf62-64a599c01e58  rack1
> >> UN  10.253.106.218  691.07 GB  256  39.9% 
> >> e24d13e2-96cb-4e8c-9d94-22498ad67c85  rack1
> >> UN  10.253.42.113   623.73 GB  256  39.3% 
> >> 385ad28c-0f3f-415f-9e0a-7fe8bef97e17  rack1
> >> UN  10.253.41.165   779.38 GB  256  40.1% 
> >> 46f37f06-9c45-492d-bd25-6fef7f926e38  rack1
> >> UN  10.253.106.210  1022.7 GB  256  40.3% 
> >> a31b6088-0cb2-40b4-ac22-aec718dbd035  rack1
> >>
> >> Cheers,
> >> -Simon



Don't print Ping caused error logs

2017-06-19 Thread wxn...@zjqunshuo.com
Hi,
Our cluster nodes are behind an SLB (Service Load Balancer) with a VIP, and the 
Cassandra clients access the cluster through the VIP. 
system.log prints the below IOException every few seconds. I guess it's the 
SLB service which pings port 9042 of the Cassandra nodes periodically and 
causes the exceptions to be printed.
Is there any method to prevent the ping-caused exceptions from being printed?

INFO  [SharedPool-Worker-1] 2017-06-19 16:54:15,997 Message.java:605 - 
Unexpected exception during request; channel = [id: 0x332c09b7, 
/10.253.106.210:9042]
java.io.IOException: Error while read(...): Connection reset by peer
at io.netty.channel.epoll.Native.readAddress(Native Method) 
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675)
 ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714)
 ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) 
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) 
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
 ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
 ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85]

Cheer,
-Simon


Re: Re: Cleaning up related issue

2017-06-19 Thread wxn...@zjqunshuo.com
Akhil, I agree with you that the node still has unwanted data, but why does it have 
more data than before the cleanup?

More background:
Before the cleanup, the node had 790GB of data. After the cleanup, I assumed it 
should have less data, but in fact it has 1000GB, which is more than I 
expected.
The Cassandra daemon crashed and left files with a "tmp-" prefix in 
the data directory, which indicates the cleanup task was not complete.

Cheers,
-Simon
 
From: Akhil Mehra
Date: 2017-06-19 15:17
To: wxn...@zjqunshuo.com
CC: user
Subject: Re: Cleaning up related issue
When you add a new node to the cluster, data is streamed from all the old nodes 
to the newly added node. The new node is now responsible for data previously 
stored on the old nodes.
 
The cleanup process removes unwanted data after adding a new node to the 
cluster.
 
In your case cleanup failed on this node. 
 
I think this node still has unwanted data that has not been cleaned up.
 
Cheers,
Akhil 
 
 
 
 
> On 19/06/2017, at 7:00 PM, wxn...@zjqunshuo.com wrote:
> 
> Thanks for the quick response. It's the existing node where the cleanup 
> failed. It has a larger volume than other nodes.
>  
> From: Akhil Mehra
> Date: 2017-06-19 14:56
> To: wxn002
> CC: user
> Subject: Re: Cleaning up related issue
> Is the node with the large volume a new node or an existing node. If it is an 
> existing node is this the one where the node tool cleanup failed.
> 
> Cheers,
> Akhil
> 
>> On 19/06/2017, at 6:40 PM, wxn...@zjqunshuo.com wrote:
>> 
>> Hi,
>> After adding a new node, I started cleaning up task to remove the old data 
>> on the other 4 nodes. All went well except one node. The cleanup takes hours 
>> and the Cassandra daemon crashed in the third node. I checked the node and 
>> found the crash was because of OOM. The Cassandra data volume has zero space 
>> left. I removed the temporary files which I believe created during the 
>> cleaning up process and started Cassanndra. 
>> 
>> The node joined the cluster successfully, but one thing I found. From the 
>> "nodetool status" output, the node takes much data than other nodes. Nomally 
>> the load should be 700GB. But actually it's 1000GB. Why it is larger? Please 
>> see the output below. 
>> 
>> UN  10.253.44.149   705.98 GB  256  40.4% 
>> 9180b7c9-fa0b-4bbe-bf62-64a599c01e58  rack1
>> UN  10.253.106.218  691.07 GB  256  39.9% 
>> e24d13e2-96cb-4e8c-9d94-22498ad67c85  rack1
>> UN  10.253.42.113   623.73 GB  256  39.3% 
>> 385ad28c-0f3f-415f-9e0a-7fe8bef97e17  rack1
>> UN  10.253.41.165   779.38 GB  256  40.1% 
>> 46f37f06-9c45-492d-bd25-6fef7f926e38  rack1
>> UN  10.253.106.210  1022.7 GB  256  40.3% 
>> a31b6088-0cb2-40b4-ac22-aec718dbd035  rack1
>> 
>> Cheers,
>> -Simon
 
 


Re: Cleaning up related issue

2017-06-19 Thread Akhil Mehra
When you add a new node to the cluster, data is streamed from all the old nodes 
to the newly added node. The new node is now responsible for data previously 
stored on the old nodes.

The cleanup process removes unwanted data after adding a new node to the 
cluster.

In your case cleanup failed on this node.

I think this node still has unwanted data that has not been cleaned up.

Cheers,
Akhil 




> On 19/06/2017, at 7:00 PM, wxn...@zjqunshuo.com wrote:
> 
> Thanks for the quick response. It's the existing node where the cleanup 
> failed. It has a larger volume than other nodes.
>  
> From: Akhil Mehra
> Date: 2017-06-19 14:56
> To: wxn002
> CC: user
> Subject: Re: Cleaning up related issue
> Is the node with the large volume a new node or an existing node. If it is an 
> existing node is this the one where the node tool cleanup failed.
> 
> Cheers,
> Akhil
> 
>> On 19/06/2017, at 6:40 PM, wxn...@zjqunshuo.com wrote:
>> 
>> Hi,
>> After adding a new node, I started cleaning up task to remove the old data 
>> on the other 4 nodes. All went well except one node. The cleanup takes hours 
>> and the Cassandra daemon crashed in the third node. I checked the node and 
>> found the crash was because of OOM. The Cassandra data volume has zero space 
>> left. I removed the temporary files which I believe created during the 
>> cleaning up process and started Cassanndra. 
>> 
>> The node joined the cluster successfully, but one thing I found. From the 
>> "nodetool status" output, the node takes much data than other nodes. Nomally 
>> the load should be 700GB. But actually it's 1000GB. Why it is larger? Please 
>> see the output below. 
>> 
>> UN  10.253.44.149   705.98 GB  256  40.4% 
>> 9180b7c9-fa0b-4bbe-bf62-64a599c01e58  rack1
>> UN  10.253.106.218  691.07 GB  256  39.9% 
>> e24d13e2-96cb-4e8c-9d94-22498ad67c85  rack1
>> UN  10.253.42.113   623.73 GB  256  39.3% 
>> 385ad28c-0f3f-415f-9e0a-7fe8bef97e17  rack1
>> UN  10.253.41.165   779.38 GB  256  40.1% 
>> 46f37f06-9c45-492d-bd25-6fef7f926e38  rack1
>> UN  10.253.106.210  1022.7 GB  256  40.3% 
>> a31b6088-0cb2-40b4-ac22-aec718dbd035  rack1
>> 
>> Cheers,
>> -Simon





Re: Re: Cleaning up related issue

2017-06-19 Thread wxn...@zjqunshuo.com
Thanks for the quick response. It's the existing node where the cleanup failed. 
It has a larger volume than the other nodes.
 
From: Akhil Mehra
Date: 2017-06-19 14:56
To: wxn002
CC: user
Subject: Re: Cleaning up related issue
Is the node with the large volume a new node or an existing node? If it is an 
existing node, is this the one where the nodetool cleanup failed?

Cheers,
Akhil

On 19/06/2017, at 6:40 PM, wxn...@zjqunshuo.com wrote:

Hi,
After adding a new node, I started a cleanup task to remove the old data on the 
other 4 nodes. All went well except for one node. The cleanup took hours and the 
Cassandra daemon crashed on the third node. I checked the node and found the crash 
was because of OOM. The Cassandra data volume had zero space left. I removed the 
temporary files which I believe were created during the cleanup process and started 
Cassandra. 

The node joined the cluster successfully, but I found one thing. From the 
"nodetool status" output, the node holds much more data than the other nodes. 
Normally the load should be 700GB, but actually it's 1000GB. Why is it larger? 
Please see the output below. 

UN  10.253.44.149   705.98 GB  256  40.4% 
9180b7c9-fa0b-4bbe-bf62-64a599c01e58  rack1
UN  10.253.106.218  691.07 GB  256  39.9% 
e24d13e2-96cb-4e8c-9d94-22498ad67c85  rack1
UN  10.253.42.113   623.73 GB  256  39.3% 
385ad28c-0f3f-415f-9e0a-7fe8bef97e17  rack1
UN  10.253.41.165   779.38 GB  256  40.1% 
46f37f06-9c45-492d-bd25-6fef7f926e38  rack1
UN  10.253.106.210  1022.7 GB  256  40.3% 
a31b6088-0cb2-40b4-ac22-aec718dbd035  rack1

Cheers,
-Simon



Re: Cleaning up related issue

2017-06-19 Thread Akhil Mehra
Is the node with the large volume a new node or an existing node? If it is an 
existing node, is this the one where the nodetool cleanup failed?

Cheers,
Akhil

> On 19/06/2017, at 6:40 PM, wxn...@zjqunshuo.com wrote:
> 
> Hi,
> After adding a new node, I started cleaning up task to remove the old data on 
> the other 4 nodes. All went well except one node. The cleanup takes hours and 
> the Cassandra daemon crashed in the third node. I checked the node and found 
> the crash was because of OOM. The Cassandra data volume has zero space left. 
> I removed the temporary files which I believe created during the cleaning up 
> process and started Cassanndra. 
> 
> The node joined the cluster successfully, but one thing I found. From the 
> "nodetool status" output, the node takes much data than other nodes. Nomally 
> the load should be 700GB. But actually it's 1000GB. Why it is larger? Please 
> see the output below. 
> 
> UN  10.253.44.149   705.98 GB  256  40.4% 
> 9180b7c9-fa0b-4bbe-bf62-64a599c01e58  rack1
> UN  10.253.106.218  691.07 GB  256  39.9% 
> e24d13e2-96cb-4e8c-9d94-22498ad67c85  rack1
> UN  10.253.42.113   623.73 GB  256  39.3% 
> 385ad28c-0f3f-415f-9e0a-7fe8bef97e17  rack1
> UN  10.253.41.165   779.38 GB  256  40.1% 
> 46f37f06-9c45-492d-bd25-6fef7f926e38  rack1
> UN  10.253.106.210  1022.7 GB  256  40.3% 
> a31b6088-0cb2-40b4-ac22-aec718dbd035  rack1
> 
> Cheers,
> -Simon



Cleaning up related issue

2017-06-19 Thread wxn...@zjqunshuo.com
Hi,
After adding a new node, I started a cleanup task to remove the old data on the 
other 4 nodes. All went well except for one node. The cleanup took hours and the 
Cassandra daemon crashed on the third node. I checked the node and found the crash 
was because of OOM. The Cassandra data volume had zero space left. I removed the 
temporary files which I believe were created during the cleanup process and started 
Cassandra. 

The node joined the cluster successfully, but I found one thing. From the 
"nodetool status" output, the node holds much more data than the other nodes. 
Normally the load should be 700GB, but actually it's 1000GB. Why is it larger? 
Please see the output below. 

UN  10.253.44.149   705.98 GB  256  40.4% 
9180b7c9-fa0b-4bbe-bf62-64a599c01e58  rack1
UN  10.253.106.218  691.07 GB  256  39.9% 
e24d13e2-96cb-4e8c-9d94-22498ad67c85  rack1
UN  10.253.42.113   623.73 GB  256  39.3% 
385ad28c-0f3f-415f-9e0a-7fe8bef97e17  rack1
UN  10.253.41.165   779.38 GB  256  40.1% 
46f37f06-9c45-492d-bd25-6fef7f926e38  rack1
UN  10.253.106.210  1022.7 GB  256  40.3% 
a31b6088-0cb2-40b4-ac22-aec718dbd035  rack1
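
For what it's worth, a way to break that load figure down per table, and to see 
whether it is live data or leftover files still being counted, is nodetool cfstats 
(renamed tablestats in newer releases). A rough sketch; my_keyspace is a placeholder 
and the exact label text can differ between versions:

# compare "Space used (live)" with "Space used (total)" for each table
nodetool cfstats my_keyspace | grep -E '(Table|Column Family):|Space used'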

Cheers,
-Simon


Re: Question: Behavior of inserting a list multiple times with same timestamp

2017-06-19 Thread Subroto Barua
This is expected behavior.

We learned about this issue/feature at our current site (we use DSE 5.08).
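
If the goal is to make a full-list INSERT idempotent even when it is retried with the 
same timestamp, one option is to declare the column as a frozen list, so the whole 
list is stored as a single cell and a retried INSERT simply rewrites that cell instead 
of adding new list cells. The trade-off is that append/prepend updates are no longer 
possible, only whole-value replacement. A quick sketch (the table name is made up):

cqlsh> CREATE TABLE test.test_frozen (k int PRIMARY KEY, v frozen<list<int>>);
cqlsh> INSERT INTO test.test_frozen (k, v) VALUES (1, [1]) USING TIMESTAMP 1000;
cqlsh> INSERT INTO test.test_frozen (k, v) VALUES (1, [1]) USING TIMESTAMP 1000;
cqlsh> SELECT * FROM test.test_frozen;

 k | v
---+-----
 1 | [1]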

Subroto 

> On Jun 18, 2017, at 10:29 PM, Zhongxiang Zheng  wrote:
> 
> Hi all,
> 
> I have a question about the behavior when inserting a list with a specified 
> timestamp.
> 
> It is documented that "An INSERT will always replace the entire list."
> https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile#lists
> 
> However, when a list is inserted multiple times using the same timestamp,
> it will not be replaced but appended, as follows.
> 
> cqlsh> CREATE TABLE test.test (k int PRIMARY KEY , v list<int>);
> 
> cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) USING TIMESTAMP 1000 ;
> 
> cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) USING TIMESTAMP 1000 ;
> cqlsh> SELECT * FROM test.test ;
> 
> k | v
> ---+
> 1 | [1, 1]
> 
> I confirmed this behavior is reproduced in 3.0.13 and 3.10.
> I'd like to ask whether this is expected behavior or a bug?
> 
> In our use case, CQL statements with the same values and timestamp will be issued 
> multiple times to retry inserting, under the assumption that the insert is 
> idempotent. So I expect the entire list to be replaced even if a list is inserted 
> multiple times with the same timestamp.
> 
> Thanks,
> 
> Zhongxiang
> 
> 

