Re: Rebuilding a node without clients hitting it

2019-08-05 Thread John Sanda
Assuming the rebuild is happening on a node in another DC, there should not
be an issue if you are using LOCAL_ONE. If the node is in the local DC (i.e.,
the same DC as the client), I am inclined to think repair would be more
appropriate than rebuild, but I am not 100% certain.

On Mon, Aug 5, 2019 at 11:23 PM Jeff Jirsa  wrote:

> No, not strictly sufficient - makes it much less likely though
>
> A client may connect to another node and still send the request to that
> host if the snitch picks it. You can make THAT less likely with some snitch
> trickery (setting the badness for the rebuilding host) via JMX.
>
> On Aug 5, 2019, at 8:17 PM, Cyril Scetbon  wrote:
>
> Hey guys,
>
> Can you confirm that disabling the native transport (nodetool
> disablebinary) is enough with Cassandra 3.11+ to avoid clients hitting
> inconsistent data on that node when they use LOCAL_ONE consistency?
> (Particularly when the node is rebuilding…)
> I'd like to avoid any fancy client configuration like blacklisting nodes.
>
> Thanks
> —
> Cyril Scetbon
>
>

-- 

- John


Re: Rebuilding a node without clients hitting it

2019-08-05 Thread Jeff Jirsa
No, not strictly sufficient - makes it much less likely though

A client may connect to another node and still send the request to that host if
the snitch picks it. You can make THAT less likely with some snitch trickery
(setting the badness for the rebuilding host) via JMX.
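For what it's worth, a minimal sketch of that JMX trickery, assuming JMX is
reachable on port 7199 without authentication and that the
DynamicEndpointSnitch MBean exposes a Severity attribute as it does in 3.11
("rebuilding-host" below is a placeholder):

import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SetSnitchSeverity {
    public static void main(String[] args) throws Exception {
        // Connect to the JMX port of the node that is being rebuilt.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://rebuilding-host:7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName snitch = new ObjectName(
                    "org.apache.cassandra.db:type=DynamicEndpointSnitch");
            // A high severity makes the dynamic snitch score this replica
            // as bad, so coordinators prefer other replicas where they can.
            mbs.setAttribute(snitch, new Attribute("Severity", 10000.0));
            System.out.println("Severity: "
                    + mbs.getAttribute(snitch, "Severity"));
        }
    }
}

Set it back to 0.0 once the rebuild finishes; note that
dynamic_snitch_badness_threshold in cassandra.yaml controls how large a score
difference it takes before traffic actually shifts.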

> On Aug 5, 2019, at 8:17 PM, Cyril Scetbon  wrote:
> 
> Hey guys,
> 
> Can you confirm that disabling the native transport (nodetool disablebinary)
> is enough with Cassandra 3.11+ to avoid clients hitting inconsistent data on
> that node when they use LOCAL_ONE consistency? (Particularly when the node
> is rebuilding…)
> I'd like to avoid any fancy client configuration like blacklisting nodes.
> 
> Thanks 
> —
> Cyril Scetbon
> 


Rebuilding a node without clients hitting it

2019-08-05 Thread Cyril Scetbon
Hey guys,

Can you confirm that disabling the native transport (nodetool disablebinary) is
enough with Cassandra 3.11+ to avoid clients hitting inconsistent data on that
node when they use LOCAL_ONE consistency? (Particularly when the node is
rebuilding…)
I'd like to avoid any fancy client configuration like blacklisting nodes.

Thanks 
—
Cyril Scetbon



Re: Cassandra read requests not getting timeout

2019-08-05 Thread Jon Haddad
I think this might be because the timeout only applies to each request, and
the driver is paginating in the background. Each page is a new request.
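A quick sketch of what that looks like from the Java driver 3.x side (contact
point taken from the thread; the fetch size and timeout values are arbitrary):
the client-side read timeout applies per page, and iterating the result set
fetches further pages on demand.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.Statement;

public class PagedRead {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("10.50.11.11")
                // Client-side timeout, applied to each request, i.e. each page.
                .withSocketOptions(new SocketOptions().setReadTimeoutMillis(12000))
                .build();
             Session session = cluster.connect()) {
            Statement stmt = new SimpleStatement(
                    "SELECT asset_name FROM cdvr.jobs WHERE model_type = 'asset' "
                            + "AND docroot_name = 'vx030' LIMIT 100000 ALLOW FILTERING")
                    .setFetchSize(5000); // rows per page = rows per timed request
            ResultSet rs = session.execute(stmt);
            long count = 0;
            for (Row row : rs) { // transparently fetches the next pages
                count++;
            }
            System.out.println(count + " rows");
        }
    }
}

So a 16-minute run can finish without ever tripping the 10 s server-side
read_request_timeout_in_ms, as long as no single page takes that long. cqlsh
pages the same way.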

On Mon, Aug 5, 2019, 12:08 AM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Mon, Aug 5, 2019 at 8:50 AM nokia ceph 
> wrote:
>
>> Hi Community,
>>
>> I am using Cassandra 3.0.13, a 5-node cluster with simple topology. Following
>> are the timeout parameters in the yaml file:
>>
>> # grep timeout /etc/cassandra/conf/cassandra.yaml
>> cas_contention_timeout_in_ms: 1000
>> counter_write_request_timeout_in_ms: 5000
>> cross_node_timeout: false
>> range_request_timeout_in_ms: 10000
>> read_request_timeout_in_ms: 10000
>> request_timeout_in_ms: 10000
>> truncate_request_timeout_in_ms: 60000
>> write_request_timeout_in_ms: 2000
>>
>> I'm trying a Cassandra query using cqlsh and it is not timing out.
>>
>> # time cqlsh 10.50.11.11 -e "CONSISTENCY QUORUM; select
>> asset_name,profile_name,job_index,active,last_valid_op,last_valid_op_ts,status,status_description,live_depth,asset_type,dest_path,source_docroot_name,source_asset_name,start_time,end_time,iptv,drm,geo,last_gc
>> from cdvr.jobs where model_type ='asset' AND docroot_name='vx030'
>> LIMIT 100000 ALLOW FILTERING;"
>> Consistency level set to QUORUM.
>> (... output rows elided ...)
>> (79024 rows)
>>
>> real    16m30.488s
>> user    0m39.761s
>> sys     0m3.896s
>>
>> The query took 16.5 minutes to display the output, but my
>> read_request_timeout is 10 seconds. Why didn't the query time out after
>> 10 s?
>>
>
> Hi Renoy,
>
> Have you tried the same query with TRACING enabled beforehand?
>
> https://docs.datastax.com/en/archived/cql/3.3/cql/cql_reference/cqlshTracing.html
>
> It seems unlikely that the client took 16 minutes just to display the
> result set, but in any case client-side time is not counted against the
> request timeout from the server's point of view.
>
> Cheers,
> --
> Alex
>
>


RE: [EXTERNAL] Re: loading big amount of data to Cassandra

2019-08-05 Thread Durity, Sean R
DataStax has a very fast bulk load tool - dsbulk. Not sure if it is available
for open source or not. In my experience so far, I am very impressed with it.



Sean Durity – Staff Systems Engineer, Cassandra

-Original Message-
From: p...@xvalheru.org 
Sent: Saturday, August 3, 2019 6:06 AM
To: user@cassandra.apache.org
Cc: Dimo Velev 
Subject: [EXTERNAL] Re: loading big amount of data to Cassandra

Thanks to all,

I'll try the SSTables.

Thanks

Pat

On 2019-08-03 09:54, Dimo Velev wrote:
> Check out the CQLSSTableWriter java class -
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java
> You use it to generate sstables - you need to write a small program
> for that. You can then stream them over the network using the
> sstableloader (either use the utility or use the underlying classes to
> embed it in your program).
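For reference, a minimal sketch of that small program (the keyspace, table,
and paths below are made up for illustration; it needs the cassandra-all jar
on the classpath):

import java.io.File;
import java.util.UUID;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class BulkSSTableGen {
    public static void main(String[] args) throws Exception {
        // The writer needs the schema and an insert statement up front.
        String schema = "CREATE TABLE ks.events (id uuid PRIMARY KEY, payload text)";
        String insert = "INSERT INTO ks.events (id, payload) VALUES (?, ?)";
        File dir = new File("/tmp/sstables/ks/events"); // must already exist
        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory(dir)
                .forTable(schema)
                .using(insert)
                .build();
        try {
            for (int i = 0; i < 1_000_000; i++) {
                writer.addRow(UUID.randomUUID(), "payload-" + i);
            }
        } finally {
            writer.close(); // flushes the last sstable to disk
        }
    }
}

Then something like "sstableloader -d <contact-point> /tmp/sstables/ks/events"
streams the generated files into the cluster.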
>
> On 3. Aug 2019, at 07:17, Ayub M  wrote:
>
>> Dimo, how do you generate sstables? Do you mean load data locally on
>> a Cassandra node and use sstableloader?
>>
>> On Fri, Aug 2, 2019, 5:48 PM Dimo Velev 
>> wrote:
>>
>>> Hi,
>>>
>>> Batches will actually slow down the process because they mean a
>>> different thing in C* - as you've read, they just group changes
>>> together that you want executed atomically.
>>>
>>> Cassandra does not really have indices, so that is different from a
>>> relational DB. However, as data is written, Cassandra generates many
>>> smallish sstables on disk. These are then compacted together in the
>>> background to improve read performance.
>>>
>>> You have two options from my experience:
>>>
>>> Option 1: use the normal CQL API in async mode. This will create a
>>> high CPU load on your cluster. Depending on whether that is fine
>>> for you, that might be the easiest solution (see the sketch further
>>> down).
>>>
>>> Option 2: generate sstables locally and use the sstableloader to
>>> upload them into the cluster. The streaming does not generate high
>>> CPU load, so it is a viable option for clusters with other
>>> operational load.
>>>
>>> Option 2 scales with the number of cores of the machine generating
>>> the sstables. If you can split your data you can generate sstables
>>> on multiple machines. In contrast, option 1 scales with your
>>> cluster. If you have a large cluster that is idling, it would be
>>> better to use option 1.
>>>
>>> With both options I was able to write at about 50-100K rows/sec
>>> on my laptop and local Cassandra. The speed heavily depends on the
>>> size of your rows.
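A sketch of option 1 with the Java driver 3.x, assuming a hypothetical
ks.events table and throttling in-flight writes with a semaphore (the limit
of 256 is arbitrary):

import java.util.concurrent.Semaphore;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.MoreExecutors;

public class AsyncLoader {
    public static void load(Session session, Iterable<String> payloads)
            throws InterruptedException {
        PreparedStatement ps = session.prepare(
                "INSERT INTO ks.events (id, payload) VALUES (uuid(), ?)");
        // Bound the number of concurrent requests so the cluster keeps up.
        Semaphore inFlight = new Semaphore(256);
        for (String payload : payloads) {
            inFlight.acquire();
            ResultSetFuture f = session.executeAsync(ps.bind(payload));
            Futures.addCallback(f, new FutureCallback<ResultSet>() {
                public void onSuccess(ResultSet rs) { inFlight.release(); }
                public void onFailure(Throwable t) {
                    inFlight.release();
                    t.printStackTrace(); // real code: retry or record the row
                }
            }, MoreExecutors.directExecutor());
        }
        inFlight.acquire(256); // drain: wait for the tail of in-flight writes
    }
}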
>>>
>>> Back to your question — I guess option 2 is similar to what you
>>> are used to from tools like SQL*Loader for relational DBMSes.
>>>
>>> I had a requirement of loading a few hundred million rows per day
>>> into an operational cluster, so I went with option 2 to offload the
>>> CPU load and reduce the impact on the reading side during the loads.
>>>
>>> Cheers,
>>> Dimo
>>>
>>> Sent from my iPad
>>>
>>>> On 2. Aug 2019, at 18:59, p...@xvalheru.org wrote:
>>>>
>>>> Hi,
>>>>
>>>> I need to upload about 7 billion records to Cassandra. What is the
>>>> best setup of Cassandra for this task? Will using batches speed up
>>>> the upload (I've read somewhere that batches in Cassandra are
>>>> dedicated to atomicity, not to speeding up communication)? How does
>>>> Cassandra handle indexing internally? In SQL databases, when
>>>> uploading such an amount of data, it is suggested to turn indexing
>>>> off and then back on. Is something similar possible in Cassandra?
>>>>
>>>> Thanks for all suggestions.
>>>>
>>>> Pat







COPY command giving inconsistent output

2019-08-05 Thread rob.hofmann
Hi all,

We’ve been running a production-grade cluster with 21 nodes for the past few
weeks. We use the COPY <table> TO <file> method to export our data.

The thing we are seeing is that this gives an inconsistent number of records:
on day 1 we get 1.1 million records, the next day 900k, on day 3 1 million, and
so on. Are we missing something or is this a bug? This is a keyspace which
doesn't get written to; there are only reads.

Also we are seeing this on all keyspaces, not just one. We are also able to 
reproduce this on our acceptance environment.

Smaller tables seem to give a consistent result. Larger ones give inconsistent 
results.

Thank you in advance. Looking forward to your replies.

Kind regards,
Rob Hofmann


Re: Cassandra read requests not getting timeout

2019-08-05 Thread Oleksandr Shulgin
On Mon, Aug 5, 2019 at 8:50 AM nokia ceph  wrote:

> Hi Community,
>
> I am using Cassandra 3.0.13, a 5-node cluster with simple topology. Following
> are the timeout parameters in the yaml file:
>
> # grep timeout /etc/cassandra/conf/cassandra.yaml
> cas_contention_timeout_in_ms: 1000
> counter_write_request_timeout_in_ms: 5000
> cross_node_timeout: false
> range_request_timeout_in_ms: 10000
> read_request_timeout_in_ms: 10000
> request_timeout_in_ms: 10000
> truncate_request_timeout_in_ms: 60000
> write_request_timeout_in_ms: 2000
>
> I'm trying a Cassandra query using cqlsh and it is not timing out.
>
> # time cqlsh 10.50.11.11 -e "CONSISTENCY QUORUM; select
> asset_name,profile_name,job_index,active,last_valid_op,last_valid_op_ts,status,status_description,live_depth,asset_type,dest_path,source_docroot_name,source_asset_name,start_time,end_time,iptv,drm,geo,last_gc
> from cdvr.jobs where model_type ='asset' AND docroot_name='vx030'
> LIMIT 100000 ALLOW FILTERING;"
> Consistency level set to QUORUM.
> (... output rows elided ...)
> (79024 rows)
>
> real    16m30.488s
> user    0m39.761s
> sys     0m3.896s
>
> The query took 16.5 minutes to display the output, but my
> read_request_timeout is 10 seconds. Why didn't the query time out after
> 10 s?
>

Hi Renoy,

Have you tried the same query with TRACING enabled beforehand?
https://docs.datastax.com/en/archived/cql/3.3/cql/cql_reference/cqlshTracing.html

It seems unlikely that the client took 16 minutes just to display the
result set, but in any case client-side time is not counted against the
request timeout from the server's point of view.
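In case it helps, a rough Java driver 3.x equivalent of cqlsh's TRACING ON
(column list shortened, same contact point as in your message):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryTrace;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class TracedRead {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("10.50.11.11").build();
             Session session = cluster.connect()) {
            Statement stmt = new SimpleStatement(
                    "SELECT asset_name FROM cdvr.jobs WHERE model_type = 'asset' "
                            + "AND docroot_name = 'vx030' LIMIT 100 ALLOW FILTERING")
                    .enableTracing(); // ask the coordinator to record a trace
            ResultSet rs = session.execute(stmt);
            QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
            System.out.printf("coordinator=%s duration=%d us%n",
                    trace.getCoordinator(), trace.getDurationMicros());
            for (QueryTrace.Event e : trace.getEvents()) {
                System.out.printf("%10d us  %s (%s)%n",
                        e.getSourceElapsedMicros(), e.getDescription(),
                        e.getSource());
            }
        }
    }
}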

Cheers,
--
Alex


Cassandra read requests not getting timeout

2019-08-05 Thread nokia ceph
Hi Community,

I am using Cassandra 3.0.13, a 5-node cluster with simple topology. Following
are the timeout parameters in the yaml file:

# grep timeout /etc/cassandra/conf/cassandra.yaml
cas_contention_timeout_in_ms: 1000
counter_write_request_timeout_in_ms: 5000
cross_node_timeout: false
range_request_timeout_in_ms: 10000
read_request_timeout_in_ms: 10000
request_timeout_in_ms: 10000
truncate_request_timeout_in_ms: 60000
write_request_timeout_in_ms: 2000

I'm trying a Cassandra query using cqlsh and it is not timing out.

# time cqlsh 10.50.11.11 -e "CONSISTENCY QUORUM; select
asset_name,profile_name,job_index,active,last_valid_op,last_valid_op_ts,status,status_description,live_depth,asset_type,dest_path,source_docroot_name,source_asset_name,start_time,end_time,iptv,drm,geo,last_gc
from cdvr.jobs where model_type ='asset' AND docroot_name='vx030'
LIMIT 100000 ALLOW FILTERING;"
Consistency level set to QUORUM.
(... output rows elided ...)
(79024 rows)

real    16m30.488s
user    0m39.761s
sys     0m3.896s

The query took 16.5 minutes to display the output, but my
read_request_timeout is 10 seconds. Why didn't the query time out after
10 s?

Regards,
Renoy