Re: cold vs hot data
I guess the OS-level page cache will also help out implicitly to make sure your common pages aren't touching disk.

On Fri, Sep 14, 2018 at 2:46 AM Alaa Zubaidi (PDF) wrote:
> Hi,
>
> We are using Apache Cassandra 3.11.2 on RedHat 7.
> The data can grow to 100+ TB, but the hot data will in most cases be less
> than 10 TB. We still need to keep the rest of the data accessible.
> Has anyone had this problem?
> What is the best way to make the cluster more efficient?
> Is there a way to somehow automatically move the old data to different
> storage (rack, DC, etc.)?
> Any ideas?
>
> Regards,
>
> Alaa

--
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919> Twitter <https://twitter.com/tink>
Re: Current active queries and status/type
You can do sampling of tracing on a table to avoid some of the overhead.

On Fri, Mar 2, 2018, 00:23 D. Salvatore <dd.salvat...@gmail.com> wrote:
> Hi Nicolas,
> Thank you very much for the response.
> I am looking into something with a smaller time frame than a minute.
> Tracing is a good way to get this information, but it introduces a huge
> overhead in the system that I'd like to avoid.
>
> Thanks
> Salvatore
>
> 2018-03-01 15:08 GMT+00:00 Nicolas Guyomar <nicolas.guyo...@gmail.com>:
>> Hi,
>>
>> With org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency
>> and OneMinuteRate you can get such a metric.
>>
>> As for the state of a request with regard to other nodes, I do not think
>> you can get that via JMX (it is available using TRACING per request).
>>
>> On 1 March 2018 at 15:50, D. Salvatore <dd.salvat...@gmail.com> wrote:
>>> Hello!
>>> Is there any way to know how many queries a node is currently serving
>>> through JMX (or other tools)? And the state of the request, for example
>>> whether the request is waiting for data from another node?
>>>
>>> Thanks
>>> Salvatore

--
Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden

For urgent matters you can reach me at +46-708-84 18 32.
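For reference, probabilistic trace sampling can be switched on per node with `nodetool`; the probability below is just an example value, and tracing a tiny fraction of requests keeps the overhead low:

```shell
# Trace roughly 0.1% of requests handled by this node (run on each node you
# want sampled; set the probability back to 0 to turn sampling off again).
nodetool settraceprobability 0.001

# Sampled traces end up in the system_traces keyspace, e.g. in cqlsh:
#   SELECT session_id, duration, request FROM system_traces.sessions LIMIT 10;
```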
Re: One time major deletion/purge vs periodic deletion
Sounds like you are using Cassandra as a queue. That's an anti-pattern. What I would do is rely on TTLs for removal of data and use the TWCS compaction strategy to handle the actual deletion, so you can focus on insertion only.

On Tue, Mar 6, 2018, 07:39 Charulata Sharma (charshar) <chars...@cisco.com> wrote:
> Hi,
>
> Wanted the community's feedback on deciding the schedule of the Archive
> and Purge job.
>
> Is it better to purge a large volume of data at regular intervals (e.g.
> run the job once every 3 months) or purge smaller amounts more frequently
> (run the job weekly)?
>
> Some estimates on the number of deletes performed: up to 80-90K rows
> purged in 3 months vs. 10K deletes every week.
>
> Thanks,
> Charu

--
Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden

For urgent matters you can reach me at +46-708-84 18 32.
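A minimal sketch of that approach (the table, columns, and 7-day TTL are made-up examples; the compaction options are TWCS's standard ones). With a table-level TTL and time-windowed compaction, whole expired SSTables get dropped instead of being purged tombstone by tombstone:

```sql
CREATE TABLE events (
    bucket text,
    ts timeuuid,
    payload blob,
    PRIMARY KEY (bucket, ts)
) WITH compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': 1}
  AND default_time_to_live = 604800;  -- 7 days, in seconds
```

With this in place there is no delete job at all: rows expire on their own and the strategy removes entire day-sized SSTables once everything in them has expired.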
Re: Multiple nodes decommission
AFAIK, the fastest way to add multiple nodes is: make sure your clients are only reading/writing to/from your current datacenter, create a new datacenter with replication factor 0 for your keyspaces, add nodes to the new datacenter, increase the replication factor for the new datacenter, run `nodetool rebuild` on all nodes in the new datacenter, point your clients to the new DC, and finally decommission the old one. I've done that multiple times and it's been much faster than adding a few nodes at a time. Obviously, this depends on how much data you have...

/J

On Sat, Apr 15, 2017 at 10:19 AM, Vlad <qa23d-...@yahoo.com> wrote:
> *>range reassignments which become effective after a successful decommission.*
>
> But during leaving, nodes announce themselves as "leaving". Do other
> leaving nodes take this into account and not stream data to them?
> (applicable also for joining). I hope so ))
>
> I guess the problem with sequentially adding/removing nodes is data
> overstreaming and uneven load distribution. I mean, if we have three racks
> it's better to add/remove three nodes at a time (one in each rack) and to
> avoid a state with four nodes, for example.
>
> Any thoughts?
>
> On Tuesday, April 11, 2017 7:55 PM, benjamin roth <brs...@gmail.com> wrote:
>
> I did not test it but I'd bet that parallel decommission will lead to
> inconsistencies.
> Each decommission results in range movements and range reassignments which
> become effective after a successful decommission.
> If you start several decommissions at once, I guess the calculated
> reassignments are invalid for at least one node after the first node
> finishes the decommission process.
>
> I hope someone will correct me if I am wrong.
>
> 2017-04-11 18:43 GMT+02:00 Jacob Shadix <jacobsha...@gmail.com>:
>
> Are you using vnodes? I typically do one-by-one as the decommission will
> create additional load/network activity streaming data to the other nodes
> as the token ranges are reassigned.
> -- Jacob Shadix
>
> On Sat, Apr 8, 2017 at 10:55 AM, Vlad <qa23d-...@yahoo.com> wrote:
>
> Hi,
>
> how should multiple nodes be decommissioned with "nodetool decommission" -
> one by one or in parallel?
>
> Thanks.
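The add-a-datacenter route from the top of this thread, sketched as commands. Keyspace and DC names here are invented examples; run the `ALTER` once from cqlsh and the `nodetool` commands on the nodes indicated:

```shell
# 1. Keep clients pinned to the old DC (DC-aware load balancing, LOCAL_* CL).

# 2. Once the new DC's nodes are up, start replicating to it (in cqlsh):
#      ALTER KEYSPACE my_ks WITH replication =
#          {'class': 'NetworkTopologyStrategy', 'old_dc': 3, 'new_dc': 3};

# 3. On every node in the new DC, stream the existing data from the old DC:
nodetool rebuild -- old_dc

# 4. After repointing clients at new_dc, retire old-DC nodes one at a time:
nodetool decommission
```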
`nodetool verify` outcome check
Hi,

We've been discussing internally whether to start running `nodetool verify` periodically to test for bitrot. Does anyone know how I could check whether the verification failed or succeeded from, say, a script? Is there an error exit code or some output I could grep for?

Thanks,
Jens
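A sketch of how such a script could look, assuming `nodetool` propagates a verification failure through a non-zero exit status. That assumption is exactly the open question here, so confirm it against your Cassandra version first; the keyspace/table names and log path are placeholders:

```shell
# Assumes a failed verify yields a non-zero exit code; confirm before relying on it.
if nodetool verify -e my_keyspace my_table; then
    echo "verify passed"
else
    status=$?
    echo "verify FAILED (exit code $status)" >&2
    # Belt and braces: the system log may also name the corrupt SSTable.
    grep -i corrupt /var/log/cassandra/system.log | tail -n 5 >&2
    exit "$status"
fi
```

(`-e` is `--extended-verify`, which checks all data rather than just the SSTable checksums.)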
Re: Backup restore with a different name
Bryan,

On Wed, Nov 2, 2016 at 11:38 AM, Bryan Cheng <br...@blockcypher.com> wrote:
> do you mean restoring the cluster to that state, or just exposing that
> state for reference while keeping the (corrupt) current state in the live
> cluster?

I mean "exposing that state for reference while keeping the (corrupt) current state in the live cluster".

Cheers,
Jens
Re: Backup restore with a different name
Thanks Anubhav,

Looks like a Java project without any documentation whatsoever ;) How do I use the tool? What does it do?

Cheers,
Jens

On Wed, Nov 2, 2016 at 11:36 AM, Anubhav Kale <anubhav.k...@microsoft.com> wrote:
> You would have to build some logic on top of what's natively supported.
>
> Here is an option:
> https://github.com/anubhavkale/CassandraTools/tree/master/BackupRestore
>
> *From:* Jens Rantil [mailto:jens.ran...@tink.se]
> *Sent:* Wednesday, November 2, 2016 2:21 PM
> *To:* Cassandra Group <user@cassandra.apache.org>
> *Subject:* Backup restore with a different name
>
> Hi,
>
> Let's say I am periodically making snapshots of a table, say "users", for
> backup purposes. Let's say a developer makes a mistake and corrupts the
> table. Is there an easy way for me to restore a replica, say
> "users_20161102", of the original table for the developer to look at the
> old copy?
>
> Cheers,
> Jens
Backup restore with a different name
Hi,

Let's say I am periodically making snapshots of a table, say "users", for backup purposes. Let's say a developer makes a mistake and corrupts the table. Is there an easy way for me to restore a replica, say "users_20161102", of the original table for the developer to look at the old copy?

Cheers,
Jens
Re: Cassandra Poor Read Performance Response Time
Hi,

I am by no means an expert on Cassandra, nor on DateTieredCompactionStrategy. However, looking in "Query 2.xlsx" I see a lot of

    Partition index with 0 entries found for sstable 186

To me, that looks like Cassandra is looking at a lot of sstables and realizing too late that they don't contain any relevant data. Are you using TTLs when you write data? Do the TTLs vary? If they do, there's a risk Cassandra has to inspect a lot of sstables that turn out to hold only expired data.

Also, have you checked `nodetool cfstats` and bloom filter false positives? Does `nodetool cfhistograms` give you any insights? I'm mostly thinking in terms of unbalanced partition keys. Have you checked the logs for how long the GC pauses are?

Somewhat implementation specific: would adjusting the time bucket to a smaller time resolution be an option? Also, since you are using DateTieredCompactionStrategy, have you considered using a TIMESTAMP constraint[1]? That might actually help you a lot.

[1] https://issues.apache.org/jira/browse/CASSANDRA-5514

Cheers,
Jens

On Mon, Oct 31, 2016 at 11:10 PM, _ _ <rage...@hotmail.com> wrote:
> Hi
>
> Currently I am running a Cassandra cluster of 3 nodes (with it replicating
> to both other nodes) and am experiencing poor performance, usually getting
> second-long response times when running queries where I am
> expecting/needing millisecond response times.
Currently i have a table which looks like: > > CREATE TABLE tracker.all_ad_impressions_counter_1d ( > time_bucket bigint, > ad_id text, > uc text, > count counter, > PRIMARY KEY ((time_bucket, ad_id), uc) > ) WITH CLUSTERING ORDER BY (uc ASC) > AND bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'base_time_seconds': '3600', 'class': > 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy', > 'max_sstable_age_days': '30', 'max_threshold': '32', 'min_threshold': '4', > 'timestamp_resolution': 'MILLISECONDS'} > AND compression = {'chunk_length_in_kb': '64', 'class': ' > org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE'; > > > and queries which look like: > > SELECT > time_bucket, > uc, > count > FROM > all_ad_impressions_counter_1d > > WHERE ad_id = ? > AND time_bucket = ? > > the cluster is running on servers with 16 GB RAM, and 4 CPU cores and 3 > 100GB datastores, the storage is not local and these VMs are being managed > through openstack. There are roughly 200 million records being written per > day (1 time_bucket) and maybe a few thousand records per partition > (time_bucket, ad_id) at most. The amount of writes is not having a > significant effect on our read performance as when writes are stopped, the > read response time does not improve noticeably. I have attached a trace of > one query i ran which took around 3 seconds which i would expect to take > well below a second. I have also included the cassandra.yaml file and jvm > options file. 
> We do intend to change the storage to local storage and expect this will
> have a significant impact, but I was wondering if there's anything else
> which could be changed that would also have a significant impact on read
> performance?
>
> Thanks
> Ian
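The checks suggested above, as concrete commands. The table name is taken from the quoted schema, and the log path varies by installation:

```shell
# Bloom filter false positives, sstable count, partition size estimates:
nodetool cfstats tracker.all_ad_impressions_counter_1d

# Latency percentiles, sstables touched per read, partition cell counts:
nodetool cfhistograms tracker all_ad_impressions_counter_1d

# Dropped messages and backed-up thread pools:
nodetool tpstats

# Long GC pauses are reported in the system log by GCInspector:
grep GCInspector /var/log/cassandra/system.log | tail
```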
Re: Does anyone store larger values in Cassandra E.g. 500 KB?
If I were to do this, I would have two tables, file_chunks and chunks:

CREATE TABLE file_chunks (
    filename text,
    chunk int,
    size int,  -- optional, if you want to query the total size of a file
    PRIMARY KEY (filename, chunk)
);

CREATE TABLE chunks (
    filename text,
    chunk int,
    data blob,
    PRIMARY KEY ((filename, chunk))
);

By keeping the data chunks in a separate table, you make sure the data is spread more evenly across the cluster. If the sizes of the files vary a lot, this is a much better approach. Also, using `(filename, chunk)` as the key of the `chunks` table makes it possible to have a background process that deletes rows in `chunks` that no longer exist in `file_chunks`.

Jens

On Friday, October 21, 2016, jason zhao yang <zhaoyangsingap...@gmail.com> wrote:
> 1. Usually, before storing an object, serialization is needed, so we know
> the size.
> 2. Add "chunk id" as the last clustering key.
>
> Vikas Jaiman <er.vikasjai...@gmail.com> wrote on Fri, Oct 21, 2016 at 11:46 PM:
>> Thanks for your answer but I am just curious about:
>>
>> i) How do you identify the size of the object which you are going to chunk?
>>
>> ii) While reading or updating, how is it going to read all those chunks?
>>
>> Vikas
>>
>> On Thu, Oct 20, 2016 at 9:25 PM, Justin Cameron <jus...@instaclustr.com> wrote:
>>> You can, but it is not really very efficient or cost-effective. You may
>>> encounter issues with streaming, repairs and compaction if you have very
>>> large blobs (100MB+), so try to keep them under 10MB if possible.
>>>
>>> I'd suggest storing blobs in something like Amazon S3 and keeping just
>>> the bucket name & blob id in Cassandra.
>>> On Thu, 20 Oct 2016 at 12:03, Vikas Jaiman <er.vikasjai...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Normally people like to store smaller values in Cassandra. Is there
>>>> anyone using it to store larger values (e.g. 500 KB or more), and if so,
>>>> what are the issues you are facing? I would also like to know what
>>>> tweaks you are considering.
>>>>
>>>> Thanks,
>>>> Vikas
>>>
>>> --
>>> Justin Cameron
>>> Senior Software Engineer | Instaclustr
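The split/reassemble mechanics behind the two-table scheme can be prototyped outside Cassandra with plain coreutils. This is only a sketch (the file names and the 100 KB chunk size are arbitrary), with the alphabetical suffix playing the role of the `chunk` column:

```shell
# Sketch: cut a blob into fixed-size chunks and reassemble it, mirroring the
# file_chunks/chunks idea above. Names and chunk size are arbitrary examples.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# A 300 KB blob of random data stands in for the uploaded file.
dd if=/dev/urandom of=blob bs=1024 count=300 2>/dev/null

# 100 KB chunks: chunk_aa, chunk_ab, chunk_ac (the suffix is the chunk id).
split -b 102400 blob chunk_

# Reassembly: the shell glob sorts suffixes, i.e. reads chunks in order.
cat chunk_* > reassembled
cmp blob reassembled && echo "blob reassembled intact"
```

Against real Cassandra tables, the reader would instead page through the `chunk` values for a given `filename` in order and concatenate the `data` blobs.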
Re: understanding partitions and # of nodes
By "partitions" I assume you mean "partition keys". Generally, the more partition keys, the better. Having more partition keys means your data is generally spread out more evenly across the cluster, makes repairs run faster (or so I've heard), makes adding new nodes smoother, and makes it less likely that you hit tombstone limits. Also, 100 partition keys in a Cassandra table is nothing. If you don't have more partition keys than that, Cassandra might not be the right fit.

Cheers,
Jens

On Wednesday, September 21, 2016, S Ahmed <sahmed1...@gmail.com> wrote:
> Hello,
>
> If you have a 10 node cluster, how does having 10 partitions or 100
> partitions change how Cassandra will perform?
>
> With 10 partitions you will have 1 partition per node.
> With 100 partitions you will have 10 partitions per node.
>
> With 100 partitions I guess it helps because when you add more nodes to
> your cluster, the data can be redistributed since you have more nodes.
>
> What else are things to consider?
>
> Thanks.
Re: Nodetool repair
On Mon, Sep 19, 2016 at 3:07 PM Alain RODRIGUEZ <arodr...@gmail.com> wrote:
...
> - The size of your data
> - The number of vnodes
> - The compaction throughput
> - The streaming throughput
> - The hardware available
> - The load of the cluster
> - ...

I've also heard that the number of clustering keys per partition key could have an impact. Might be worth investigating.

Cheers,
Jens
Re: Nodetool repair
Hi Lokesh,

Which version of Cassandra are you using? Which compaction strategy are you using? AFAIK, a repair doesn't trigger a major compaction, but I might be wrong here.

What you could do is run a repair for a subset of the ring (see the `-st` and `-et` parameters of `nodetool repair`). If you repair 1/1000 of the ring, repairing the whole ring will take roughly 1000 times longer than your sample. Also, you might want to look at incremental repairs.

If you kill the process in the middle, the repair will not start again. You will need to reissue it.

Cheers,
Jens

On Sun, Sep 18, 2016 at 2:58 PM Lokesh Shrivastava <lokesh.shrivast...@gmail.com> wrote:
> Hi,
>
> I tried to run the nodetool repair command on one of my keyspaces and found
> that it took a lot more time than I anticipated. Is there a way to know the
> ETA of a manual repair in advance, before triggering it? I believe repair
> performs the following operations:
>
> 1) Major compaction
> 2) Exchange of merkle trees with neighbouring nodes
>
> Is there any other operation performed during manual repair? What if I
> kill the process in the middle?
>
> Thanks.
> Lokesh
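Subrange repair, sketched. The tokens below are arbitrary example values from the Murmur3 token space, not ones computed for any real ring, and the keyspace name is a placeholder:

```shell
# Repair only the token range (start, end] on this node; timing a small
# slice like this lets you extrapolate a full-ring estimate.
nodetool repair -st -9223372036854775808 -et -9200000000000000000 my_keyspace
```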
Re: How Fast Does Information Spread With Gossip?
> Is a minute a reasonable upper bound for most clusters?

I have no numbers, and I'm sure this differs depending on how large your cluster is. We have a small cluster of around 12 nodes, and statuses generally propagate in under 5 seconds for sure. So it will definitely be less than 1 minute.

Cheers,
Jens

On Wed, Sep 14, 2016 at 8:49 PM jerome <jeromefroel...@hotmail.com> wrote:
> Hi,
>
> I was curious if anyone had any kind of statistics or ballpark figures on
> how long it takes information to propagate through a cluster with Gossip?
> I'm particularly interested in how fast information about the liveness of a
> node spreads. For example, in an n-node cluster the median amount of time
> it takes for all nodes to learn that a node went down is f(n) seconds. Is a
> minute a reasonable upper bound for most clusters? Too high, too low?
>
> Thanks,
> Jerome
Re: Maximum number of columns in a table
l used in rdbms. But I >>>>>>> need rows together to work with them (indexing etc). >>>>>>> >>>>>>> @sfespace >>>>>>> The map is needed when you have a dynamic schema. I don't have a >>>>>>> dynamic schema (may have, and will use the map if I do). I just have >>>>>>> thousands of schemas. One user needs 10 integers, while another user >>>>>>> needs >>>>>>> 20 booleans, and another needs 30 integers, or a combination of them >>>>>>> all. >>>>>>> >>>>>>> On Thu, Sep 15, 2016 at 7:46 PM, DuyHai Doan <doanduy...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> "Another possible alternative is to use a single map column" >>>>>>>> >>>>>>>> --> how do you manage the different types then ? Because maps in >>>>>>>> Cassandra are strongly typed >>>>>>>> >>>>>>>> Unless you set the type of map value to blob, in this case you >>>>>>>> might as well store all the object as a single blob column >>>>>>>> >>>>>>>> On Thu, Sep 15, 2016 at 6:13 PM, sfesc...@gmail.com < >>>>>>>> sfesc...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Another possible alternative is to use a single map column. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Sep 15, 2016 at 7:19 AM Dorian Hoxha < >>>>>>>>> dorian.ho...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Since I will only have 1 table with that many columns, and the >>>>>>>>>> other tables will be "normal" tables with max 30 columns, and the >>>>>>>>>> memory of >>>>>>>>>> 2K columns won't be that big, I'm gonna guess I'll be fine. >>>>>>>>>> >>>>>>>>>> The data model is too dynamic, the alternative would be to create >>>>>>>>>> a table for each user which will have even more overhead since the >>>>>>>>>> number >>>>>>>>>> of users is in the several thousands/millions. 
>>>>>>>>>> On Thu, Sep 15, 2016 at 3:04 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>>>>>>>>> There is no real limit on the number of columns in a table.
>>>>>>>>>>> I would say that the impact of having a lot of columns is the
>>>>>>>>>>> amount of metadata C* needs to keep in memory for
>>>>>>>>>>> encoding/decoding each row.
>>>>>>>>>>>
>>>>>>>>>>> Now, if you have a table with 1000+ columns, the problem is
>>>>>>>>>>> probably your data model...
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 15, 2016 at 2:59 PM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
>>>>>>>>>>>> Is there a lot of overhead in having a big number of columns
>>>>>>>>>>>> in a table? Not unbounded, but say, would 2000 be a problem
>>>>>>>>>>>> (I think that's the maximum I'll need)?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank You
Re: Is it ok to restart DECOMMISSION
Also have a look at `nodetool netstats` to check whether streaming is progressing or has halted.

Cheers,
Jens

On Fri, Sep 16, 2016 at 3:18 AM Mark Rose <markr...@markrose.ca> wrote:
> I've done that several times. Kill the process, restart it, let it
> sync, decommission.
>
> You'll need enough space on the receiving nodes for the full set of
> data, on top of the other data that was already sent earlier, plus
> room to cleanup/compact it.
>
> Before you kill it, check system.log to see if it died on anything. If
> so, the decommission process will never finish. If not, let it
> continue. Of particular note is that by default, transferring large
> sstables will time out. You can fix that by adjusting
> streaming_socket_timeout_in_ms to a sufficiently large value (I set it
> to a day).
>
> -Mark
>
> On Thu, Sep 15, 2016 at 9:28 AM, laxmikanth sadula <laxmikanth...@gmail.com> wrote:
> > I started decommissioning a node in our Cassandra cluster.
> > But it's taking too long (more than 12 hrs), so I would like to
> > restart it (stop/kill the node & run 'nodetool decommission' again).
> >
> > Will killing the node/stopping the decommission and restarting it
> > cause any issues to the cluster?
> >
> > Using C* 2.0.17, 2 datacenters, each DC with 3 groups, each group
> > with 3 nodes, with RF=3.
> >
> > --
> > Thanks...!
Re: [ANNOUNCEMENT] Website update
Are there equivalent JIRAs for the TODOs somewhere?

Jens

On Mon, Sep 12, 2016 at 9:58 AM Brice Dutheil <brice.duth...@gmail.com> wrote:
> Really nice update!
>
> There are still some todos ;)
> http://cassandra.apache.org/doc/latest/architecture/storage_engine.html
> http://cassandra.apache.org/doc/latest/architecture/guarantees.html
> http://cassandra.apache.org/doc/latest/operating/read_repair.html
> ...
>
> -- Brice
>
> On Mon, Sep 12, 2016 at 6:38 AM, Ashish Disawal <ashish.disa...@evivehealth.com> wrote:
>> The website looks great.
>> Good job guys.
>>
>> --
>> Ashish Disawal
>>
>> On Mon, Sep 12, 2016 at 3:00 AM, Jens Rantil <jens.ran...@tink.se> wrote:
>>> Nice! The website also feels snappier!
>>>
>>> On Friday, July 29, 2016, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>> Wanted to let everyone know that if you go to the Cassandra website
>>>> (cassandra.apache.org), you'll notice that there has been some change.
>>>> Outside of a face lift, the main change is a much improved documentation
>>>> section (http://cassandra.apache.org/doc/). As indicated, that
>>>> documentation is a work-in-progress and still has a few missing sections.
>>>> The documentation is maintained in-tree and contributions (through JIRA,
>>>> as with any other contribution) are more than welcome.
>>>>
>>>> Best,
>>>> On behalf of the Apache Cassandra developers.
Re: Cassandra and Kubernetes and scaling
David,

Were you the one who wrote the article? I just finished reading it. It's excellent! I'm also excited that running mutable infrastructure on containers is maturing. I have a few specific questions you (or someone else!) might be able to answer.

1. In the article you state

> We deployed 1,009 minion nodes to Google Compute Engine <https://cloud.google.com/compute/> (GCE), spread across 4 zones, running a custom version of the Kubernetes 1.3 beta.

Did you deploy a custom Kubernetes on GCE because 1.3 wasn't available? Or was it because the Pet Sets alpha feature was disabled on Google Cloud Platform's hosted Kubernetes[1]?

[1] http://serverfault.com/q/802437/37237

2. The article stated

> Yes we deployed 1,000 pets, but one really did not want to join the party!

Do you have any speculation as to why this happened? By default Cassandra doesn't allow concurrent nodes joining the cluster, but Pet Sets are added serially by definition, right?

3. The article doesn't mention downscaling. Do you have any idea how that would/could be done? I consider myself a Kubernetes/container noob. Is there an equivalent of `readinessProbe` for shutting down containers? Or would an external agent have to be deployed that orchestrates a `nodetool decommission` of an instance and then reduces the number of replicas of the Pet Set by one?

4. For a smaller number of Cassandra nodes, would you feel comfortable running it on Kubernetes 1.3? ;)

Cheers,
Jens

On Monday, September 12, 2016, David Aronchick <aronch...@gmail.com> wrote:
> Please let me know if I can help at all!
>
> On Sun, Sep 11, 2016 at 2:55 PM, Jens Rantil <jens.ran...@tink.se> wrote:
>> Hi Aiman,
>>
>> I noticed you never got any reply.
This might be of interest: >> http://blog.kubernetes.io/2016/07/thousand-instances-of-cassandra-using- >> kubernetes-pet-set.html >> >> Cheers, >> Jens >> >> On Tuesday, May 24, 2016, Aiman Parvaiz <ai...@flipagram.com >> <javascript:_e(%7B%7D,'cvml','ai...@flipagram.com');>> wrote: >> >>> Looking forward to hearing from the community about this. >>> >>> Sent from my iPhone >>> >>> > On May 24, 2016, at 10:19 AM, Mike Wojcikiewicz <m...@withkash.com> >>> wrote: >>> > >>> > I saw a thread from April 2016 talking about Cassandra and Kubernetes, >>> and have a few follow up questions. It seems that especially after v1.2 of >>> Kubernetes, and the upcoming 1.3 features, this would be a very viable >>> option of running Cassandra on. >>> > >>> > My questions pertain to HostIds and Scaling Up/Down, and are related: >>> > >>> > 1. If a container's host dies and is then brought up on another host, >>> can you start up with the same PersistentVolume as the original container >>> had? Which begs the question would the new container get a new HostId, >>> implying it would need to bootstrap into the environment? If it's a >>> bootstrap, does the old one get deco'd/assassinated? >>> > >>> > 2. Scaling up/down. Scaling up would be relatively easy, as it should >>> just kick off Bootstrapping the node into the cluster, but what if you need >>> to scale down? Would the Container get deco'd by the scaling down process? >>> or just terminated, leaving you with potential missing replicas >>> > >>> > 3. Scaling up and increasing the RF of a particular keyspace, would >>> there be a clean way to do this with the kubernetes tooling? 
>>> > >>> > In the end I'm wondering how much of the Kubernetes + Cassandra >>> involves nodetool, and how much is just a Docker image where you need to >>> manage that all yourself (painfully) >>> > >>> > -- >>> > --mike >>> >> >> >> -- >> Jens Rantil >> Backend engineer >> Tink AB >> >> Email: jens.ran...@tink.se >> <javascript:_e(%7B%7D,'cvml','jens.ran...@tink.se');> >> Phone: +46 708 84 18 32 >> Web: www.tink.se >> >> Facebook <https://www.facebook.com/#!/tink.se> Linkedin >> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> >> Twitter <https://twitter.com/tink> >> >> > -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> Twitter <https://twitter.com/tink>
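For what it's worth, the scale-down flow asked about in question 3 above could be sketched as below. This is only a guess at the orchestration, not something from the article; the set name, pod ordinal, replica count, and PVC name are all placeholders, and on Kubernetes 1.3 `kubectl scale` may not support Pet Sets, hence the `patch`:

```shell
# Hypothetical scale-down of a 5-node Cassandra Pet Set to 4 nodes.
# 1. Drain the highest-ordinal pet's data to the rest of the ring:
kubectl exec cassandra-4 -- nodetool decommission
# 2. Shrink the set so Kubernetes does not recreate the pod:
kubectl patch petset cassandra -p '{"spec":{"replicas":4}}'
# 3. Optionally reclaim the now-unused PersistentVolumeClaim:
kubectl delete pvc cassandra-data-cassandra-4
```

An external controller would have to run step 1 before step 2, since Kubernetes itself knows nothing about Cassandra's ring membership.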
Re: Cassandra and Kubernetes and scaling
Hi Aiman, I noticed you never got any reply. This might be of interest: http://blog.kubernetes.io/2016/07/thousand-instances-of-cassandra-using-kubernetes-pet-set.html Cheers, Jens On Tuesday, May 24, 2016, Aiman Parvaiz <ai...@flipagram.com> wrote: > Looking forward to hearing from the community about this. > > Sent from my iPhone > > > On May 24, 2016, at 10:19 AM, Mike Wojcikiewicz <m...@withkash.com > <javascript:;>> wrote: > > > > I saw a thread from April 2016 talking about Cassandra and Kubernetes, > and have a few follow up questions. It seems that especially after v1.2 of > Kubernetes, and the upcoming 1.3 features, this would be a very viable > option of running Cassandra on. > > > > My questions pertain to HostIds and Scaling Up/Down, and are related: > > > > 1. If a container's host dies and is then brought up on another host, > can you start up with the same PersistentVolume as the original container > had? Which begs the question would the new container get a new HostId, > implying it would need to bootstrap into the environment? If it's a > bootstrap, does the old one get deco'd/assassinated? > > > > 2. Scaling up/down. Scaling up would be relatively easy, as it should > just kick off Bootstrapping the node into the cluster, but what if you need > to scale down? Would the Container get deco'd by the scaling down process? > or just terminated, leaving you with potential missing replicas > > > > 3. Scaling up and increasing the RF of a particular keyspace, would > there be a clean way to do this with the kubernetes tooling? 
> > > > In the end I'm wondering how much of the Kubernetes + Cassandra involves > nodetool, and how much is just a Docker image where you need to manage that > all yourself (painfully) > > > > -- > > --mike > -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> Twitter <https://twitter.com/tink>
Re: Schema Disagreement vs Nodetool resetlocalschema
Hi Michael, Did you ever get an answer on this? I'm curious to hear for future reference. Thanks, Jens On Monday, June 20, 2016, Michael Fong <michael.f...@ruckuswireless.com> wrote: > Hi, > > > > We have recently encountered several schema disagreement issues while > upgrading Cassandra. In one of the cases, the 2-node cluster idled for over > 30 minutes and their schemas remained unsynced. Due to other logic flows, > Cassandra cannot be restarted, and hence we need to come up with an alternative > on the fly. We are thinking of doing a nodetool resetlocalschema to force the > schema synchronization. How safe is this method? Do we need to disable > the thrift/gossip protocols before performing this operation, and enable them > again after the resync completes? > > > > Thanks in advance! > > > > Sincerely, > > > > Michael Fong >
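The thread never answers the thrift/gossip question. As a sketch only (not a verified procedure), one conservative sequence on the affected node might be the following, leaving gossip up so the node can presumably pull the schema back from its peer:

```shell
# Stop serving clients while the local schema is rebuilt:
nodetool disablethrift
# Drop the node's local schema and re-request it from the other node:
nodetool resetlocalschema
# Check that both nodes now report a single schema version:
nodetool describecluster
# Resume serving clients:
nodetool enablethrift
```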
Re: [ANNOUNCEMENT] Website update
Nice! The website also feels snappier! On Friday, July 29, 2016, Sylvain Lebresne <sylv...@datastax.com> wrote: > Wanted to let everyone know that if you go to the Cassandra website > (cassandra.apache.org), you'll notice that there has been some change. > Outside > of a face lift, the main change is a much improved documentation section > (http://cassandra.apache.org/doc/). As indicated, that documentation is a > work in progress and still has a few missing sections. That documentation is > maintained in-tree and contributions (through JIRA, as for any other > contribution) > are more than welcome. > > Best, > On behalf of the Apache Cassandra developers. >
Re: Bootstrapping multiple cassandra nodes simultaneously in existing dc
Yes. `nodetool setstreamthroughput` is your friend. On Sunday, September 11, 2016, sai krishnam raju potturi < pskraj...@gmail.com> wrote: > Make sure there is no spike in the load-avg on the existing nodes, as that > might affect your application read request latencies. > > On Sun, Sep 11, 2016, 17:10 Jens Rantil <jens.ran...@tink.se> wrote: > >> Hi Bhuvan, >> >> I have done such expansion multiple times and can really recommend >> bootstrapping a new DC and pointing your clients to it. The process is so >> much faster and the documentation you referred to has worked out fine for >> me. >> >> Cheers, >> Jens >> >> >> On Sunday, September 11, 2016, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: >> >>> Hi, >>> >>> We are running Cassandra 3.6 and want to bump up Cassandra nodes in an >>> existing datacenter from 3 to 12 (plan to move to r3.xlarge machines to >>> leverage more memory instead of m4.2xlarge). Bootstrapping a node would >>> take 7-8 hours. >>> >>> If this activity is performed serially then it will take 5-6 days. I had >>> a look at CASSANDRA-7069 >>> <https://issues.apache.org/jira/browse/CASSANDRA-7069> and a bit of >>> discussion in the past at - http://grokbase.com/t/ >>> cassandra/user/147gcqvybg/adding-more-nodes-into-the-cluster. Wanted to >>> know if the limitation is still applicable and race condition could occur >>> in 3.6 version. >>> >>> If this is not the case can we add a new datacenter as mentioned here >>> opsAddDCToCluster >>> <https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsAddDCToCluster.html> >>> and >>> bootstrap multiple nodes simultaneously by keeping auto_bootstrap false in >>> cassandra.yaml and rebuilding nodes simultaneously in the new dc?
>>> >>> >>> Thanks & Regards, >>> Bhuvan >>> >> >> >> -- >> Jens Rantil >> Backend engineer >> Tink AB >> >> Email: jens.ran...@tink.se >> <javascript:_e(%7B%7D,'cvml','jens.ran...@tink.se');> >> Phone: +46 708 84 18 32 >> Web: www.tink.se >> >> Facebook <https://www.facebook.com/#!/tink.se> Linkedin >> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> >> Twitter <https://twitter.com/tink> >> >> -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> Twitter <https://twitter.com/tink>
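As a concrete sketch of the throttling Jens mentions (the value here is illustrative; units are megabits per second):

```shell
# Cap streaming on each existing node so bootstrap/rebuild traffic
# does not starve client reads; 0 removes the cap entirely.
nodetool setstreamthroughput 100
# Inspect the current setting:
nodetool getstreamthroughput
```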
Re: large number of pending compactions, sstables steadily increasing
I just want to chime in and say that we also had trouble keeping up with compaction once (with vnodes/SSD disks), and I also want to recommend keeping track of your open file limit, which might bite you. Cheers, Jens On Friday, August 19, 2016, Mark Rose <markr...@markrose.ca> wrote: > Hi Ezra, > > Are you making frequent changes to your rows (including TTL'ed > values), or mostly inserting new ones? If you're only inserting new > data, it's probable that size-tiered compaction would work better for > you. If you are TTL'ing whole rows, consider date-tiered. > > If leveled compaction is still the best strategy, one way to catch up > with compactions is to have less data per node -- in other words, > use more machines. Leveled compaction is CPU expensive. You are CPU > bottlenecked currently, or from the other perspective, you have too > much data per node for leveled compaction. > > At this point, compaction is so far behind that you'll likely be > getting high latency if you're reading old rows (since dozens to > hundreds of uncompacted sstables will likely need to be checked for > matching rows). You may be better off with size-tiered compaction, > even if it will mean always reading several sstables per read (higher > latency than when leveled can keep up). > > How much data do you have per node? Do you update/insert to/delete > rows? Do you TTL? > > Cheers, > Mark > > On Wed, Aug 17, 2016 at 2:39 PM, Ezra Stuetzel <ezra.stuet...@riskiq.net> wrote: > > I have one node in my cluster 2.2.7 (just upgraded from 2.2.6 hoping to fix > > the issue) which seems to be stuck in a weird state -- with a large number of > > pending compactions and sstables. The node is compacting about 500gb/day, > > number of pending compactions is going up at about 50/day. It is at about > > 2300 pending compactions now.
I have tried increasing number of > compaction > > threads and the compaction throughput, which doesn't seem to help > eliminate > > the many pending compactions. > > > > I have tried running 'nodetool cleanup' and 'nodetool compact'. The > latter > > has fixed the issue in the past, but most recently I was getting OOM > errors, > > probably due to the large number of sstables. I upgraded to 2.2.7 and am > no > > longer getting OOM errors, but also it does not resolve the issue. I do > see > > this message in the logs: > > > >> INFO [RMI TCP Connection(611)-10.9.2.218] 2016-08-17 01:50:01,985 > >> CompactionManager.java:610 - Cannot perform a full major compaction as > >> repaired and unrepaired sstables cannot be compacted together. These > two set > >> of sstables will be compacted separately. > > > > Below are the 'nodetool tablestats' comparing a normal and the > problematic > > node. You can see problematic node has many many more sstables, and they > are > > all in level 1. What is the best way to fix this? Can I just delete those > > sstables somehow then run a repair? > >> > >> Normal node > >>> > >>> keyspace: mykeyspace > >>> > >>> Read Count: 0 > >>> > >>> Read Latency: NaN ms. > >>> > >>> Write Count: 31905656 > >>> > >>> Write Latency: 0.051713177939359714 ms. > >>> > >>> Pending Flushes: 0 > >>> > >>> Table: mytable > >>> > >>> SSTable count: 1908 > >>> > >>> SSTables in each level: [11/4, 20/10, 213/100, 1356/1000, 306, > 0, > >>> 0, 0, 0] > >>> > >>> Space used (live): 301894591442 > >>> > >>> Space used (total): 301894591442 > >>> > >>> > >>> > >>> Problematic node > >>> > >>> Keyspace: mykeyspace > >>> > >>> Read Count: 0 > >>> > >>> Read Latency: NaN ms. > >>> > >>> Write Count: 30520190 > >>> > >>> Write Latency: 0.05171286705620116 ms. 
> >>> > >>> Pending Flushes: 0 > >>> > >>> Table: mytable > >>> > >>> SSTable count: 14105 > >>> > >>> SSTables in each level: [13039/4, 21/10, 206/100, 831, 0, 0, 0, > >>> 0, 0] > >>> > >>> Space used (live): 561143255289 > >>> > >>> Space used (total): 561143255289 > > > > Thanks, > > > > Ezra > -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> Twitter <https://twitter.com/tink>
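A rough sketch of the knobs discussed above for letting a node catch up on compactions (the values are illustrative, not recommendations):

```shell
# How far behind is the node?
nodetool compactionstats
# Temporarily remove the compaction throughput cap (MB/s; 0 = unthrottled):
nodetool setcompactionthroughput 0
# Once the pending count drains, restore a steady-state cap:
nodetool setcompactionthroughput 16
```

The number of compaction threads is governed by `concurrent_compactors` in cassandra.yaml, which requires a restart on this version.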
Re: Bootstrapping multiple cassandra nodes simultaneously in existing dc
Hi Bhuvan, I have done such expansion multiple times and can really recommend bootstrapping a new DC and pointing your clients to it. The process is so much faster and the documentation you referred to has worked out fine for me. Cheers, Jens On Sunday, September 11, 2016, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > Hi, > > We are running Cassandra 3.6 and want to bump up Cassandra nodes in an > existing datacenter from 3 to 12 (plan to move to r3.xlarge machines to > leverage more memory instead of m4.2xlarge). Bootstrapping a node would > take 7-8 hours. > > If this activity is performed serially then it will take 5-6 days. I had a > look at CASSANDRA-7069 > <https://issues.apache.org/jira/browse/CASSANDRA-7069> and a bit of > discussion in the past at - http://grokbase.com/t/ > cassandra/user/147gcqvybg/adding-more-nodes-into-the-cluster. Wanted to > know if the limitation is still applicable and race condition could occur > in 3.6 version. > > If this is not the case can we add a new datacenter as mentioned here > opsAddDCToCluster > <https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsAddDCToCluster.html> > and > bootstrap multiple nodes simultaneously by keeping auto_bootstrap false in > cassandra.yaml and rebuilding nodes simultaneously in the new dc? > > > Thanks & Regards, > Bhuvan > -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> Twitter <https://twitter.com/tink>
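The new-DC route in the linked docs boils down to roughly the following; keyspace and DC names here are placeholders, and `auto_bootstrap: false` must already be set on the new nodes:

```shell
# Add replicas for the new DC (run from anywhere in the cluster):
cqlsh -e "ALTER KEYSPACE mykeyspace WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc_old': 3, 'dc_new': 3};"
# Then, on every node in the new DC, stream existing data in parallel:
nodetool rebuild -- dc_old
```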
Re: Isolation in case of Single Partition Writes and Batching with LWT
Hi, This might be off-topic, but you could always use Zookeeper locking and/or Apache Kafka topic keys for doing things like this. Cheers, Jens On Tuesday, September 6, 2016, Bhuvan Rawal <bhu1ra...@gmail.com> wrote: > Hi, > > We are working on a multi-threaded distributed design in > which a thread reads current state from Cassandra (single partition, ~20 > rows), does some computation and saves it back. It needs to be > ensured that, between the read and the write by that thread, no other thread > has saved any operation on that partition. > > We have thought of a solution for the same - *having a write_time column* > in the schema and making it static. Every time a thread picks up a job, the > read will be performed with LOCAL_QUORUM. When writing into Cassandra, the > batch will contain an LWT (IF write_time equals the read time); otherwise the read will > be performed and the computation done again, and so on. This ensures > that, at save time, the partition is still in the state it was read in. > > In order to avoid race conditions we need to ensure a couple of things: > > 1. When saving data in a batch on a single partition (*rows may be > updates, deletes, inserts)*, are they isolated per replica node (not > necessarily on the cluster as a whole)? Is there a possibility of a client > reading partial rows? > > 2. If we do LOCAL_QUORUM reads and LOCAL_QUORUM writes, could > there be a chance of inconsistency (when LWT is being used in > batches)? > > 3. Is it possible to use multiple LWTs in a single batch? In general, how > does LWT perform with batches, and is Paxos acted on before batch execution? > > Can someone help us with this?
> > Thanks & Regards, > Bhuvan > > -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook <https://www.facebook.com/#!/tink.se> Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary> Twitter <https://twitter.com/tink>
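To make the write_time scheme described above concrete, a conditional single-partition batch might look like this. The table, columns, and timestamp values are illustrative, and `write_time` is assumed to be a static column so one condition covers the whole partition:

```sql
BEGIN BATCH
  UPDATE jobs SET state = 'done' WHERE pk = 'p1' AND ck = 1;
  DELETE FROM jobs WHERE pk = 'p1' AND ck = 2;
  -- The LWT condition: only apply if nobody wrote since our read.
  UPDATE jobs SET write_time = 1473150000
    WHERE pk = 'p1' IF write_time = 1473149000;
APPLY BATCH;
```

Note that a conditional batch must target a single partition, which matches the design being proposed.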
Re: Finding records that exist on Cassandra but not externally
Hi again Chris, Another option would be to have a look at using a Merkle tree to quickly drill down to the differences. This is actually what Cassandra uses internally when running a repair between different nodes. Cheers, Jens On Wed, Sep 7, 2016 at 9:47 AM <ch...@cmartinit.co.uk> wrote: > First off, I hope this is appropriate here - I couldn't decide whether this was > a question for Cassandra users or Spark users, so if you think it's in the > wrong place feel free to redirect me. > > I have a system that does a load of data manipulation using Spark. The > output of this program is effectively the new state that I want my > Cassandra table to be in, and the final step is to update Cassandra so that > it matches this state. > > At present I'm inserting all rows in my generated state into > Cassandra. This works for new rows and also for updating existing rows, but > of course it doesn't delete any rows that were already in Cassandra but not in > my new state. > > The problem I have now is how best to delete these missing rows. Options I > have considered are: > > 1. Setting a TTL on inserts which is roughly the same as my data refresh > period. This would probably be pretty performant, but I really don't want to > do this because it would mean that all data in my database would disappear > if I had issues running my refresh task! > > 2. Every time I refresh the data, first fetch all primary > keys from Cassandra and compare them to the primary keys locally to create a > list of pks to delete before the insert. This seems the most logically > correct option but is going to result in reading vast amounts of data from > Cassandra. > > 3. Truncating the entire table before refreshing Cassandra. This has the > benefit of being pretty simple in code, but I'm not sure of the performance > implications of this and what will happen if I truncate while a node is > offline.
> > For reference the table is on the order of 10s of millions of rows and for > any data refresh only a very small fraction (<.1%) will actually need > deleting. 99% of the time I'll just be overwriting existing keys. > > I'd be grateful if anyone could shed some advice on the best solution here > or whether there's some better way I haven't thought of. > > Thanks, > > Chris > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
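A toy illustration of the Merkle idea suggested above (hedged: this is a one-level hash comparison, not Cassandra's actual tree): both sides hash their primary keys into fixed buckets, compare only the bucket digests, and exchange keys solely for buckets that differ.

```python
import hashlib

def bucket_digests(keys, buckets=16):
    """Assign each key to a bucket by key hash, then digest each bucket."""
    members = [[] for _ in range(buckets)]
    for key in sorted(keys):
        b = int(hashlib.sha256(key.encode()).hexdigest(), 16) % buckets
        members[b].append(key)
    digests = [hashlib.sha256("|".join(m).encode()).hexdigest()
               for m in members]
    return digests, members

def differing_keys(local_keys, remote_keys, buckets=16):
    """Return keys present on one side but not the other, inspecting
    only buckets whose digests disagree."""
    ld, lm = bucket_digests(local_keys, buckets)
    rd, rm = bucket_digests(remote_keys, buckets)
    diff = set()
    for i in range(buckets):
        if ld[i] != rd[i]:
            diff |= set(lm[i]) ^ set(rm[i])
    return diff

# Example: one key missing remotely is found without exchanging all keys.
local = ["user:%d" % i for i in range(1000)]
remote = [k for k in local if k != "user:123"]
print(differing_keys(local, remote))  # {'user:123'}
```

A real Merkle tree recurses instead of using one flat level, so only logarithmically many digests cross the wire before drilling down; for the <0.1% divergence described above, that keeps the exchanged data tiny.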
Re: Finding records that exist on Cassandra but not externally
Hi Chris, Without fully knowing your use case: can't you keep track of which keys have changed in the external system somehow? Otherwise, 2) sounds like the way to go to me. Cheers, Jens On Wed, Sep 7, 2016 at 9:47 AM <ch...@cmartinit.co.uk> wrote: > First off, I hope this is appropriate here - I couldn't decide whether this was > a question for Cassandra users or Spark users, so if you think it's in the > wrong place feel free to redirect me. > > I have a system that does a load of data manipulation using Spark. The > output of this program is effectively the new state that I want my > Cassandra table to be in, and the final step is to update Cassandra so that > it matches this state. > > At present I'm inserting all rows in my generated state into > Cassandra. This works for new rows and also for updating existing rows, but > of course it doesn't delete any rows that were already in Cassandra but not in > my new state. > > The problem I have now is how best to delete these missing rows. Options I > have considered are: > > 1. Setting a TTL on inserts which is roughly the same as my data refresh > period. This would probably be pretty performant, but I really don't want to > do this because it would mean that all data in my database would disappear > if I had issues running my refresh task! > > 2. Every time I refresh the data, first fetch all primary > keys from Cassandra and compare them to the primary keys locally to create a > list of pks to delete before the insert. This seems the most logically > correct option but is going to result in reading vast amounts of data from > Cassandra. > > 3. Truncating the entire table before refreshing Cassandra. This has the > benefit of being pretty simple in code, but I'm not sure of the performance > implications of this and what will happen if I truncate while a node is > offline.
> > For reference the table is on the order of 10s of millions of rows and for > any data refresh only a very small fraction (<.1%) will actually need > deleting. 99% of the time I'll just be overwriting existing keys. > > I'd be grateful if anyone could shed some advice on the best solution here > or whether there's some better way I haven't thought of. > > Thanks, > > Chris > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
Re: Ring connection timeouts with 2.2.6
Hi, Could it be garbage collection occurring on nodes that are more heavily loaded? Cheers, Jens On Sun, Jun 26, 2016 05:22, Mike Heffner <m...@librato.com> wrote: > One thing to add, if we do a rolling restart of the ring the timeouts > disappear entirely for several hours and performance returns to normal. > It's as if something is leaking over time, but we haven't seen any > noticeable change in heap. > > On Thu, Jun 23, 2016 at 10:38 AM, Mike Heffner <m...@librato.com> wrote: > >> Hi, >> >> We have a 12 node 2.2.6 ring running in AWS, single DC with RF=3, that is >> sitting at <25% CPU, doing mostly writes, and not showing any particular >> long GC times/pauses. By all observed metrics the ring is healthy and >> performing well. >> >> However, we are noticing a pretty consistent number of connection >> timeouts coming from the messaging service between various pairs of nodes >> in the ring. The "Connection.TotalTimeouts" meter metric shows 100k's of >> timeouts per minute, usually between two pairs of nodes for several hours >> at a time. It seems to occur for several hours at a time, then may stop or >> move to other pairs of nodes in the ring. The metric >> "Connection.SmallMessageDroppedTasks." will also grow for one pair of >> the nodes in the TotalTimeouts metric. >> >> Looking at the debug log typically shows a large number of messages like >> the following on one of the nodes: >> >> StorageProxy.java:1033 - Skipped writing hint for /172.26.33.177 (ttl 0) >> >> We have cross node timeouts enabled, but ntp is running on all nodes and >> no node appears to have time drift. >> >> The network appears to be fine between nodes, with iperf tests showing >> that we have a lot of headroom. >> >> Any thoughts on what to look for? Can we increase thread count/pool sizes >> for the messaging service? >> >> Thanks, >> >> Mike >> >> -- >> >> Mike Heffner <m...@librato.com> >> Librato, Inc. >> >> > > > -- > > Mike Heffner <m...@librato.com> > Librato, Inc.
> > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
Re: some questions
You forgot FROM in your CQL query. Jens On Sun, Jun 26, 2016 08:30, lowping <lowp...@163.com> wrote: > Hi: > > > Question 1: > > I got an error with this CQL; has it been fixed already? > select collection_type where id in ('a', 'b') > > Question 2: > > I want to use a UDF in an UPDATE, but this CQL can't execute. Any advice? > > update table_name set field=my_function(field) where … > > > Thank you so much >
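The corrected form of the query in question 1 would be (table name is a placeholder):

```sql
SELECT collection_type FROM my_table WHERE id IN ('a', 'b');
```

As for question 2: as far as I know, a CQL UPDATE cannot read a column's current value, so `SET field = my_function(field)` is not expressible; the read-modify-write has to happen client-side (a UDF can still transform values in a SELECT).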
Re: Multi DC setup question
I'm AFK, but you might be able to query the system.peers table to see which nodes are up. Cheers, Jens On Tue, Jun 28, 2016 06:44, Charulata Sharma (charshar) <chars...@cisco.com> wrote: > Hi All, > > We are setting up another data center and have the following > question: > > 6 nodes in each DC's Cassandra cluster. > > All keyspaces have an RF of 3. > > *Our scenario is:* > > > > App nodes connect to the Cassandra cluster using LOCAL_QUORUM consistency. > > > > We want to ensure that if 5 nodes out of the 6 are available then the > application enters the primary DC, else the application URL is directed to > another DC. > > > > What is the best option to achieve this? > > > > Thanks, > > Charu >
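A sketch of the system.peers probe, run against any reachable coordinator in the primary DC (note this reflects that coordinator's gossip view, not a hard up/down guarantee):

```sql
SELECT peer, data_center, rack, release_version FROM system.peers;
```

Counting reachable rows per data_center (plus the coordinator itself) approximates the 5-of-6 check, though a load-balancer health check against each node is a common alternative.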
Re: Motivation for a DHT ring
Some reasons I can come up with: - it would be hard to have tunable read/write consistency and replica counts when interfacing with a file system. - data locality support would require strong coupling to the distributed file system interface (if at all possible, given that certain sstables should live on the same data node). - operator complexity: administering both a distributed file system and a Cassandra cluster. This was a personal reason why I chose Cassandra instead of HBase for a project. Cheers, Jens On Wed, Jun 29, 2016 13:01, jean paul <researche...@gmail.com> wrote: > > > 2016-06-28 22:29 GMT+01:00 jean paul <researche...@gmail.com>: > >> Hi all, >> >> Please, what is the motivation for choosing a DHT ring in Cassandra? Why >> not use a normal parallel or distributed file system that supports >> replication? >> >> Thank you so much for clarification. >> >> Kind regards. >
Re: tuning repairs and compaction options
Hi Reik, You could always throttle your repair by running smaller chunks of the repair. See https://github.com/BrianGallew/cassandra_range_repair. Regarding the compaction, you can always change the compactionthroughput using `nodetool setcompactionthroughput`. Hope this helps, Jens On Fri, May 6, 2016 at 9:47 AM Reik Schatz <reik.sch...@gmail.com> wrote: > Hi, we are running a 9 node cluster under load. The nodes are running in > EC2 on i2.2xlarge instances. Cassandra version is 2.2.4. One node was down > yesterday for more than 3 hours. So we manually started an incremental > repair this morning via nodetool (anti-entropy repair?) > > What we can see is that user CPU on that node goes up to over 95% and also > goes up on all other nodes. Also the number of SSTables is exploding, I > guess due to anticompaction. > > What are my tuning options to have a more gentle repair behaviour? Which > settings should I look at if I want CPU to stay below 50% for instance. My > worry is always to impact the read/write performance during times when we > do anti-entropy repairs. > > Cheers, > Reik > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
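Concretely, the two knobs look like this; the token range and throughput values are placeholders, and the linked range_repair script automates generating the subranges:

```shell
# Repair one small token subrange at a time instead of the whole node:
nodetool repair -st -9223372036854775808 -et -9200000000000000000 my_keyspace
# Cap compaction I/O (MB/s) so validation/anticompaction stays gentle:
nodetool setcompactionthroughput 16
```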
Re: Alternative approach to setting up new DC
Hi, I never got any response here, but just wanted to share that I went to a Cassandra meet-up in Stockholm yesterday where I talked to two knowledgable Cassandra people that verified that the approach below should work. The most important thing is that the backup must be fully imported before gc_grace_seconds after when the backup is taken. As of me, I managed to a get a more stable VPN setup and did not have to go down this path. Cheers, Jens On Mon, Apr 18, 2016 at 10:15 AM Jens Rantil <jens.ran...@tink.se> wrote: > Hi, > > I am provisioning a new datacenter for an existing cluster. A rather shaky > VPN connection is hindering me from making a "nodetool rebuild" bootstrap > on the new DC. Interestingly, I have a full fresh database snapshot/backup > at the same location as the new DC (transferred outside of the VPN). I am > now considering the following approach: > >1. Make sure my clients are using the old DC. >2. Provision the new nodes in new DC. >3. ALTER the keyspace to enable replicas on the new DC. This will >start replicating all writes from old DC to new DC. >4. Before gc_grace_seconds after operation 3) above, use sstableloader >to stream my backup to the new nodes. >5. For safety precaution, do a full repair. > > Could you see any issues with doing this? > > Cheers, > Jens > -- > > Jens Rantil > Backend Developer @ Tink > > Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden > For urgent matters you can reach me at +46-708-84 18 32. > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
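For anyone trying step 4, the sstableloader invocation is roughly as follows; hosts and paths are placeholders, and the last two path components must be the keyspace and table names:

```shell
# Stream the snapshot into the new DC's live nodes:
sstableloader -d 10.0.0.1,10.0.0.2 /backups/mykeyspace/mytable
```

The crucial constraint from above still applies: the load must finish within gc_grace_seconds of when the backup was taken, or deleted data may be resurrected.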
Re: When are hints written?
Hi again Bo, I assume this is the piece of documentation you are referring to? http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_about_hh_c.html?scroll=concept_ds_ifg_jqx_zj__performance > If a replica node is overloaded or unavailable, and the failure detector has not yet marked it down, then expect most or all writes to that node to fail after the timeout triggered by write_request_timeout_in_ms, <http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__write_request_timeout_in_ms> which defaults to 10 seconds. During that time, Cassandra writes the hint when the timeout is reached. I'm not an expert on this, but the way I've seen it, hints are stored as soon as there is _any_ issue writing a mutation (insert/update/delete) to a node. By "issue", that essentially means that a node hasn't acknowledged back to the coordinator that the write succeeded within write_request_timeout_in_ms. This includes TCP/socket timeouts, connection issues or that the node is down. The hints are stored for a maximum timespan defaulting to 3 hours. Cheers, Jens On Thu, Apr 21, 2016 at 8:06 AM Bo Finnerup Madsen <bo.gunder...@gmail.com> wrote: > Hi Jens, > > Thank you for the tip! > ALL would definitely cure our hints issue, but as you note, it is not > optimal as we are unable to take down nodes without clients failing. > > I am most probably overlooking something in the documentation, but I > cannot see any description of when hints are written other than when a node > is marked as being down. And since none of our nodes have been marked as > being down (at least according to the logs), I suspect that there is some > timeout that governs when hints are written? > > Regarding your other post: Yes, 3.0.3 is pretty new. But we are new to > this cassandra game, and our schema-fu is not strong enough for us to > create a schema without using materialized views :) > > > ons. 20. apr. 2016 kl.
17.09 skrev Jens Rantil <jens.ran...@tink.se>: > >> Hi Bo, >> >> > In our case, I would like for the cluster to wait for the write to be >> persisted on the relevant nodes before returning an ok to the client. >> But I don't know which knobs to turn to accomplish this? or if it is even >> possible :) >> >> This is what write consistency option is for. Have a look at >> https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html. >> Note, however that if you use ALL, your clients will fail (throw exception, >> depending on language) as soon as a single partition can't be written. This >> means you can't do online maintenance of a Cassandra node (such as >> upgrading it etc.) without experiencing write issues. >> >> Cheers, >> Jens >> >> On Wed, Apr 20, 2016 at 3:39 PM Bo Finnerup Madsen < >> bo.gunder...@gmail.com> wrote: >> >>> Hi, >>> >>> We have a small 5 node cluster of m4.xlarge clients that receives writes >>> from ~20 clients. The clients will write as fast as they can, and the whole >>> process is limited by the write performance of the cassandra cluster. >>> After we have tweaked our schema to avoid large partitions, the load is >>> going ok and we don't see any warnings or errors in the cassandra logs. But >>> we do see quite a lot of hint handoff activity. During the load, the >>> cassandra nodes are quite loaded, with linux reporting a load as high as 20. >>> >>> I have read the available documentation on how hints works, and to my >>> understanding hints should only be written if a node is down. But as far as >>> I can see, none of the nodes are marked as down during the load. So I >>> suspect I am missing something :) >>> We have configured the servers with write_request_timeout_in_ms: 12 >>> and the clients with a timeout of 13, but still get hints stored. >>> >>> In our case, I would like for the cluster to wait for the write to be >>> persisted on the relevant nodes before returning an ok to the client. 
But I >>> don't know which knobs to turn to accomplish this? or if it is even >>> possible :) >>> >>> We are running cassandra 3.0.3, with 8Gb heap and a replication factor >>> of 3. >>> >>> Thank you in advance! >>> >>> Yours sincerely, >>> Bo Madsen >>> >> -- >> >> Jens Rantil >> Backend Developer @ Tink >> >> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden >> For urgent matters you can reach me at +46-708-84 18 32. >> > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
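The timeout-driven hinting described above can be made concrete with a toy model (illustrative Python, not Cassandra's actual code; the function and constant names are assumptions, and the timeout value here mirrors the quoted documentation rather than any particular cassandra.yaml): a coordinator stores a hint for any replica that fails to acknowledge within write_request_timeout_in_ms, even if the node was never marked down.

```python
# Toy model of hint storage on a coordinator. Values are illustrative;
# see write_request_timeout_in_ms and max_hint_window_in_ms in cassandra.yaml.
WRITE_REQUEST_TIMEOUT_MS = 10_000
MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000  # hints kept for at most 3 hours by default

def coordinate_write(ack_latency_ms, required_acks):
    """Return (met_consistency, hinted_replicas). A replica that never
    answers (None) or answers too late gets a hint stored for it."""
    acked = [node for node, latency in ack_latency_ms.items()
             if latency is not None and latency <= WRITE_REQUEST_TIMEOUT_MS]
    hinted = [node for node in ack_latency_ms if node not in acked]
    return len(acked) >= required_acks, hinted

# RF=3, QUORUM write: succeeds with 2 acks, but n3 still gets a hint,
# i.e. hints are written even though n3 was never marked down.
ok, hinted = coordinate_write({"n1": 3, "n2": 7, "n3": None}, required_acks=2)
```

This matches the behaviour Bo observed: no node was ever marked down, yet hint handoff activity appeared, because overloaded replicas missed the ack deadline.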
Re: When are hints written?
Hi Bo, > In our case, I would like for the cluster to wait for the write to be persisted on the relevant nodes before returning an ok to the client. But I don't know which knobs to turn to accomplish this? or if it is even possible :) This is what write consistency option is for. Have a look at https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html. Note, however that if you use ALL, your clients will fail (throw exception, depending on language) as soon as a single partition can't be written. This means you can't do online maintenance of a Cassandra node (such as upgrading it etc.) without experiencing write issues. Cheers, Jens On Wed, Apr 20, 2016 at 3:39 PM Bo Finnerup Madsen <bo.gunder...@gmail.com> wrote: > Hi, > > We have a small 5 node cluster of m4.xlarge clients that receives writes > from ~20 clients. The clients will write as fast as they can, and the whole > process is limited by the write performance of the cassandra cluster. > After we have tweaked our schema to avoid large partitions, the load is > going ok and we don't see any warnings or errors in the cassandra logs. But > we do see quite a lot of hint handoff activity. During the load, the > cassandra nodes are quite loaded, with linux reporting a load as high as 20. > > I have read the available documentation on how hints works, and to my > understanding hints should only be written if a node is down. But as far as > I can see, none of the nodes are marked as down during the load. So I > suspect I am missing something :) > We have configured the servers with write_request_timeout_in_ms: 12 > and the clients with a timeout of 13, but still get hints stored. > > In our case, I would like for the cluster to wait for the write to be > persisted on the relevant nodes before returning an ok to the client. But I > don't know which knobs to turn to accomplish this? 
or if it is even > possible :) > > We are running cassandra 3.0.3, with 8Gb heap and a replication factor of > 3. > > Thank you in advance! > > Yours sincerely, > Bo Madsen > -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
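The consistency trade-off discussed in this thread follows from simple arithmetic: a read is guaranteed to overlap the latest successful write whenever the write and read replica counts together exceed the replication factor. A minimal sketch (illustrative Python, not a driver API):

```python
def quorum(rf):
    """Replicas needed for a QUORUM operation: a majority of RF."""
    return rf // 2 + 1

def overlaps(write_replicas, read_replicas, rf):
    """True when every read is guaranteed to see the latest successful
    write: the two replica sets must intersect (W + R > RF)."""
    return write_replicas + read_replicas > rf

RF = 3
# QUORUM writes + QUORUM reads overlap. ALL (W=RF) makes even ONE reads
# overlap, at the cost of failing writes whenever a single replica is down.
```

This is why ALL "cures" the hints issue but breaks online maintenance: with W = RF there is no slack for a single unavailable replica.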
Alternative approach to setting up new DC
Hi, I am provisioning a new datacenter for an existing cluster. A rather shaky VPN connection is preventing me from doing a "nodetool rebuild" bootstrap on the new DC. Interestingly, I have a full fresh database snapshot/backup at the same location as the new DC (transferred outside of the VPN). I am now considering the following approach:
1. Make sure my clients are using the old DC.
2. Provision the new nodes in the new DC.
3. ALTER the keyspace to enable replicas on the new DC. This will start replicating all writes from the old DC to the new DC.
4. Within gc_grace_seconds of step 3) above, use sstableloader to stream my backup to the new nodes.
5. As a safety precaution, do a full repair.
Could you see any issues with doing this? Cheers, Jens -- Jens Rantil Backend Developer @ Tink Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden For urgent matters you can reach me at +46-708-84 18 32.
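The timing constraint in step 4 can be sketched as a deadline check (illustrative Python; the 10-day gc_grace_seconds is only the table default and the keyspace in question may differ): the backup must finish loading before tombstones written since the ALTER can be purged, or deleted data could be resurrected.

```python
from datetime import datetime, timedelta, timezone

GC_GRACE_SECONDS = 864_000  # table-level default: 10 days; check your schema

def sstableloader_deadline(alter_keyspace_at: datetime) -> datetime:
    """Latest safe completion time for the sstableloader step: loading an
    older backup after tombstones from step 3 may have been purged risks
    resurrecting deleted rows."""
    return alter_keyspace_at + timedelta(seconds=GC_GRACE_SECONDS)

altered = datetime(2015, 6, 1, tzinfo=timezone.utc)
deadline = sstableloader_deadline(altered)
```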
Hanging pending compactions
Hi, After executing `nodetool cleanup` on some nodes they are all showing lots (123, 97, 64) of pending compaction tasks, but not a single active task. I'm running Cassandra 2.0.14 with Leveled Compaction Strategy on most of our tables. Anyone experienced this before? Also, is there any way for me to extract debugging information to file a bug report before restarting the nodes? Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Consistent reads and first write wins
Hi John, The general answer: Each cell in a CQL table has a corresponding timestamp which is taken from the clock on the Cassandra node that orchestrates the write. When you are reading from a Cassandra cluster, the node that coordinates the read will compare the timestamps of the values it fetches. Last write (= highest timestamp) wins and will be returned to the client. As you may now understand, the above is why it is crucial that you NTP-sync your Cassandra nodes. > If time_uuid_1 comes before time_uuid_2 and if both clients follow up the writes with quorum reads, then will both clients see the value 'bar' for prop1? As you might have understood by now, the values of your timeuuids aren't really relevant here - the timestamp transparently taken from the clock of the coordinating node is. This is because you could supply your own timeuuid from the client, which might have a differing clock. However, it will basically correspond to the timestamp if you use the helper function `now()` in CQL. Anyway, if you make a quorum write (that succeeds) and then make a successful quorum read, you can be 100% sure that you will get the latest value. > Are there situations in which clients might see different values? I can see three scenarios where that could happen: 1. If you write with a weaker consistency such as ONE and read with quorum. 2. If you write with quorum and read with a weaker consistency such as ONE. 3. If you make a quorum write that fails. That write might still have been applied to some node. Cassandra does not guarantee atomic writes (that is, either applied or not at all). In other words, a failed write will not roll back partially applied writes in any way.
Cheers, Jens On Wed, Jul 8, 2015 at 3:35 AM, John Sanda john.sa...@gmail.com wrote: Suppose I have the following schema,

CREATE TABLE foo (
    id text,
    time timeuuid,
    prop1 text,
    PRIMARY KEY (id, time)
) WITH CLUSTERING ORDER BY (time ASC);

And I have two clients who execute quorum writes, e.g.,

// client 1
INSERT INTO FOO (id, time, prop1) VALUES ('test', time_uuid_1, 'bar');
// client 2
INSERT INTO FOO (id, time, prop1) VALUES ('test', time_uuid_2, 'bam');

If time_uuid_1 comes before time_uuid_2 and if both clients follow up the writes with quorum reads, then will both clients see the value 'bar' for prop1? Are there situations in which clients might see different values? -- - John -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
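The last-write-wins reconciliation described in the answer above can be sketched in a few lines (illustrative Python, not the server's or driver's API): each replica returns its cell together with the cell's write timestamp, and the coordinator keeps the highest one, regardless of any timeuuid stored in the row.

```python
def reconcile(replica_cells):
    """Pick the winning cell among replica responses, each given as
    (write_timestamp_micros, value): highest write timestamp wins.
    The timeuuid clustering column plays no part in this comparison."""
    timestamp, value = max(replica_cells, key=lambda cell: cell[0])
    return value

# Two replicas disagree because one of them missed the later write:
winner = reconcile([(1500, "bar"), (1700, "bam")])
```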
RE: nodetool repair
Hi, For the record I've successfully used https://github.com/BrianGallew/cassandra_range_repair to make repairing smooth. Maybe it could also be of interest here, I don't know... Cheers, Jens – Skickat från Mailbox On Fri, Jun 19, 2015 at 8:36 PM, null sean_r_dur...@homedepot.com wrote: It seems to me that running repair on any given node may also induce repairs to related replica nodes. For example, if I run repair on node A and node B has some replicas, data might stream from A to B (assuming A has newer/more data). Now, that does NOT mean that node B will be fully repaired. You still need to run repair -pr on all nodes before gc_grace_seconds. You can run repairs on multiple nodes at the same time. However, you might end up with a large amount of streaming, if many repairs are needed. So, you should be aware of a performance impact. I run weekly repairs on one node at a time, if possible. On larger rings, though, I run repairs on multiple nodes staggered by a few hours. Once your routine maintenance is established, repairs will not run for very long. But, if you have a large ring that hasn’t been repaired, those first repairs may take days (but should get faster as you get further through the ring). Sean Durity From: Alain RODRIGUEZ [mailto:arodr...@gmail.com] Sent: Friday, June 19, 2015 3:56 AM To: user@cassandra.apache.org Subject: Re: nodetool repair Hi, This is not necessarily true. Repair will induce compactions only if you have entropy in your cluster. If not, it will just read your data to compare all the replicas of each piece of data (using indeed cpu and disk IO). If there is some data missing it will repair it. Though, due to merkle tree size, you will generally stream more data than just the data needed. To limit this downside and the compactions amount, use range repairs -- http://www.datastax.com/dev/blog/advanced-repair-techniques. About tombstones, they will be evicted only after gc_grace_period and only if all the parts of the row are part of the compaction.
C*heers, Alain 2015-06-19 9:08 GMT+02:00 arun sirimalla arunsi...@gmail.commailto:arunsi...@gmail.com: Yes compactions will remove tombstones On Thu, Jun 18, 2015 at 11:46 PM, Jean Tremblay jean.tremb...@zen-innovations.commailto:jean.tremb...@zen-innovations.com wrote: Perfect thank you. So making a weekly nodetool repair -pr” on all nodes one after the other will repair my cluster. That is great. If it does a compaction, does it mean that it would also clean up my tombstone from my LeveledCompactionStrategy tables at the same time? Thanks for your help. On 19 Jun 2015, at 07:56 , arun sirimalla arunsi...@gmail.commailto:arunsi...@gmail.com wrote: Hi Jean, Running nodetool repair on a node will repair only that node in the cluster. It is recommended to run nodetool repair on one node at a time. Few things to keep in mind while running repair 1. Running repair will trigger compactions 2. Increase in CPU utilization. Run node tool repair with -pr option, so that it will repair only the range that node is responsible for. On Thu, Jun 18, 2015 at 10:50 PM, Jean Tremblay jean.tremb...@zen-innovations.commailto:jean.tremb...@zen-innovations.com wrote: Thanks Jonathan. But I need to know the following: If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command? If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node? Kind regards On 18 Jun 2015, at 19:19 , Jonathan Haddad j...@jonhaddad.commailto:j...@jonhaddad.com wrote: If you're using DSE, you can schedule it automatically using the repair service. If you're open source, check out Spotify cassandra reaper, it'll manage it for you. 
https://github.com/spotify/cassandra-reaper On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay jean.tremb...@zen-innovations.commailto:jean.tremb...@zen-innovations.com wrote: Hi, I want to make on a regular base repairs on my cluster as suggested by the documentation. I want to do this in a way that the cluster is still responding to read requests. So I understand that I should not use the -par switch for that as it will do the repair in parallel and consume all available resources. If you issue a “nodetool repair” on one node will it repair all the nodes in the cluster or only the one on which we issue the command? If it repairs only one node, do I have to wait that the nodetool repair ends, and only then issue another “nodetool repair” on the next node? If we had down time periods I would issue a nodetool -par, but we don’t have down time periods. Sorry for the stupid questions. Thanks for your help. -- Arun Senior Hadoop/Cassandra Engineer Cloudwick 2014 Data Impact Award Winner (Cloudera)
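The range-repair technique recommended in this thread (and implemented by tools like cassandra_range_repair) boils down to splitting a node's token range into many small contiguous pieces and repairing them one at a time, so each Merkle tree covers little data and over-streaming is limited. A sketch of the splitting step (illustrative Python):

```python
def split_token_range(start, end, parts):
    """Split the token range (start, end] into `parts` contiguous
    subranges; each piece can then be repaired separately using
    nodetool repair's start/end token options."""
    width = end - start
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

subranges = split_token_range(0, 1000, 4)
```

Repairing subranges sequentially keeps the performance impact of each repair small and bounded, at the cost of issuing many more repair commands.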
Re: Question regarding concurrent bootstrapping
Rob, Thanks for a great answer. While I'm at it, thanks for all the time you put into answering people on this mailing list. I'm sure I'm not the only one appreciating it. Cheers, Jens – Skickat från Mailbox On Sat, Jun 13, 2015 at 12:37 AM, Robert Coli rc...@eventbrite.com wrote: On Fri, Jun 12, 2015 at 5:21 AM, Jens Rantil jens.ran...@tink.se wrote: Let's say I have an existing cluster and do the following: 1. I start a new joining node (A). It enters state Up/Joining. Streaming automatically starts to this node. 2. I wait two minutes (best practice for bootstrapping). 3. I start a second node (B) to join the cluster. It allocates some of A's previous parts of the ring and enters state Up/Joining. Streaming automatically starts to this node. Will streaming of data that A is no longer responsible for (after B joined) stop immediately? That is, after (3), will the data streamed to A only be what it is responsible for? It depends on the version of Cassandra. A will get data it shouldn't get in any version that doesn't contain the CASSANDRA-2434 patch, and will keep it if you do not run cleanup on A when A is done bootstrapping. In a version containing 2434, the attempt to bootstrap B will fail and will not work until A is done bootstrapping, unless you set the property -Dcassandra.consistent.rangemovement=false while starting it. In general, one DOES NOT WANT TO SET -Dcassandra.consistent.rangemovement! It fixes 2434, and 2434 is bad for consistency. Instead, consider expanding clusters to initial size when they are empty, and disabling bootstrapping while doing so. Lots and lots of background on: https://issues.apache.org/jira/browse/CASSANDRA-2434 Related ticket: https://issues.apache.org/jira/browse/CASSANDRA-7069 =Rob PS - BTW, the fact that 2434 existed for so long, in versions where repair was often broken/unused, is the strongest single item of information in support of the Coli Conjecture...
Question about nodetool status ... output
Hi, I have one node in my 5-node cluster that effectively owns 100%, and it looks like my cluster is rather imbalanced. Is it common to have it this imbalanced for 4-5 nodes? My current output for a keyspace is:

$ nodetool status myks
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
UN  X.X.X.33  203.92 GB  256     41.3%             871968c9-1d6b-4f06-ba90-8b3a8d92dcf0  RAC1
UN  X.X.X.32  200.44 GB  256     34.2%             d7cacd89-8613-4de5-8a5e-a2c53c41ea45  RAC1
UN  X.X.X.51  197.17 GB  256     100.0%            344b0adf-2b5d-47c8-8881-9a3f56be6f3b  RAC1
UN  X.X.X.52  113.63 GB  1       46.3%             55daa807-af49-44c5-9742-fe456df621a1  RAC1
UN  X.X.X.31  204.49 GB  256     78.3%             48cb0782-6c9a-4805-9330-38e192b6b680  RAC1

My keyspace has RF=3 and originally I added X.X.X.52 (num_tokens=1 was a mistake) and then X.X.X.51. I haven't executed `nodetool cleanup` on any nodes yet. For the curious, the full ring can be found here: https://gist.github.com/JensRantil/57ee515e647e2f154779 Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Question regarding concurrent bootstrapping
Hi, Let's say I have an existing cluster and do the following:
1. I start a new joining node (A). It enters state Up/Joining. Streaming automatically starts to this node.
2. I wait two minutes (best practice for bootstrapping).
3. I start a second node (B) to join the cluster. It allocates some of A's previous parts of the ring and enters state Up/Joining. Streaming automatically starts to this node.
Will streaming of data that A is no longer responsible for (after B joined) stop immediately? That is, after (3), will the data streamed to A only be what it is responsible for? This is of importance for planning when one is expanding a cluster to multiple smaller nodes. Thanks, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Question about nodetool status ... output
Hi Carlos, Yes, I should have been more specific about that; basically all my primary ID:s are random UUIDs so I find that very hard to believe that my data model should be the problem here. I will run a full repair of the cluster, execute a cleanup and recommission the node, then. Thanks, Jens On Fri, Jun 12, 2015 at 2:38 PM, Carlos Rolo r...@pythian.com wrote: Your data model also contributes to the balance (or lack of) of the cluster. If you have a really bad data partitioning Cassandra will not do any magic. Regarding that cluster, I would decommission the x.52 node and add it again with the correct configuration. After the bootstrap, run a cleanup. If is still that off-balance, you need to look into your data model. Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo http://linkedin.com/in/carlosjuzarterolo* Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649 www.pythian.com On Fri, Jun 12, 2015 at 11:58 AM, Jens Rantil jens.ran...@tink.se wrote: Hi, I have one node in my 5-node cluster that effectively owns 100% and it looks like my cluster is rather imbalanced. Is it common to have it this imbalanced for 4-5 nodes? My current output for a keyspace is: $ nodetool status myks Datacenter: Cassandra = Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN X.X.X.33 203.92 GB 256 41.3% 871968c9-1d6b-4f06-ba90-8b3a8d92dcf0 RAC1 UN X.X.X.32 200.44 GB 256 34.2% d7cacd89-8613-4de5-8a5e-a2c53c41ea45 RAC1 UN X.X.X.51 197.17 GB 256 100.0% 344b0adf-2b5d-47c8-8881-9a3f56be6f3b RAC1 UN X.X.X.52 113.63 GB 1 46.3% 55daa807-af49-44c5-9742-fe456df621a1 RAC1 UN X.X.X.31 204.49 GB 256 78.3% 48cb0782-6c9a-4805-9330-38e192b6b680 RAC1 My keyspace has RF=3 and originally I added X.X.X.52 (num_tokens=1 was a mistake) and then X.X.X.51. I haven't executed `nodetool cleanup` on any nodes yet. 
For the curious, the full ring can be found here: https://gist.github.com/JensRantil/57ee515e647e2f154779 Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink -- -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Hbase vs Cassandra
8) Hbase can do range scans, and one can attack many problems with range scans. Cassandra can't do range scans. 9) HBase is a distributed, consistent, sorted key value store. The sorted bit allows for range scans in addition to the point gets that all K/V stores support. Nothing more, nothing less. It happens to store its data in HDFS by default, and we provide convenient input and output formats for map reduce. *Neutral:* 1) http://khangaonkar.blogspot.com/2013/09/cassandra-vs-hbase-which-nosql-store-do.html 2) The fundamental differences that come to mind are:
* HBase is always consistent. Machine outages lead to inability to read or write data on that machine. With Cassandra you can always write.
* Cassandra defaults to a random partitioner, so range scans are not possible (by default).
* HBase has a range partitioner (if you don't want that, the client has to prefix the rowkey with a prefix of a hash of the rowkey). The main feature that sets HBase apart is range scans.
* HBase is much more tightly integrated with Hadoop/MapReduce/HDFS, etc. You can map reduce directly into HFiles and map those into HBase instantly.
* Cassandra has a dedicated company supporting (and promoting) it.
* Getting started is easier with Cassandra. For HBase you need to run HDFS and Zookeeper, etc.
* I've heard lots of anecdotes about Cassandra working nicely with small clusters (< 50 nodes) and quickly degenerating above that.
* HBase does not have a query language (but you can use Phoenix for full SQL support).
* HBase does not have secondary indexes (having an eventually consistent index, similar to what Cassandra has, is easy in HBase, but making it as consistent as the rest of HBase is hard).
Thanks Ajay On May 29, 2015, at 12:09 PM, Ajay ajay.ga...@gmail.com wrote: Hi, I need some info on Hbase vs Cassandra as a data store (in general plus specific to time series data).
The comparison in the following helps: 1: features 2: deployment and monitoring 3: performance 4: anything else Thanks Ajay -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Hbase vs Cassandra
On Mon, Jun 8, 2015 at 11:16 AM, Ajay ajay.ga...@gmail.com wrote: If I understand correctly, you mean when we write with QUORUM and Cassandra writes to few machines and fails to write to few machines and throws exception if it doesn't satisfy QUORUM, leaving it inconsistent and doesn't rollback?. Yes. /Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Decommission datacenter - repair?
Ah, that explains things. Thanks! On Fri, Jun 5, 2015 at 10:59 PM, Robert Coli rc...@eventbrite.com wrote: On Fri, Jun 5, 2015 at 5:15 AM, Jens Rantil jens.ran...@tink.se wrote: Datastax's documentation on Decommissioning a data center http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_decomission_dc_t.html tells me to run a full repair and then decommission each node. Isn't decommissioning going to hand over all data anyway? Then why is the repair necessary? In step 3 of those instructions you reduce the number of replicas in the departing DC to 0. The departing DC no longer owns ranges at this point, and no longer is responsible for replicas. It therefore does no streaming (except maybe hints?) when you decommission nodes. =Rob -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Newly added node getting more data than expected
Hi again, I should also point out that `nodetool ring ...` only has one entry for X.X.X.4 and that that token range is equally large as the other token ranges for the virtual nodes. Let me know if you need any more information from me. Cheers, Jens On Sun, Jun 7, 2015 at 11:19 PM, Jens Rantil jens.ran...@tink.se wrote: Hi, I had a 3-node (à 256 vnodes each) cluster with RF=3. I mistakenly added a fourth node with num_tokens: 1 (that is, one vnode). I've always seen number of vnodes to be proportional to the amount of data a node would receive. Therefor, I was expecting the node to receive something like 1/(1+3*256) of the cluster's data. However, this is not the case: $ nodetool status mydatacenter Datacenter: Cassandra = Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN X.X.X.2 200.42 GB 256 87.6% 871968c9-1d6b-4f06-ba90-8b3a8d92dcf0 RAC1 UN X.X.X.3 198.03 GB 256 53.7% d7cacd89-8613-4de5-8a5e-a2c53c41ea45 RAC1 UN X.X.X.4 110.57 GB 1 58.7% 55daa807-af49-44c5-9742-fe456df621a1 RAC1 UN X.X.X.5 199.81 GB 256 100.0% 48cb0782-6c9a-4805-9330-38e192b6b680 RAC1 The new node added is X.X.X.4. Note that I haven't executed `nodetool cleanup` on the old nodes yet. Additional information: * I am using GossipingPropertyFileSnitch. All nodes are the same datacenter and rack. * There are no pending compactions on the node. Could anyone explain to me my new node is receiving more data than expected? Does this have to do with the way the GossipingPropertyFileSnitch decides where to put secondary/tertiary replicas (ie. always next physical node in ring)? Do I need to execute `nodetool cleanup` also on newly commissioned nodes? 
Thanks, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Newly added node getting more data than expected
Hi, I had a 3-node (à 256 vnodes each) cluster with RF=3. I mistakenly added a fourth node with num_tokens: 1 (that is, one vnode). I've always understood the number of vnodes to be proportional to the amount of data a node would receive. Therefore, I was expecting the node to receive something like 1/(1+3*256) of the cluster's data. However, this is not the case:

$ nodetool status mydatacenter
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns (effective)  Host ID                               Rack
UN  X.X.X.2  200.42 GB  256     87.6%             871968c9-1d6b-4f06-ba90-8b3a8d92dcf0  RAC1
UN  X.X.X.3  198.03 GB  256     53.7%             d7cacd89-8613-4de5-8a5e-a2c53c41ea45  RAC1
UN  X.X.X.4  110.57 GB  1       58.7%             55daa807-af49-44c5-9742-fe456df621a1  RAC1
UN  X.X.X.5  199.81 GB  256     100.0%            48cb0782-6c9a-4805-9330-38e192b6b680  RAC1

The new node added is X.X.X.4. Note that I haven't executed `nodetool cleanup` on the old nodes yet. Additional information:
* I am using GossipingPropertyFileSnitch. All nodes are in the same datacenter and rack.
* There are no pending compactions on the node.
Could anyone explain to me why my new node is receiving more data than expected? Does this have to do with the way the GossipingPropertyFileSnitch decides where to put secondary/tertiary replicas (i.e. always the next physical node in the ring)? Do I need to execute `nodetool cleanup` also on newly commissioned nodes? Thanks, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
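The 1/(1+3*256) expectation can be checked with a quick simulation (illustrative Python; it models random token assignment and *primary* ownership only, whereas the "Owns (effective)" column additionally counts the secondary/tertiary replicas placed on subsequent physical nodes, which is where the discrepancy comes from):

```python
import random

RING = 2 ** 64  # token space size, modelled as [0, RING)

def primary_share(vnode_counts, trials=100, seed=7):
    """Average fraction of the ring for which the 'new' node is the
    primary owner, given each node's vnode count."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        tokens = {rng.randrange(RING): node
                  for node, count in vnode_counts.items()
                  for _ in range(count)}
        ordered = sorted(tokens)
        prev = ordered[-1] - RING  # wrap around the ring
        owned = 0
        for tok in ordered:
            if tokens[tok] == "new":
                owned += tok - prev  # a token owns the range since its predecessor
            prev = tok
        total += owned / RING
    return total / trials

# Three 256-vnode nodes plus one single-vnode node, as in the email:
share = primary_share({"n1": 256, "n2": 256, "n3": 256, "new": 1})
# Close to 1/769, i.e. about 0.13% of primary ranges.
```

With primary ownership this small, the ~58.7% effective ownership observed must come from replica placement rather than the node's own token range.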
Decommission datacenter - repair?
Hi, I asked this on IRC earlier today, but didn't get any response; Datastax's documentation on Decommissioning a data center http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_decomission_dc_t.html tells me to run a full repair and then decommission each node. Isn't decommissioning going to hand over all data anyway? Then why is the repair necessary? Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Decommission datacenter - repair?
Hi Kiran, So, am I understanding you correctly that a decommissioning node only will hand over its data to a single node? If it would hand it over to all other replica nodes, I see that essentially as an implicit repair. Am I wrong? Thanks, Jens On Fri, Jun 5, 2015 at 2:27 PM, Kiran mk coolkiran2...@gmail.com wrote: Hi Jens, If you decommission a data center, The data residing in the Data Center which you are planning for decommission has to be balanced to the nodes of the other data center satisfying RF. Hence Repair is required. Best Regards, Kiran.M.K. On Fri, Jun 5, 2015 at 5:45 PM, Jens Rantil jens.ran...@tink.se wrote: Hi, I asked this on IRC earlier today, but didn't get any response; Datastax's documentation on Decommissioning a data center http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_decomission_dc_t.html tells me to run a full repair and then decommission each node. Isn't decommissioning going to hand over all data anyway? Then why is the repair necessary? Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink -- Best Regards, Kiran.M.K. -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_phototrkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
Re: Concurrent schema creation/change strategy
Hi, Generally it can take a couple of seconds before a schema change has propagated to all nodes. The schema will in most cases converge, but as far as I've understood, concurrent schema changes are considered a bad practice and can lead to inconsistent schemas down the road. IIRC, if one executes all schema changes from the same node one can be certain that the schema changes will converge. After executing a schema change you can execute `nodetool describecluster` to make sure all nodes have the same schema. What I'd suggest is that you either
- introduce a queue to execute the schema changes from a single node; or
- come up with a schema that works generically over time; or
- somehow introduce a global lock if you are to programmatically alter the schema. Lock, make the change, poll until all nodes have the same schema, release the lock.
Those are my 5 cents. There are probably other solutions. Cheers, Jens On Mon, May 25, 2015 at 10:58 AM, Magnus Vojbacke magnus.vojba...@digitalroute.com wrote: I have a lot of clients that will try to create the same schema (a keyspace with multiple tables) concurrently during application startup. The idea is that the first time the application starts, the clients will create the schema needed to run (create if not exists, etc...) From what I’ve read, I think that Cassandra has support for concurrent schema creation and modification, but I assume there will be conflicts of some sort. Is there any known strategy for handling this? Specifically considering conflicts. In case of a conflict (e.g., two clients trying to create the exact same table), will the client call return with an error? (Datastax driver) Would a plausible strategy be (for each client) 1) try to create the table, 2) examine any error coming back to determine if a conflict happened, 3) if conflict, move on to next table? Or is it just better to add a separate step to create the schema at some point in time before the clients can be allowed to work (i.e.
move schema creation out of the clients)? Thanks /Magnus -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook https://www.facebook.com/#!/tink.se Linkedin http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary Twitter https://twitter.com/tink
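The lock-then-poll step described above can be sketched as a small helper. This is a hypothetical sketch, not a driver API: `fetch_schema_versions` is a placeholder callable you would back with your driver's cluster metadata or by parsing `nodetool describecluster` output.

```python
import time

def wait_for_schema_agreement(fetch_schema_versions, timeout=30.0,
                              poll_interval=0.5, sleep=time.sleep):
    """Poll until every node reports the same schema version, or time out.

    `fetch_schema_versions` is a caller-supplied callable (hypothetical)
    returning a {node: schema_version} mapping.
    """
    deadline = time.monotonic() + timeout
    while True:
        versions = set(fetch_schema_versions().values())
        if len(versions) == 1:
            return True    # schema has converged on all nodes
        if time.monotonic() >= deadline:
            return False   # nodes still disagree; caller decides what to do
        sleep(poll_interval)
```

Executing every change from one node, then running something like this before releasing the lock, is the gist of the single-writer approach.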
Re: Performance penalty of multiple UPDATEs of non-pk columns
Artur, That's not entirely true. Writes to Cassandra are first written to a memtable (in-memory table) which is periodically flushed to disk. If multiple writes are coming in before the flush, then only a single record will be written to the disk/sstable. If you have writes that aren't coming within the same flush, they will get removed when you are compacting, just like you say. Unfortunately I can't answer this regarding Counters as I haven't worked with them. Hope this helped at least. Cheers, Jens On Fri, May 15, 2015 at 11:16 AM, Artur Siekielski a...@vhex.net wrote: I've seen some discussions about the topic on the list recently, but I would like to get clearer answers. Given the table: CREATE TABLE t1 ( f1 text, f2 text, f3 text, PRIMARY KEY(f1, f2) ); and assuming I will execute UPDATE of f3 multiple times (say, 1000) for the same key values k1, k2 and different values of 'newval': UPDATE t1 SET f3=newval WHERE f1=k1 AND f2=k2; How will the performance of selecting the current 'f3' value be affected?: SELECT f3 FROM t1 WHERE f1=k1 AND f2=k2; It looks like all the previous values are preserved until compaction, but does executing the SELECT read all the values (O(n), n - number of updates) or only the current one (O(1))? How does the situation look for Counter types? -- Jens Rantil Backend engineer Tink AB
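Jens's point about overwrites coalescing in the memtable can be illustrated with a toy model (a plain dict standing in for the memtable; no real Cassandra involved):

```python
def apply_writes(writes):
    """Toy memtable: the last write per (partition, clustering) key wins in
    memory, so only one cell per key reaches the sstable at flush time."""
    memtable = {}
    for key, value in writes:
        memtable[key] = value   # overwrite in place; no tombstone is created
    return memtable             # what a flush would persist

# 1000 UPDATEs of f3 for the same (k1, k2) key, as in Artur's example:
flushed = apply_writes([(("k1", "k2"), f"newval{i}") for i in range(1000)])
assert len(flushed) == 1
assert flushed[("k1", "k2")] == "newval999"
```

Updates that land in different flush windows produce one cell per sstable instead, and reads pay for reconciling them until compaction merges the sstables.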
Re: Hive support on Cassandra
Hi Ajay, I just Googled your question and ended up here: http://stackoverflow.com/q/11850186/260805 The only solution seems to be Datastax Enterprise. Cheers, Jens On Wed, May 6, 2015 at 7:57 AM, Ajay ajay.ga...@gmail.com wrote: Hi, Does Apache Cassandra (not DSE) support Hive integration? I found a couple of open source efforts but nothing is available currently. Thanks Ajay -- Jens Rantil Backend engineer Tink AB
Re: Query returning tombstones
Hi Christian, I just know Sylvain explicitly stated he wasn't a fan of exposing tombstones here: https://issues.apache.org/jira/browse/CASSANDRA-8574?focusedCommentId=14292063&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14292063 Cheers, Jens On Wed, Apr 29, 2015 at 12:43 PM, horschi hors...@gmail.com wrote: Hi, did anybody ever raise a feature request for selecting tombstones in CQL/thrift? It would be nice if I could use CQLSH to see where my tombstones are coming from. This would be much more convenient than using sstable2json. Maybe someone can point me to an existing jira-ticket, but I also appreciate any other feedback :-) regards, Christian -- Jens Rantil Backend engineer Tink AB
Re: When to use STCS/DTCS/LCS
Divya, Please start a new thread for that. Or is your question related specifically to this thread? Thanks, Jens On Thu, Apr 9, 2015 at 11:34 AM, Divya Divs divya.divi2...@gmail.com wrote: hi sir.. I'm an M.Tech student. My academic project is on Cassandra. I have run the source code of Cassandra in Eclipse Juno using an ant build. https://github.com/apache/cassandra. I have to do some feature enhancement in Cassandra and I have to analyze my application in Cassandra. So please tell me what kind of feature enhancement I can do in Cassandra. Tell me a simple feature enhancement, that's enough. Please guide me. Thanks in advance. Thanks and Regards, Divya -- Jens Rantil Backend engineer Tink AB
Re: Help understanding aftermath of death by GC
Hi Robert, On Tue, Mar 31, 2015 at 2:22 PM, Robert Wille rwi...@fold3.com wrote: Can anybody help me understand why Cassandra wouldn’t recover? One issue when you are running a JVM and start running out of memory is that the JVM can start throwing `OutOfMemoryError` in any thread - not necessarily in the thread which is taking all the memory. I've seen this happen multiple times. If this happened to you, a critical Cassandra thread could have died and brought the whole Cassandra DB down with it. Just an idea - cheers, Jens -- Jens Rantil Backend engineer Tink AB
Re: Really high read latency
Also, two control questions: - Are you using EBS for data storage? It might introduce additional latencies. - Are you doing proper paging when querying the keyspace? Cheers, Jens On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith david92galbra...@gmail.com wrote: Hi! So I've got a table like this: CREATE TABLE default.metrics (row_time int,attrs varchar,offset int,value double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT STORAGE AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true' AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'} AND compression={'sstable_compression':'LZ4Compressor'}; and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB of heap space. So it's timeseries data that I'm doing so I increment row_time each day, attrs is additional identifying information about each series, and offset is the number of milliseconds into the day for each data point. So for the past 5 days, I've been inserting 3k points/second distributed across 100k distinct attrses. And now when I try to run queries on this data that look like SELECT * FROM default.metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam' it takes an absurdly long time and sometimes just times out. I did nodetool cfstats default and here's what I get: Keyspace: default Read Count: 59 Read Latency: 397.12523728813557 ms. Write Count: 155128 Write Latency: 0.3675690719921613 ms.
Pending Flushes: 0 Table: metrics SSTable count: 26 Space used (live): 35146349027 Space used (total): 35146349027 Space used by snapshots (total): 0 SSTable Compression Ratio: 0.10386468749216264 Memtable cell count: 141800 Memtable data size: 31071290 Memtable switch count: 41 Local read count: 59 Local read latency: 397.126 ms Local write count: 155128 Local write latency: 0.368 ms Pending flushes: 0 Bloom filter false positives: 0 Bloom filter false ratio: 0.0 Bloom filter space used: 2856 Compacted partition minimum bytes: 104 Compacted partition maximum bytes: 36904729268 Compacted partition mean bytes: 986530969 Average live cells per slice (last five minutes): 501.66101694915255 Maximum live cells per slice (last five minutes): 502.0 Average tombstones per slice (last five minutes): 0.0 Maximum tombstones per slice (last five minutes): 0.0 Ouch! 400ms of read latency, orders of magnitude higher than it has any right to be. How could this have happened? Is there something fundamentally broken about my data model? Thanks! -- Jens Rantil Backend engineer Tink AB
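A plausible contributor to the 36 GB maximum partition size in the cfstats above is that `row_time` alone is the partition key, so every one of the 100k series for a whole day lands in a single partition. A sketch of a composite key along the lines of `PRIMARY KEY ((row_time, attrs), offset)` (an assumed schema change, not something from the thread):

```python
MS_PER_DAY = 86_400_000

def partition_key(epoch_ms, attrs):
    """Composite key sketch: (day number, attrs) becomes the partition key
    and the millisecond offset into the day becomes the clustering key,
    mirroring PRIMARY KEY ((row_time, attrs), offset) instead of the
    original PRIMARY KEY (row_time, attrs, offset)."""
    day = epoch_ms // MS_PER_DAY
    offset = epoch_ms % MS_PER_DAY
    return (day, attrs), offset

key, offset = partition_key(MS_PER_DAY * 5 + 1234, "potatoes_and_jam")
assert key == (5, "potatoes_and_jam")
assert offset == 1234
```

Each (day, series) pair then gets its own bounded partition, and the `WHERE row_time = 5 AND attrs = 'potatoes_and_jam'` query reads one small partition instead of slicing a giant one.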
Re: Store data with cassandra
Jean, I'm not sure you will receive any reply unless you ask specific questions about those links. Cheers, Jens – Sent from Mailbox On Fri, Mar 20, 2015 at 5:08 PM, Sibbald, Charles charles.sibb...@bskyb.com wrote: Sounds like this is a job for Jackrabbit? http://jackrabbit.apache.org From: Ali Akhtar ali.rac...@gmail.com Reply-To: user@cassandra.apache.org Date: Friday, 20 March 2015 15:58 To: user@cassandra.apache.org Subject: Re: Store data with cassandra (I apologize, I'm only joking. To answer your question, Cassandra tends to cache the first 300MB or so of data in memory; only when it grows beyond that does it start to write it to files. But Cassandra is not the right choice for storing files. In the screenshot you linked, it's only storing the filenames, not the actual contents of the files). On Fri, Mar 20, 2015 at 8:54 PM, Ali Akhtar ali.rac...@gmail.com wrote: It has been decided that the file cannot be allowed to be stored, sorry. However, if a sacrifice to the gods is prepared, it may be possible to change things. On Fri, Mar 20, 2015 at 8:49 PM, jean paul researche...@gmail.com wrote: I'd like to store MyFile.txt using Cassandra (replication factor = 2) and see on what node the file and its replicas are stored on my cluster of 10 nodes. It is a simple file with simple content (text). Is that possible? 2015-03-20 16:44 GMT+01:00 Ali Akhtar ali.rac...@gmail.com: The files you store have to personally be vetted by the cassandra community. Only if they're found to not contain anything inappropriate, does cassandra let you store them. (A 3/4 majority vote is necessary).
Please send your files for approval to j...@reallycereal.com On Fri, Mar 20, 2015 at 8:41 PM, jean paul researche...@gmail.com wrote: What about this, then: http://www.datastax.com/wp-content/uploads/2012/02/Screen-Shot-2012-02-10-at-11.21.55-AM.png I also read some documents about storing blobs with Cassandra! 2015-03-20 15:04 GMT+01:00 Michael Dykman mdyk...@gmail.com: You seem to be missing the point here. Cassandra does not manage files, it manages data in a highly distributed cluster. If you are attempting to manage files, you are quite simply using the wrong tool and Cassandra is not for you. On Fri, Mar 20, 2015 at 9:10 AM, jean paul researche...@gmail.com wrote: I have used this tutorial to create my database: http://planetcassandra.org/insert-select-records/ /var/lib/cassandra/data# ls demo system system_traces :/var/lib/cassandra/data# cd demo/ :/var/lib/cassandra/data/demo# ls users :/var/lib/cassandra/data/demo# cd users/ :/var/lib/cassandra/data/demo/users# ls :/var/lib/cassandra/data/demo/users# I find nothing in /var/lib/cassandra/data/demo/users! 2015-03-20 13:06 GMT+01:00 jean paul researche...@gmail.com: Hello All; Please, I have created this table. lastname | age | city | email | firstname --+-+---+-+--- Doe | 36 | Beverly Hills | jane...@email.com | Jane Byrne | 24 | San Diego | robby...@email.com | Rob Smith | 46 | Sacramento | johnsm...@email.com | John So, my question: where is this data saved? In /var/lib/cassandra/data? My end goal is to store a file with Cassandra and to see on which node my file is stored. Thanks a lot for the help. Best Regards. -- - michael dykman - mdyk...@gmail.com May the Source be with you.
Information in this email including any attachments may be privileged, confidential and is intended exclusively for the addressee. The views expressed may not be official policy, but the personal views of the originator. If you have received it in error, please notify the sender by return e-mail and delete it from your system. You should not reproduce, distribute, store, retransmit, use or disclose its contents to anyone. Please note we reserve the right to monitor all e-mail communication through our internal and external networks. SKY and the SKY marks are trademarks of Sky plc and Sky International AG and are used under licence. Sky UK Limited (Registration No. 2906991), Sky-In-Home Service Limited (Registration No. 2067075) and Sky Subscribers Services Limited (Registration No. 2340150) are direct or indirect subsidiaries of Sky plc (Registration No. 2247735). All of the companies mentioned in this paragraph are incorporated in
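For readers who do want to keep small files in Cassandra despite the advice above, the usual pattern is chunking the blob across clustered rows. A sketch under an assumed table like `(file_id, chunk_no, data, PRIMARY KEY (file_id, chunk_no))`, which is not something from the thread:

```python
def chunk_blob(data: bytes, chunk_size: int = 65536):
    """Split a file into fixed-size chunks, one per clustering row, so no
    single blob cell (or partition) grows unbounded."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def reassemble(chunks):
    """Read the chunks back in chunk_no order and join them."""
    return b"".join(chunks)

payload = bytes(range(256)) * 1000          # 256 kB of sample data
chunks = chunk_blob(payload, chunk_size=65536)
assert len(chunks) == 4
assert reassemble(chunks) == payload
```

Each chunk would be written as one INSERT keyed by (file_id, chunk_no); for files of any real size, an object store plus a Cassandra table of metadata is the better fit, as the thread concludes.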
Re: Timeout error in fetching million rows as results using clustering keys
Hi, Try setting the fetch size before querying. Assuming you don't set it too high, and you don't have too many tombstones, that should do it. Cheers, Jens – Sent from Mailbox On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta meme...@cs.stonybrook.edu wrote: Hi, I have a requirement to fetch a million rows as the result of my query, which is giving timeout errors. I am fetching results by selecting clustering columns, so why are the queries taking so long? I can change the timeout settings but I need the data to be fetched faster as per my requirement. My table definition is: *CREATE TABLE images.results (uuid uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y double, loc varchar, w double, h double, normalized varchar, type varchar, filehost varchar, filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width double, image_height double, objective double, cancer_type varchar, Area float, submit_date timestamp, points list<double>, PRIMARY KEY ((image_caseid),Area,uuid));* Here each row is uniquely identified on the basis of a unique uuid. But since my data is generally queried based upon *image_caseid* I have made it the partition key. I am currently using the Datastax Java API to fetch the results.
But the query is taking a lot of time resulting in timeout errors: Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response)) at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84) at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) at QueryDB.queryArea(TestQuery.java:59) at TestQuery.main(TestQuery.java:35) Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for server response)) at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108) at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Also when I try the same query on the console, even while using a limit of 2000 rows: cqlsh:images> select count(*) from results where image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000; errors={}, last_host=127.0.0.1 Thanks and Regards, Mehak
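Jens's fetch-size advice amounts to pulling rows page by page instead of asking the coordinator to materialise a million rows in one response. The real drivers do this for you once a fetch size is set; the mechanics can be sketched with a stand-in `run_query` (hypothetical, returning a page of rows plus a paging state):

```python
def fetch_in_pages(run_query, fetch_size=500):
    """Generator yielding rows one page at a time. `run_query(paging_state)`
    is a placeholder for a driver call returning (rows, next_paging_state);
    a None state means the result set is exhausted."""
    state = None
    while True:
        rows, state = run_query(state)
        yield from rows
        if state is None:
            return

# Fake "server": 1200 rows served 500 at a time.
data = list(range(1200))

def run_query(state, size=500):
    start = state or 0
    page = data[start:start + size]
    nxt = start + size if start + size < len(data) else None
    return page, nxt

assert list(fetch_in_pages(run_query)) == data
```

Each round trip then stays small enough to finish inside the server timeout, which is exactly what setting the fetch size buys you.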
Re: Inconsistent count(*) and distinct results from Cassandra
Frens, What consistency level are you querying with? It could be you are simply receiving results from different nodes each time. Jens – Sent from Mailbox On Wed, Mar 4, 2015 at 7:08 PM, Mikhail Strebkov streb...@gmail.com wrote: We have observed the same issue in our production Cassandra cluster (5 nodes in one DC). We use Cassandra 2.1.3 (I joined the list too late to realize we shouldn’t use 2.1.x yet) on Amazon machines (created from a community AMI). In addition to count variations of 5 to 10% we observe variations in the results of the query “select * from table1 where time > '$fromDate' and time < '$toDate' allow filtering”. We iterated through the results multiple times using the official Java driver. We used that query for a huge data migration and were unpleasantly surprised that it is unreliable. In our case “nodetool repair” didn’t fix the issue. So I echo Frens' questions. Thanks, Mikhail On Wed, Mar 4, 2015 at 3:55 AM, Rumph, Frens Jan m...@frensjan.nl wrote: Hi, Is it to be expected that select count(*) from ... and select distinct partition-key-columns from ... yield inconsistent results between executions even though the table at hand isn't written to? I have a table in a keyspace with replication_factor = 1 which is something like: CREATE TABLE tbl ( id frozen<id_type>, bucket bigint, offset int, value double, PRIMARY KEY ((id, bucket), offset) ) The frozen UDT is: CREATE TYPE id_type ( tags map<text, text> ); When I do select count(*) from tbl several times the actual count varies by 5 to 10%. Also when performing select distinct id, bucket from tbl the results aren't consistent over several query executions. The table is not being written to at the time I performed the queries. Is this to be expected? Or is this a bug? Is there an alternative method / workaround? I'm using cqlsh 5.0.1 with Cassandra 2.1.2 on 64-bit Fedora 21 with Oracle Java 1.8.0_31. Thanks in advance, Frens Jan
Re: Input/Output Error
Hi, Check your Cassandra and kernel (if on Linux) log files for errors. Cheers, Jens – Sent from Mailbox On Wed, Mar 4, 2015 at 2:18 AM, 曹志富 cao.zh...@gmail.com wrote: Sometimes my C* 2.1.3 cluster's compaction or streaming hits this error. Could this be because of a disk or filesystem problem? Thanks all. -- Ranger Tsao
Re: best practices for time-series data with massive amounts of records
Hi, I have not done something similar, however I have some comments: On Mon, Mar 2, 2015 at 8:47 PM, Clint Kelly clint.ke...@gmail.com wrote: The downside of this approach is that we can no longer do a simple continuous scan to get all of the events for a given user. Sure, but would you really do that in real time anyway? :) If you have billions of events that's not going to scale anyway. Also, if you have 10 events per bucket, the latency introduced by batching should be manageable. Some users may log lots and lots of interactions every day, while others may interact with our application infrequently, This is another reason to split them up into buckets, to make the cluster partitions more manageable and homogeneous. so I'd like a quick way to get the most recent interaction for a given user. For this you could actually have a second table that stores the last_time_bucket for a user. Upon event write, you could simply do an update of the last_time_bucket. You could even have an index of all time buckets per user if you want. Has anyone used different approaches for this problem? The only thing I can think of is to use the second table schema described above, but switch to an order-preserving hashing function, and then manually hash the id field. This is essentially what we would do in HBase. Like you might already know, this order-preserving hashing is _not_ considered best practice in the Cassandra world. Cheers, Jens -- Jens Rantil Backend engineer Tink AB
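The second-table idea (tracking each user's `last_time_bucket`) can be sketched with in-memory dicts standing in for the two tables; the weekly bucket size is an assumption, not something fixed by the thread:

```python
WEEK_MS = 7 * 24 * 3600 * 1000   # assumed bucket width: one week in ms

def record_event(events, last_bucket, user, ts_ms, payload):
    """Write the event into its (user, bucket) partition and upsert the
    user's most recent bucket, so 'latest interaction' is one lookup plus
    one bounded partition read rather than a scan over all events."""
    bucket = ts_ms // WEEK_MS
    events.setdefault((user, bucket), []).append((ts_ms, payload))
    if bucket > last_bucket.get(user, -1):
        last_bucket[user] = bucket

events, last_bucket = {}, {}
record_event(events, last_bucket, "u1", 100, "a")
record_event(events, last_bucket, "u1", WEEK_MS * 3 + 5, "b")
assert last_bucket["u1"] == 3
assert events[("u1", 3)] == [(WEEK_MS * 3 + 5, "b")]
```

In Cassandra terms, `events` maps to a table partitioned on (user, bucket) and `last_bucket` to the second table Jens suggests, updated on every write (upserts make the update idempotent).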
Re: using or in select query in cassandra
Hi Rahul, No, you can't do this in a single query. You will need to execute two separate queries if the requirements are on different columns. However, if you'd like to select multiple rows with a restriction on the same column you can do that using the `IN` construct: select * from table where id IN (123,124); See [1] for reference. [1] http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html Cheers, Jens On Mon, Mar 2, 2015 at 7:06 AM, Rahul Srivastava srivastava.robi...@gmail.com wrote: Hi I want to enforce uniqueness for my data so I need to add an OR clause in my WHERE clause. ex: select * from table where id =123 OR name ='abc' So above, I want to get data if my id is 123 or my name is abc. Is there any possibility in Cassandra to achieve this? -- Jens Rantil Backend engineer Tink AB
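If you really need OR semantics across different columns, the two-query approach Jens describes can be merged client-side. A sketch over plain dicts, where each list comprehension stands in for one Cassandra query (one by id, one by an assumed index or lookup table on name):

```python
def select_or(rows, id_value, name_value):
    """Emulate `WHERE id = ? OR name = ?` client-side: run two
    single-column lookups and union the results by primary key."""
    by_id = [r for r in rows if r["id"] == id_value]      # query 1
    by_name = [r for r in rows if r["name"] == name_value]  # query 2
    merged = {r["id"]: r for r in by_id + by_name}        # dedupe on the pk
    return list(merged.values())

rows = [{"id": 123, "name": "abc"},
        {"id": 124, "name": "abc"},
        {"id": 125, "name": "xyz"}]
result = select_or(rows, 123, "abc")
assert {r["id"] for r in result} == {123, 124}
```

The dedupe step matters: a row matching both predicates would otherwise appear twice in the union.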
Re: How to extract all the user id from a single table in Cassandra?
Hi Check, Please avoid double posting on mailing lists. It leads to double work (respect people's time!) and makes it hard for people in the future having the same issue as you to follow discussions and answers. That said, if you have a lot of primary keys, select user_id from testkeyspace.user_record; will most definitely time out. Have a look at `SELECT DISTINCT` at [1]. More importantly, for larger datasets you will also need to split the token space into smaller segments and iteratively select your primary keys. See [2]. [1] http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html [2] http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__paging-through-unordered-results If you are having specific issues with the Java Driver I suggest you ask on that mailing list (only). Cheers, Jens On Sun, Mar 1, 2015 at 6:38 PM, Check Peck comptechge...@gmail.com wrote: Sending again as I didn't get any response on this. Any thoughts? On Fri, Feb 27, 2015 at 8:24 PM, Check Peck comptechge...@gmail.com wrote: I have a Cassandra table like this - create table user_record (user_id text, record_name text, record_value blob, primary key (user_id, record_name)); What is the best way to extract all the user_ids from this table? As of now, I cannot change my data model to do this exercise so I need to find a way by which I can extract all the user_ids from the above table. I am using the Datastax Java driver in my project. Is there any other easy way, apart from code, to extract all the user_ids from the above table through some cqlsh utility and dump them into some file?
I am thinking the below code might time out after some time -

public class TestCassandra {
    private Session session = null;
    private Cluster cluster = null;

    private static class ConnectionHolder {
        static final TestCassandra connection = new TestCassandra();
    }

    public static TestCassandra getInstance() {
        return ConnectionHolder.connection;
    }

    private TestCassandra() {
        Builder builder = Cluster.builder();
        builder.addContactPoints("127.0.0.1");
        PoolingOptions opts = new PoolingOptions();
        opts.setCoreConnectionsPerHost(HostDistance.LOCAL,
                opts.getCoreConnectionsPerHost(HostDistance.LOCAL));
        cluster = builder.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                .withPoolingOptions(opts)
                .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy("PI")))
                .withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
                .build();
        session = cluster.connect();
    }

    private Set<String> getRandomUsers() {
        Set<String> userList = new HashSet<String>();
        String sql = "select user_id from testkeyspace.user_record;";
        try {
            SimpleStatement query = new SimpleStatement(sql);
            query.setConsistencyLevel(ConsistencyLevel.ONE);
            ResultSet res = session.execute(query);
            Iterator<Row> rows = res.iterator();
            while (rows.hasNext()) {
                Row r = rows.next();
                String user_id = r.getString("user_id");
                userList.add(user_id);
            }
        } catch (Exception e) {
            System.out.println("error=" + e);
        }
        return userList;
    }
}

Adding the java-driver group and Cassandra group as well to see whether there is any better way to execute this? -- Jens Rantil Backend engineer Tink AB
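The token-space splitting mentioned in [2] can be sketched like this, assuming the default Murmur3Partitioner whose tokens span [-2^63, 2^63 - 1]; each segment would back one `SELECT DISTINCT user_id FROM user_record WHERE token(user_id) >= ? AND token(user_id) <= ?` query instead of one giant full scan:

```python
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1   # Murmur3Partitioner token space

def token_ranges(n_splits):
    """Split the full token ring into n contiguous, non-overlapping
    [start, end] segments (inclusive bounds) for segmented key scans."""
    span = (MAX_TOKEN - MIN_TOKEN) // n_splits
    ranges, start = [], MIN_TOKEN
    for i in range(n_splits):
        end = MAX_TOKEN if i == n_splits - 1 else start + span
        ranges.append((start, end))
        start = end + 1
    return ranges

rs = token_ranges(16)
assert len(rs) == 16
assert rs[0][0] == MIN_TOKEN and rs[-1][1] == MAX_TOKEN
# segments are contiguous and non-overlapping
assert all(rs[i + 1][0] == rs[i][1] + 1 for i in range(15))
```

Issuing the per-segment queries one at a time (or a few in parallel) keeps each request small enough not to time out, which is the problem the quoted Java code runs into.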
Re: how many rows can one partion key hold?
Also, note that repairs will be slower for larger rows and AFAIK also require slightly more memory. Also, to avoid many tombstones it could be worth considering bucketing your partitions by time. Cheers, Jens On Fri, Feb 27, 2015 at 7:44 AM, wateray wate...@163.com wrote: Hi all, My team is using Cassandra as our database. We have one question, as below. As we know, rows with the same partition key will be stored on the same node. But how many rows can one partition key hold? What does it depend on? The node's volume, the partition data size, or the number of rows in the partition? When one partition's data is extremely large, will writes/reads become slow? Can anyone show me some existing use cases? Thanks! -- Jens Rantil Backend engineer Tink AB
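As a back-of-the-envelope answer to the question above: Cassandra caps a partition at two billion cells, but the practical advice is to stay far below that (a commonly cited rule of thumb is on the order of 100 MB per partition). A small sizing sketch, with the thresholds as rules of thumb rather than hard numbers:

```python
def estimate_partition(rows_per_partition, cells_per_row, avg_cell_bytes):
    """Rough partition sizing: total cells (hard cap: 2 billion) and
    approximate on-disk size (soft guidance: keep well under ~100 MB)."""
    cells = rows_per_partition * cells_per_row
    size_bytes = cells * avg_cell_bytes
    return cells, size_bytes

cells, size = estimate_partition(rows_per_partition=1_000_000,
                                 cells_per_row=10, avg_cell_bytes=50)
assert cells == 10_000_000        # far under the 2-billion-cell cap
assert size == 500_000_000        # but ~500 MB: already an oversized partition
```

The example shows why the cell cap alone is a poor guide: a partition can be nowhere near the hard limit and still be large enough to slow down reads, compaction, and repair, which is where time bucketing helps.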
Re: Added new nodes to cluster but no streams
Hi Batranut, A few minutes between each node will do. Cheers, Jens On Fri, Feb 13, 2015 at 1:12 PM, Batranut Bogdan batra...@yahoo.com wrote: Hello, When adding a new node to the cluster, do I need to wait for each node to receive all the data from other nodes in the cluster, or just wait a few minutes before I start each node? On Thursday, February 12, 2015 7:21 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Feb 12, 2015 at 3:20 AM, Batranut Bogdan batra...@yahoo.com wrote: I have added new nodes to the existing cluster. In Opscenter I do not see any streams... I presume that the new nodes get the data from the rest of the cluster via streams. The existing cluster has TB magnitude, and space used in the new nodes is ~90 GB. I must admit that I have restarted the new nodes several times after adding them. Does this affect bootstrap? AFAIK the new nodes should start loading a part of all the data in the existing cluster. If it stays like this for a while, it sounds like your bootstraps have hung. Note that in general you should add nodes one at a time; especially if you are on a version without the fix for CASSANDRA-2434, in theory adding multiple nodes at once might contribute to their bootstraps hanging. Stop cassandra on the joining nodes, wipe/move aside their data directories, and try again one at a time. =Rob -- Jens Rantil Backend engineer Tink AB
Re: How to speed up SELECT * query in Cassandra
If you are using Spark you need to be _really_ careful about your tombstones. In our experience a single partition with too many tombstones can take down the whole batch job (until something like https://issues.apache.org/jira/browse/CASSANDRA-8574 is fixed). This was a major obstacle for us to overcome when using Spark. Cheers, Jens On Wed, Feb 11, 2015 at 5:12 PM, Jiri Horky ho...@avast.com wrote: Well, I always wondered how Cassandra can be used in a Hadoop-like environment where you basically need to do full table scans. I need to say that our experience is that Cassandra is perfect for writing and for reading specific values by key, but definitely not for reading all of the data out of it. Some of our projects found out that doing that with a non-trivial amount of data in a timely manner is close to impossible in many situations. We are slowly moving to storing the data in HDFS and possibly reprocessing them on a daily basis for such use cases (statistics). This is nothing against Cassandra; it can not be perfect for everything. But I am really interested in how it can work well with Spark/Hadoop where you basically need to read all the data as well (as far as I understand it). Jirka H. On 02/11/2015 01:51 PM, DuyHai Doan wrote: The very nature of cassandra's distributed nature vs partitioning data on hadoop makes spark on hdfs actually faster than on cassandra Prove it. Did you ever have a look into the source code of the Spark/Cassandra connector to see how data locality is achieved before throwing out such a statement? On Wed, Feb 11, 2015 at 12:42 PM, Marcelo Valle (BLOOMBERG/ LONDON) mvallemil...@bloomberg.net wrote: cassandra makes a very poor data warehouse or long term time series store Really? This is not the impression I have... I think Cassandra is good for storing large amounts of data and historical information; it's only not good for storing temporary data. Netflix has a large amount of data and it's all stored in Cassandra, AFAIK.
The very nature of cassandra's distributed nature vs partitioning data on hadoop makes spark on hdfs actually faster than on cassandra. I am not sure about the current state of Spark support for Cassandra, but I guess if you create a map reduce job, the intermediate map results will still be stored in HDFS, as happens with Hadoop. Is this right? I think the problem with Spark + Cassandra or with Hadoop + Cassandra is that the hard part Spark or Hadoop does, the shuffling, could be done out of the box with Cassandra, but no one takes advantage of that. What if a map/reduce job used a temporary CF in Cassandra to store intermediate results? From: user@cassandra.apache.org Subject: Re: How to speed up SELECT * query in Cassandra I use spark with cassandra, and you don't need DSE. I see a lot of people ask this same question below (how do I get a lot of data out of cassandra?), and my question is always, why aren't you updating both places at once? For example, we use hadoop and cassandra in conjunction with each other, we use a message bus to store every event in both, aggregate in both, but only keep current data in cassandra (cassandra makes a very poor data warehouse or long term time series store) and then use services to process queries that merge data from hadoop and cassandra. Also, spark on hdfs gives more flexibility in terms of large datasets and performance. The very nature of cassandra's distributed nature vs partitioning data on hadoop makes spark on hdfs actually faster than on cassandra -- *Colin Clark* +1 612 859 6129 Skype colin.p.clark On Feb 11, 2015, at 4:49 AM, Jens Rantil jens.ran...@tink.se wrote: On Wed, Feb 11, 2015 at 11:40 AM, Marcelo Valle (BLOOMBERG/ LONDON) mvallemil...@bloomberg.net wrote: If you use Cassandra enterprise, you can use hive, AFAIK. Even better, you can use Spark/Shark with DSE.
Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
Re: How to speed up SELECT * query in Cassandra
On Wed, Feb 11, 2015 at 11:40 AM, Marcelo Valle (BLOOMBERG/ LONDON) mvallemil...@bloomberg.net wrote: If you use Cassandra enterprise, you can use hive, AFAIK. Even better, you can use Spark/Shark with DSE. Cheers, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
Re: Writing the same column frequently - anti pattern?
Hi, If the writes are coming from the same machine, you could potentially use request collapsing https://github.com/Netflix/Hystrix/wiki/How-To-Use#request-collapsing to avoid the duplicate writes. Just an idea, Jens

On Fri, Feb 6, 2015 at 1:15 AM, Andreas Finke andreas.fi...@solvians.com wrote:
Hi, we are currently writing the same column within a row multiple times (up to 10 times a second). I am familiar with the concept of tombstones in SSTables. My question is: I assume that in our case, most of the time when a column gets overwritten it still resides in the memtable. So I assume for that particular case no tombstone is set, but the column is replaced in memory and then the 'newest' version is flushed to disk. Is this assumption correct? Or is writing the same column an anti-pattern? I am thankful for any input. Regards Andi

-- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
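The request-collapsing idea above can be sketched without Hystrix: buffer writes per key and only forward the latest value for each key at flush time, so ten overwrites per second become one write per flush interval. This is a hypothetical illustration, not Hystrix's real API -- `flush_fn` stands in for whatever actually talks to Cassandra (e.g. executing a prepared statement).

```python
from collections import OrderedDict

class WriteCollapser:
    """Buffer writes per key, keeping only the latest value, and hand
    the survivors to a flush callback (e.g. a real Cassandra
    session.execute) when flush() is called."""

    def __init__(self, flush_fn):
        self._pending = OrderedDict()
        self._flush_fn = flush_fn

    def write(self, key, value):
        # A newer value for the same key replaces the buffered one,
        # so only the last write per key ever reaches the database.
        self._pending[key] = value

    def flush(self):
        batch, self._pending = self._pending, OrderedDict()
        for key, value in batch.items():
            self._flush_fn(key, value)
        return len(batch)
```

In a real service you would call `flush()` on a timer (say, every second); the trade-off is a bounded window of write loss if the process dies before flushing.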
Re: how to batch the select query to reduce network communication
As an alternative, you could always execute the queries asynchronously against Cassandra and then iterate over the results as they come in. Cheers, Jens

On Fri, Feb 6, 2015 at 12:39 PM, Carlos Rolo r...@pythian.com wrote:
Hi, You can't. Batches are only available for INSERT, UPDATE and DELETE operations. Batches exist to give Cassandra some atomicity, as in, either all operations succeed or all fail. Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo http://linkedin.com/in/carlosjuzarterolo* Tel: 1649 www.pythian.com

On Fri, Feb 6, 2015 at 12:21 PM, diwayou diwa...@vip.qq.com wrote:
create table t { a int, b int, c int } If I want to execute: select * from t where a = 1 and b = 2 limit 10; select * from t where a = 1 and b = 3 limit 10; how can I batch this, and only execute once to get the result?

-- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
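A rough sketch of the "execute asynchronously and iterate" suggestion. With a real DataStax driver you would use the driver's own async facility (e.g. `execute_async` in the Python driver) rather than a thread pool; here a stand-in `query_fn` keeps the example self-contained and runnable without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(query_fn, parameter_sets, max_workers=4):
    """Issue one query per parameter set concurrently and gather the
    results.  `query_fn` stands in for something like executing a
    prepared SELECT with bound values against a Cassandra session."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(query_fn, params) for params in parameter_sets]
        # Collect in submission order; .result() blocks per future.
        return [f.result() for f in futures]
```

This gets you the latency benefit the poster is after (the two SELECTs overlap on the wire) without abusing BATCH, which -- as Carlos notes -- is for mutations, not reads.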
Re: Controlling the MAX SIZE of sstables after compaction
Parth,

"So are you saying that I should query Cassandra right away?" Well, don't take my word for it, but it definitely sounds like a simpler approach.

"If yes, like I mentioned, I have to run this during traffic hours. Isn't there a possibility then that my traffic to the db may get impacted?" Absolutely, it could. But so will converting your sstables to JSON. But a database is also made to be read from ;) I suggest you set up a test cluster and try the load impact before you try other ways (such as dumping the database etc.). If the load is too high you could also incorporate some kind of rate limiting and/or concurrency limit on your report generation. I also know that people have successfully used Spark or similar infrastructure for batch processing of Cassandra data. Not sure, but it could be useful for you to look into.

"Also, is it okay to use Hector for this?" I have no personal experience with Hector, but I suppose so.

Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter

On Mon, Jan 26, 2015 at 9:57 AM, Parth Setya setya.pa...@gmail.com wrote:
Hey Jens, Thank you so much for the advice and for reading through. So are you saying that I should query Cassandra right away? If yes, like I mentioned, I have to run this during traffic hours. Isn't there a possibility then that my traffic to the db may get impacted? Also, is it okay to use Hector for this? Best

On Mon, Jan 26, 2015 at 2:19 PM, Jens Rantil jens.ran...@tink.se wrote:
Hi Parth, I'll take your questions in order:
1. Have a look at the compaction subproperties for STCS: http://datastax.com/documentation/cql/3.1/cql/cql_reference/compactSubprop.html
2. Why not talk to Cassandra when generating the report? It will be waaay faster (and easier!); Cassandra will use bloom filters, handle shadowed (overwritten) columns, and handle tombstones for you, not to mention the fact that it uses sstables that are hot in the OS file cache.
3. See 2) above.
Also, your approach requires you to implement handling of shadowed columns as well as tombstone handling, which could be pretty messy. Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter

On Mon, Jan 26, 2015 at 7:40 AM, Parth Setya setya.pa...@gmail.com wrote:
Hi

*Setup* 3 Node Cluster, Api - *Hector*, CL - *QUORUM*, RF - *3*, Compaction Strategy - *Size Tiered Compaction*

*Use Case* I have about *320 million rows* (~12 to 15 columns each) worth of data stored in Cassandra. In order to generate a report containing ALL that data, I do the following:
1. Run compaction
2. Take a snapshot of the db
3. Run sstable2json on all the *Data.db files
4. Read those JSONs and write to a CSV.

*Problem*: The *sstable2json* utility takes about 350-400 hours (~85% of the total time), thereby lengthening the process. (I am running sstable2json sequentially on all the *Data.db files, but their sizes are inconsistent, so making it run concurrently doesn't help either; e.g. one file is 25 GB while another is 500 MB.)

*My Thought Process:* Is there a way to put a cap on the maximum size of the sstables that are generated after compaction, such that I have multiple sstables of uniform size? Then I could run the sstable2json utility on them concurrently.

*Questions:*
1. Is there a way to configure the size of sstables created after compaction?
2. Is there a better approach to generate the report?
3. What are the flaws with this approach?

Best Parth
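The "rate limiting on your report generation" advice above can be sketched as a token bucket: call `allow()` before fetching each page of the report query and back off when it returns False. This is a generic sketch, not tied to any driver; the rate and capacity numbers are placeholders you would tune against your cluster's spare capacity.

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter for paced report reads:
    tokens refill at `rate` per second up to a burst of `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)
        self.clock = clock               # injectable for testing
        self.last = clock()

    def allow(self, cost=1.0):
        # Refill based on elapsed time, then try to spend `cost` tokens.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Usage: before each page fetch, loop on `allow()` and `time.sleep()` briefly when it denies; with, say, `rate=50` pages/second the report still finishes in hours instead of weeks, while capping the extra load on the live cluster.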
Re: Controlling the MAX SIZE of sstables after compaction
Hi Parth, I'll take your questions in order:
1. Have a look at the compaction subproperties for STCS: http://datastax.com/documentation/cql/3.1/cql/cql_reference/compactSubprop.html
2. Why not talk to Cassandra when generating the report? It will be waaay faster (and easier!); Cassandra will use bloom filters, handle shadowed (overwritten) columns, and handle tombstones for you, not to mention the fact that it uses sstables that are hot in the OS file cache.
3. See 2) above. Also, your approach requires you to implement handling of shadowed columns as well as tombstone handling, which could be pretty messy.
Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter

On Mon, Jan 26, 2015 at 7:40 AM, Parth Setya setya.pa...@gmail.com wrote:
Hi

*Setup* 3 Node Cluster, Api - *Hector*, CL - *QUORUM*, RF - *3*, Compaction Strategy - *Size Tiered Compaction*

*Use Case* I have about *320 million rows* (~12 to 15 columns each) worth of data stored in Cassandra. In order to generate a report containing ALL that data, I do the following:
1. Run compaction
2. Take a snapshot of the db
3. Run sstable2json on all the *Data.db files
4. Read those JSONs and write to a CSV.

*Problem*: The *sstable2json* utility takes about 350-400 hours (~85% of the total time), thereby lengthening the process. (I am running sstable2json sequentially on all the *Data.db files, but their sizes are inconsistent, so making it run concurrently doesn't help either; e.g. one file is 25 GB while another is 500 MB.)

*My Thought Process:* Is there a way to put a cap on the maximum size of the sstables that are generated after compaction, such that I have multiple sstables of uniform size? Then I could run the sstable2json utility on them concurrently.

*Questions:*
1. Is there a way to configure the size of sstables created after compaction?
2. Is there a better approach to generate the report?
3. What are the flaws with this approach?

Best Parth
Re: How to know disk utilization by each row on a node
Hi, Datastax comes with sstablekeys that does that. You could also use sstable2json script to find keys. Cheers, Jens On Tue, Jan 20, 2015 at 2:53 PM, Edson Marquezani Filho edsonmarquez...@gmail.com wrote: Hello, everybody. Does anyone know a way to list, for an arbitrary column family, all the rows owned (including replicas) by a given node and the data size (real size or disk occupation) of each one of them on that node? I would like to do that because I have data on one of my nodes growing faster than the others, although rows (and replicas) seem evenly distributed across the cluster. So, I would like to verify if I have some specific rows growing too much. Thank you.
Re: keyspace not exists?
Hi Jason, Have you checked the Cassandra log? Cheers, Jens

On Fri, Jan 16, 2015 at 10:59 AM, Jason Wee peich...@gmail.com wrote:
$ cqlsh 192.168.0.2 9042
Connected to just4fun at 192.168.0.2:9042.
[cqlsh 5.0.1 | Cassandra 2.1.1 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> DESCRIBE KEYSPACES
<empty>
cqlsh> create keyspace foobar with replication = {'class':'SimpleStrategy', 'replication_factor':3};
errors={}, last_host=192.168.0.2
cqlsh> DESCRIBE KEYSPACES;
<empty>
cqlsh> use foobar;
cqlsh:foobar> DESCRIBE TABLES;
Keyspace 'foobar' not found.

Just trying Cassandra 2.1 and encountered the above error; can anyone explain why this is and where to even begin troubleshooting? Jason
Script to count tombstones by partition key
Hi all I just recently put together a small script to count the number of tombstones grouped by partition id, for one or multiple sstables: https://gist.github.com/JensRantil/063b7c56ca4a8dfe1c50 I needed this for debugging purposes and thought I’d share it with you guys in case anyone is interested. Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter
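The linked gist isn't reproduced here, but the core of such a script is small: walk sstable2json output and count tombstoned cells per partition key. The JSON shape assumed below (rows with a `key` and a list of cells, where an optional fourth cell element flags `"d"`/`"t"` for deleted/expiring cells) roughly matches Cassandra 2.0-era sstable2json, but the format varies between versions, so treat this as a sketch rather than the gist's actual implementation.

```python
import json
from collections import Counter

def count_tombstones(sstable_json_text):
    """Count tombstoned cells per partition key in sstable2json-style
    output.  Assumes each row looks like
    {"key": ..., "columns": [[name, value, timestamp, flag?], ...]}
    where a 4th element of "d" or "t" marks a tombstoned cell."""
    counts = Counter()
    for row in json.loads(sstable_json_text):
        for cell in row.get("columns", []):
            if len(cell) >= 4 and cell[3] in ("d", "t"):
                counts[row["key"]] += 1
    return counts
```

Usage would be `count_tombstones(open("dump.json").read()).most_common(10)` to find the worst partitions first.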
TombstoneOverwhelmingException for few tombstones
Hi, I have a single partition key that has been nagging me because I am receiving org.apache.cassandra.db.filter.TombstoneOverwhelmingException. After filing https://issues.apache.org/jira/browse/CASSANDRA-8561 I managed to find the partition key in question and which machine it was located on (by looking in system.log). Since I wanted to see how many tombstones the partition key actually had, I did:

nodetool flush mykeyspace mytable

to make sure all changes were written to sstables (not sure this was necessary), then

nodetool getsstables mykeyspace mytable PARTITIONKEY

which listed two sstables. I then had a look at both sstables for the key in question using

sstable2json MYSSTABLE1 -k PARTITIONKEY | jq . > MYSSTABLE1.json
sstable2json MYSSTABLE2 -k PARTITIONKEY | jq . > MYSSTABLE2.json

(piping through jq to format the JSON). Both JSON files contain data (so I have selected the right key). Only one of the files contains any tombstones:

$ cat MYSSTABLE1.json | grep 't'|wc -l
4281
$ cat MYSSTABLE2.json | grep 't'|wc -l
0

But to my surprise, the number of tombstones is nowhere near tombstone_failure_threshold: 10 Can anyone explain why Cassandra is overwhelmed when I'm nowhere near the hard limit? Thanks, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter
Re: Implications of ramping up max_hint_window_in_ms
Thanks for input, Rob. Just making sure, is older version the same as less than version 2? On Mon, Jan 5, 2015 at 8:13 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Jan 5, 2015 at 2:52 AM, Jens Rantil jens.ran...@tink.se wrote: Since repair is a slow and daunting process*, I am considering increasing max_hint_window_in_ms from its default value of one (1) hour to something like 24-48 hours. ... Are there any other implications of making this change that I haven’t thought of? Not really, though 24-48 hours of hints could be an awful lot of hints. I personally run with at least a 6 hour max_h_w_i_m. In older versions of Cassandra, 24-48 hours of hints could hose your node via ineffective constant compaction. =Rob
Implications of ramping up max_hint_window_in_ms
Hi, Since repair is a slow and daunting process*, I am considering increasing max_hint_window_in_ms from its default value of one (1) hour to something like 24-48 hours. This will give me and my team more time to fix the underlying problem of a node. I understand that - repair is the only way to avoid hardware failure/bit rot scenarios. I will still be running repair on a weekly basis. - disk usage obviously will increase before data has been handed off. Disk usage shouldn’t be an issue in this case. Are there any other implications of making this change that I haven’t thought of? * I know incremental repair is coming up, but I don’t consider it stable enough. Thanks, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter
Re: is primary key( foo, bar) the same as primary key ( foo ) with a ‘set' of bars?
...they have somewhat different conflict/repair resolutions, too.

On Thu, Jan 1, 2015 at 8:06 PM, DuyHai Doan doanduy...@gmail.com wrote:
Storage-engine wise, they are almost equivalent, though there are some minor differences: 1) with the Set structure, you cannot store more than 64kb worth of data 2) collections and maps are loaded entirely by Cassandra for each query, whereas with clustering columns you can select a slice of columns

On Thu, Jan 1, 2015 at 7:46 PM, Kevin Burton bur...@spinn3r.com wrote:
I think the two tables are the same. Correct?

create table foo ( source text, target text, primary key( source, target ) )

vs

create table foo ( source text, target set<text>, primary key( source ) )

… meaning that the first one, under the covers, is represented the same as the second. As a slice. Am I correct? -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com
How many tombstones for deleted CQL row?
Hi, I am considering tuning the tombstone warn/error threshold. Just making sure: if I INSERT one (CQL) row populating all six columns and then DELETE the inserted row, will Cassandra write 1 range tombstone or seven tombstones (one per column plus the row marker)? Thanks, Jens
Re: How many tombstones for deleted CQL row?
Great. Also, if I issue DELETE FROM my_table WHERE partition_key=xxx AND compound_key=yyy I understand only a single tombstone will be created?

On Fri, Dec 26, 2014 at 10:59 AM, DuyHai Doan doanduy...@gmail.com wrote:
If you issue DELETE FROM my_table WHERE partition_key = xxx Cassandra will create a row tombstone and not one tombstone per column, fortunately

On Fri, Dec 26, 2014 at 10:50 AM, Jens Rantil jens.ran...@tink.se wrote:
Hi, I am considering tuning the tombstone warn/error threshold. Just making sure: if I INSERT one (CQL) row populating all six columns and then DELETE the inserted row, will Cassandra write 1 range tombstone or seven tombstones (one per column plus the row marker)? Thanks, Jens
Re: Sqoop Free Form Import Query Breaks off
Hi, Does this have anything to do with Cassandra? Also, please try to avoid cross-posting; it makes it hard for: - future readers to read the full thread. - anyone to follow the full thread. - anyone to respond. I assume there are few who are enrolled in both mailing lists at the same time. Thank you and merry Christmas, Jens

On Thu, Dec 25, 2014 at 2:24 PM, Vineet Mishra clearmido...@gmail.com wrote:
Hi All, I am facing an issue with Sqoop (Sqoop version: 1.4.3-cdh4.7.0) import. I have threaded Java code to import data from multiple databases running on different servers. Currently I execute the sqoop job as a Java process, something like:

Runtime.getRuntime().exec(/usr/bin/sqoop import --driver com.vertica.jdbc.Driver --connect jdbc:vertica://host:port/db --username user --password pwd --query 'select * from table WHERE $CONDITIONS' --split-by id --target-dir /user/hive/warehouse/data/db.db/table --fields-terminated-by '\t' --hive-drop-import-delims -m 1)

I am executing the above command as-is and running into an exception saying:

WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: *
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: from
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: table
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: WHERE
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: $CONDITIONS'
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: --split-by
14/12/25 18:38:29 ERROR tool.BaseSqoopTool: Unrecognized argument: id
. . .
Although I can easily understand the error and the reason for it -- sqoop is internally splitting the command by spaces, which breaks up the free-form query (it runs fine with the --table parameter instead), and the same command works like a charm when run directly from the command line -- I wanted to know: is there something I am missing going this way? If not, why is this issue hitting and what's the workaround? Urgent call! Thanks!
Re: Multi DC informations (sync)
Alain, AFAIK, the DC replication is not linearizable. That is, writes are not replicated according to a binlog or similar, like in MySQL; they are replicated concurrently. To answer your questions:
1 - "Replication lag" in Cassandra terms is probably "hinted handoff". You'd want to check the status of that.
2 - `nodetool status` is your friend. It will tell you whether the cluster considers other nodes reachable or not. Run it on a node in the datacenter that you'd like to test connectivity from.
Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter

On Fri, Dec 19, 2014 at 11:16 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
Hi guys, We expanded our cluster to a multiple-DC configuration. Now I am wondering if there is any way to know: 1 - The replication lag between these 2 DCs (OpsCenter, nodetool, other?) 2 - How to make sure that sync is OK at any time. I guess big companies running Cassandra are interested in this kind of info, so I think something exists, but I am not aware of it. Any other important information or advice you can give me about best practices or tricks while running a multi-DC (cross-region US - EU) setup is welcome of course! Cheers, Alain
Re: Understanding tombstone WARN log output
Hi again, A follow-up question (to my yet unanswered question): How come the first localDeletion is Integer.MAX_VALUE above? Should it be? Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter On Thu, Dec 18, 2014 at 2:48 PM, Jens Rantil jens.ran...@tink.se wrote: Hi, I am occasionally seeing: WARN [ReadStage:9576] 2014-12-18 11:16:19,042 SliceQueryFilter.java (line 225) Read 756 live and 17027 tombstoned cells in mykeyspace.mytable (see tombstone_warn_threshold). 5001 columns was requested, slices=[73c31274-f45c-4ba5-884a-6d08d20597e7:myfield-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647, ranges=[73f0b59e-7525-4a18-a84f-d2a2f0505503-73f0b59e-7525-4a18-a84f-d2a2f0505503:!, deletedAt=141872018676, localDeletion=1418720186][74374d72-2688-4e64-bb0b-f51a956b0529-74374d72-2688-4e64-bb0b-f51a956b0529:!, deletedAt=1418720184675000, localDeletion=1418720184] ... in system.log. My primary key is ((userid uuid), id uuid). Is it possible for me to see from this output which partition key and/or ranges that has all of these tombstones? Thanks, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
Understanding tombstone WARN log output
Hi, I am occasionally seeing: WARN [ReadStage:9576] 2014-12-18 11:16:19,042 SliceQueryFilter.java (line 225) Read 756 live and 17027 tombstoned cells in mykeyspace.mytable (see tombstone_warn_threshold). 5001 columns was requested, slices=[73c31274-f45c-4ba5-884a-6d08d20597e7:myfield-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647, ranges=[73f0b59e-7525-4a18-a84f-d2a2f0505503-73f0b59e-7525-4a18-a84f-d2a2f0505503:!, deletedAt=141872018676, localDeletion=1418720186][74374d72-2688-4e64-bb0b-f51a956b0529-74374d72-2688-4e64-bb0b-f51a956b0529:!, deletedAt=1418720184675000, localDeletion=1418720184] ... in system.log. My primary key is ((userid uuid), id uuid). Is it possible for me to see from this output which partition key and/or ranges that has all of these tombstones? Thanks, Jens -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
Re: Replacing nodes disks
Hi Or, You don't have another machine on the network that would temporarily be able to host your /var/lib/cassandra content? That way you would simply be scp:ing the files temporarily to another machine and copy them back when done. You obviously want to do a repair afterwards just in case, but this could save you some time. Just an idea, Jens On Thu, Dec 18, 2014 at 4:17 PM, Or Sher or.sh...@gmail.com wrote: Hi all, We have a situation where some of our nodes have smaller disks and we would like to align all nodes by replacing the smaller disks to bigger ones without replacing nodes. We don't have enough space to put data on / disk and copy it back to the bigger disks so we would like to rebuild the nodes data from other replicas. What do you think should be the procedure here? I'm guessing it should be something like this but I'm pretty sure it's not enough. 1. shutdown C* node and server. 2. replace disks + create the same vg lv etc. 3. start C* (Normally?) 4. nodetool repair/rebuild? *I think I might get some consistency issues for use cases relying on Quorum reads and writes for strong consistency. What do you say? Another question is (and I know it depends on many factors but I'd like to hear an experienced estimation): How much time would take to rebuild a 250G data node? Thanks in advance, Or. -- Or Sher -- Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
Query strategy with respect to tombstones
Hi, I have a table with composite primary key ((userid), id). Some patterns about my table:
* Each user generally has 0-3000 rows, but there is currently no upper limit.
* Deleting rows for a user is extremely rare, but when done it can be thousands of rows at a time.
* The most common query by far is to select all rows for a user.
Recently I saw a user that had 65000 tombstones when querying for all of his rows; system.log was printing TombstoneOverwhelmingException. What are my options to avoid this overwhelming tombstone exception? I am willing to have slower queries rather than not being able to query at all. I see a couple of options:
* Use an anti-column to mark rows as deleted. I could then control the rate at which I am writing tombstones by occasionally deleting anti-columns/rows together with their equivalent rows.
* Simply raise tombstone_failure_threshold. AFAIK, this will eventually make me run into possible GC issues.
* Use fetchSize to limit the number of rows paged through. This would make every single query slower, and would not entirely avoid the possibility of getting TombstoneOverwhelmingException.
Have I missed any alternatives here? In the best of worlds, the fetchSize property would also honour the number of tombstones, but I don't think that would be possible, right? Thanks, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter
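The first option (an anti-column marking rows as deleted) implies a read-side merge like the following. The `deleted` marker column and the row shape are made up for illustration, not any real schema; the point is that reads drop marked rows client-side while a background job issues the real DELETEs at a controlled pace, so tombstones never pile up faster than compaction can purge them.

```python
def filter_anti_deleted(rows, marker="deleted"):
    """Anti-column pattern at read time: instead of issuing a CQL
    DELETE (which writes a tombstone), the application writes a
    boolean marker column; reads then drop marked rows client-side.
    `marker` is a hypothetical column name."""
    return [row for row in rows if not row.get(marker, False)]
```

The trade-off versus real deletes: you still read (and transfer) the dead rows on every query, so a periodic sweep that converts markers into actual DELETEs at a bounded rate is needed to reclaim space eventually.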
Re: Understanding what is key and partition key
For the first row, the key is (2014, N, 1, සියළුම, යුද්ධ) and the value part is (664). Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter

On Tue, Dec 16, 2014 at 2:25 PM, Chamila Wijayarathna cdwijayarat...@gmail.com wrote:
Hi Jack, So what will be the keys and values of the following CF instance?

year | category | frequency | word1 | word2 | id
-----+----------+-----------+-------+-------+------
2014 | N | 1 | සියළුම | යුද්ධ | 664
2014 | N | 1 | එච් | කාණ්ඩය | 12526
2014 | N | 1 | ගජබා | සුපර්ක්රොස් | 25779
2014 | N | 1 | බී | කාණ්ඩය | 12505

Thank You!

On Tue, Dec 16, 2014 at 6:45 PM, Jack Krupansky j...@basetechnology.com wrote:
Correction: year and category form a “composite partition key”. frequency, word1, and word2 are “clustering columns”. The combination of a partition key with clustering columns is a “compound primary key”. Every CQL row will have a partition key by definition, and may optionally have clustering columns. “The key” should just be a synonym for “primary key”, although sometimes people are loosely speaking about “the partition” (which should be “the partition key”) rather than the CQL “row”. -- Jack Krupansky

*From:* Chamila Wijayarathna cdwijayarat...@gmail.com *Sent:* Tuesday, December 16, 2014 8:03 AM *To:* user@cassandra.apache.org *Subject:* Understanding what is key and partition key
Hello all, I have read a lot about Cassandra, and I have read about key-value pairs, partition keys, clustering keys, etc. Is the key mentioned in "key-value pair" the same as the partition key, or are they different?

CREATE TABLE corpus.bigram_time_category_ordered_frequency ( id bigint, word1 varchar, word2 varchar, year int, category varchar, frequency int, PRIMARY KEY((year, category), frequency, word1, word2));

In this schema, I know (year, category) is the compound partition key and frequency is the clustering key. What is the key here? Thank You!
-- *Chamila Dilshan Wijayarathna,* SMIEEE, SMIESL, Undergraduate, Department of Computer Science and Engineering, University of Moratuwa.
Re: 100% CPU utilization, ParNew and never completing compactions
Maybe checking which thread(s) are busy would hint at what's going on? (see http://www.boxjar.com/using-top-and-jstack-to-find-the-java-thread-that-is-hogging-the-cpu/).

On Wed, Dec 17, 2014 at 1:51 AM, Arne Claassen a...@emotient.com wrote:
Cassandra 2.0.10 and Datastax Java Driver 2.1.1

On Dec 16, 2014, at 4:48 PM, Ryan Svihla rsvi...@datastax.com wrote:
What version of Cassandra?

On Dec 16, 2014 6:36 PM, Arne Claassen a...@emotient.com wrote:
That's just the thing. There is nothing in the logs except the constant ParNew collections like:

DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888

But the load is staying continuously high. There's always some compaction on just that one table, media_tracks_raw, going on, and those values rarely change (certainly the remaining time is meaningless):

pending tasks: 17
compaction type | keyspace | table | completed | total | unit | progress
Compaction | media | media_tracks_raw | 444294932 | 1310653468 | bytes | 33.90%
Compaction | media | media_tracks_raw | 131931354 | 3411631999 | bytes | 3.87%
Compaction | media | media_tracks_raw | 30308970 | 23097672194 | bytes | 0.13%
Compaction | media | media_tracks_raw | 899216961 | 1815591081 | bytes | 49.53%
Active compaction remaining time : 0h27m56s

Here's a sample of a query trace:

activity | timestamp | source | source_elapsed
execute_cql3_query | 00:11:46,612 | 10.140.22.236 | 0
Parsing select * from media_tracks_raw where id =74fe9449-8ac4-accb-a723-4bad024101e3 limit 100; | 00:11:46,612 | 10.140.22.236 | 47
Preparing statement | 00:11:46,612 | 10.140.22.236 | 234
Sending message to /10.140.21.54 | 00:11:46,619 | 10.140.22.236 | 7190
Message received from /10.140.22.236 | 00:11:46,622 | 10.140.21.54 | 12
Executing single-partition query on media_tracks_raw | 00:11:46,644 | 10.140.21.54 | 21971
Acquiring sstable references | 00:11:46,644 | 10.140.21.54 | 22029
Merging memtable tombstones | 00:11:46,644 | 10.140.21.54 | 22131
Bloom filter allows skipping sstable 1395 | 00:11:46,644 | 10.140.21.54 | 22245
Bloom filter allows skipping sstable 1394 | 00:11:46,644 | 10.140.21.54 | 22279
Bloom filter allows skipping sstable 1391 | 00:11:46,644 | 10.140.21.54 | 22293
Bloom filter allows skipping sstable 1381 | 00:11:46,644 | 10.140.21.54 | 22304
Bloom filter allows skipping sstable 1376 | 00:11:46,644 | 10.140.21.54 | 22317
Bloom filter allows skipping sstable 1368 | 00:11:46,644 | 10.140.21.54 | 22328
Bloom filter allows skipping sstable 1365 | 00:11:46,644 | 10.140.21.54 | 22340
Bloom filter allows skipping sstable 1351 | 00:11:46,644 | 10.140.21.54 | 22352
Bloom filter allows skipping sstable 1367 | 00:11:46,644 | 10.140.21.54 | 22363
Bloom filter allows skipping sstable 1380 | 00:11:46,644 | 10.140.21.54 | 22374
Bloom filter allows skipping sstable 1343 | 00:11:46,644 | 10.140.21.54 | 22386
Bloom filter allows skipping sstable 1342 | 00:11:46,644 | 10.140.21.54 | 22397
Bloom filter allows skipping sstable 1334 | 00:11:46,644 |
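The top+jstack technique from the linked article boils down to matching thread ids: `top -H` reports them in decimal, while `jstack` prints each thread's native id as a hex `nid=` value. A minimal sketch (the thread id is just an example, not taken from this cluster):

```shell
# Pick the hottest thread id from `top -H -p <cassandra-pid>` (decimal),
# then convert it to the hex nid that jstack prints for each thread.
TID=21971                          # example: busiest thread id from top
NID=$(printf 'nid=0x%x' "$TID")
echo "$NID"                        # grep this in `jstack <cassandra-pid>` output
```

Grepping the jstack dump for that `nid` tells you whether the CPU is going to compaction threads, read stages, or GC-related work.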
Re: Hinted handoff not working
Hi Robert , Maybe you need to flush your memtables to actually see the disk usage increase? This applies to both hosts. Cheers, Jens On Sun, Dec 14, 2014 at 3:52 PM, Robert Wille rwi...@fold3.com wrote: I have a cluster with RF=3. If I shut down one node, add a bunch of data to the cluster, I don’t see a bunch of records added to system.hints. Also, du of /var/lib/cassandra/data/system/hints of the nodes that are up shows that hints aren’t being stored. When I start the down node, its data doesn’t grow until I run repair, which then takes a really long time because it is significantly out of date. Is there some magic setting I cannot find in the documentation to enable hinted handoff? I’m running 2.0.11. Any insights would be greatly appreciated. Thanks Robert
`nodetool cfhistogram` utility script
Hi, I just quickly put together a tiny utility script to estimate mean/min/max/percentiles for `nodetool cfhistogram` latency output. Maybe it could be useful to someone else, I don’t know. You can find it here: https://gist.github.com/JensRantil/3da67e39f50aaf4f5bce Future improvements would obviously be to not hardcode `us:` and to support the other histograms. Also, this logic should maybe even be moved into `nodetool cfhistogram` itself, since these are fairly common metrics for latency. Cheers, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
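The gist isn't reproduced here, but the idea is to treat each cfhistograms offset as a weighted bucket. A minimal sketch of that computation (the bucket format is assumed to be (latency_us, count) pairs, as in the per-offset output of `nodetool cfhistograms`):

```python
def histogram_stats(buckets):
    """Estimate min/max/mean and percentiles from (latency_us, count)
    buckets. Percentiles are approximated by the first bucket whose
    cumulative count reaches the requested fraction of the total."""
    buckets = [(v, c) for v, c in sorted(buckets) if c > 0]
    total = sum(c for _, c in buckets)
    mean = sum(v * c for v, c in buckets) / total

    def percentile(p):
        threshold = p * total
        seen = 0
        for v, c in buckets:
            seen += c
            if seen >= threshold:
                return v
        return buckets[-1][0]

    return {
        "min": buckets[0][0],
        "max": buckets[-1][0],
        "mean": mean,
        "p50": percentile(0.50),
        "p95": percentile(0.95),
        "p99": percentile(0.99),
    }
```

Note the estimate is only as fine-grained as the histogram's bucket boundaries.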
Re: batch_size_warn_threshold_in_kb
Maybe slightly off-topic, but what is a mutation? Is it equivalent to a CQL row? Or maybe a column in a row? Does it include tombstones within the selected range? Thanks, Jens

On Thu, Dec 11, 2014 at 9:56 PM, Ryan Svihla rsvi...@datastax.com wrote: Nothing magic, just put in there based on experience. You can find the story behind the original recommendation here: https://issues.apache.org/jira/browse/CASSANDRA-6487 Key reasoning for the desire comes from Patrick McFadden: "Yes that was in bytes. Just in my own experience, I don't recommend more than ~100 mutations per batch. Doing some quick math I came up with 5k as 100 x 50 byte mutations. Totally up for debate." It's totally changeable; however, it's there in no small part because so many people mistake the BATCH keyword for a performance optimization, and this helps flag those cases of misuse.

On Thu, Dec 11, 2014 at 2:43 PM, Mohammed Guller moham...@glassbeam.com wrote: Hi – The cassandra.yaml file has a property called *batch_size_warn_threshold_in_kb*. The default size is 5kb and, according to the comments in the yaml file, it is used to log a WARN on any batch size exceeding this value in kilobytes. It says caution should be taken when increasing the size of this threshold as it can lead to node instability. Does anybody know the significance of this magic number 5kb? Why would a higher number (say 10kb) lead to node instability? Mohammed -- http://www.datastax.com/ Ryan Svihla Solution Architect https://twitter.com/foundev http://www.linkedin.com/pub/ryan-svihla/12/621/727/ DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world’s most innovative enterprises. Datastax is built to be agile, always-on, and predictably scalable to any size. 
With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the world's most innovative companies such as Netflix, Adobe, Intuit, and eBay.
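Patrick's back-of-the-envelope above (around 100 mutations at ~50 bytes each, giving roughly 5 KB) is easy to replicate. A hedged sketch; the real serialized mutation size depends on the encoding, so treat this as an estimate only:

```python
WARN_THRESHOLD_KB = 5  # default batch_size_warn_threshold_in_kb

def would_warn(n_mutations, avg_mutation_bytes, threshold_kb=WARN_THRESHOLD_KB):
    """Return (estimated_kb, trips_warning) for a batch of n_mutations,
    each roughly avg_mutation_bytes when serialized."""
    est_kb = n_mutations * avg_mutation_bytes / 1024
    return est_kb, est_kb > threshold_kb

# 100 mutations x 50 bytes is just under the 5 KB threshold;
# doubling the batch trips it.
print(would_warn(100, 50))
print(would_warn(200, 50))
```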
Re: Best practice for emulating a Cassandra timeout during unit tests?
Hi, I don’t know if this is “best practice”, but you could do this using mocking if nothing else. Cheers, Jens

On Tue, Dec 9, 2014 at 8:42 PM, Clint Kelly clint.ke...@gmail.com wrote: Hi all, I'd like to write some tests for code that uses the Cassandra Java driver, to see how it behaves if there is a read timeout while accessing Cassandra. Is there a best practice for getting this done? I was thinking about adjusting the settings in the cluster builder to make the timeout impossibly low (like 1 ms), but I'd rather do something to my test Cassandra instance (using the EmbeddedCassandraService) to temporarily slow it down. Any suggestions? Best regards, Clint
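Jens' mocking suggestion might look roughly like the following. It is shown in Python for brevity; the stub exception stands in for the driver's timeout type (in the OP's Java setup, Mockito plus the Java driver's read-timeout exception would play these roles):

```python
from unittest import mock

class ReadTimeoutStub(Exception):
    """Stand-in for the driver's read-timeout exception."""

def fetch_user(session, user_id):
    """Application code under test: falls back to None on a read timeout."""
    try:
        return session.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    except ReadTimeoutStub:
        return None

# Emulate the timeout without touching a real cluster: the mocked session
# raises on execute(), exercising the timeout path deterministically.
session = mock.Mock()
session.execute.side_effect = ReadTimeoutStub("replica did not respond")
result = fetch_user(session, 42)
```

The advantage over an impossibly low timeout setting is determinism: the failure happens on every run, with no race against a real (embedded) Cassandra.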
Re: Cassandra backup via snapshots in production
On Mon, Dec 1, 2014 at 8:39 PM, Robert Coli rc...@eventbrite.com wrote: Why not use the much more robustly designed and maintained community-based project, tablesnap?

For two reasons: - Because I am tired of the deployment model of Python apps, which requires me to set up virtual environments. - Because, AFAIK, it did not support (asymmetric) encryption before uploading. -- Jens
Re: Cassandra add a node and remove a node
Hi Neha, Generally, best practice is to add the new node before removing the old one. This is especially important if the cluster’s resources (such as available disk space) are low. Adding first also lets you verify that the new node is functioning correctly (check the logs) before decommissioning the old one. See [1]. [1] http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_live_node.html Cheers, Jens

On Mon, Dec 1, 2014 at 7:15 AM, Neha Trivedi nehajtriv...@gmail.com wrote: Hi, I need to add a new node and remove an existing node. Should I first remove the node and then add a new one, or add the new node and then remove the existing one? Which practice is better, and what do I need to take care of? regards Neha
Re: Cassandra backup via snapshots in production
Late answer; you can find my backup script here: https://gist.github.com/JensRantil/a8150e998250edfcd1a3 Basically you need to set S3_BUCKET and PGP_KEY_RECIPIENT, configure s3cmd (using `s3cmd --configure`) and then issue `./backup-keyspace.sh your-keyspace` to back it up to S3. The script is run periodically on every node. Regarding `s3cmd --configure`, I executed it once and then copied `~/.s3cfg` to all nodes. Like I said, there’s lots of love that can be put into a backup system. Note that the script has the following limitations:

* It does not checksum the files. However, the s3cmd website states that it by default compares MD5 and file size on upload.
* It does not purge old files on S3 (which you could configure using “Object Lifecycles”).
* It does not warn you if a backup fails. Check your logs periodically.
* It does not do any advanced logging. Make sure to pipe the output to a file or the `syslog` utility.
* It does not do continuous/point-in-time backup.

That said, it does its job for us for now. Feel free to propose improvements! Cheers, Jens

On Fri, Nov 21, 2014 at 7:36 PM, William Arbaugh w...@cs.umd.edu wrote: Jens, I'd be interested in seeing your script. We've been thinking of doing exactly that but uploading to Glacier instead. Thanks, Bill

On Nov 21, 2014, at 11:40 AM, Jens Rantil jens.ran...@tink.se wrote: The main purpose is to protect us from human errors (eg. unexpected manipulations: delete, drop tables, …). If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you. Regarding backup, I have a small script that creates a named snapshot and, for each sstable, encrypts it, uploads it to S3 and deletes the snapshotted sstable. It took me an hour to write and roll out to all our nodes. 
The whole process is currently logged, but eventually I will also send an e-mail if backup fails. ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter On Tue, Nov 18, 2014 at 3:52 PM, Ngoc Minh VO ngocminh...@bnpparibas.com wrote: Hello all, We are looking for a solution to backup data in our C* cluster (v2.0.x, 16 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters). The main purpose is to protect us from human errors (eg. unexpected manipulations: delete, drop tables, …). We are thinking of: - Backup: add a 2TB HDD on each node for C* daily/weekly snapshots. - Restore: load the most recent snapshots or latest “non-corrupted” ones and replay missing data imports from other data source. We would like to know if somebody are using Cassandra’s backup feature in production and could share your experience with us. Your help would be greatly appreciated. Best regards, Minh This message and any attachments (the message) is intended solely for the intended addressees and is confidential. If you receive this message in error,or are not the intended recipient(s), please delete it and any copies from your systems and immediately notify the sender. Any unauthorized view, use that does not comply with its purpose, dissemination or disclosure, either whole or partial, is prohibited. Since the internet cannot guarantee the integrity of this message which may not be reliable, BNP PARIBAS (and its subsidiaries) shall not be liable for the message if modified, changed or falsified. Do not print this message unless it is necessary,consider the environment. -- Ce message et toutes les pieces jointes (ci-apres le message) sont etablis a l'intention exclusive de ses destinataires et sont confidentiels. Si vous recevez ce message par erreur ou s'il ne vous est pas destine, merci de le detruire ainsi que toute copie de votre systeme et d'en avertir immediatement l'expediteur. 
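The first limitation above (no client-side checksumming) is cheap to address. A sketch of a streaming MD5 helper whose output could be recorded alongside each uploaded sstable; note that S3's ETag equals the plain MD5 only for non-multipart uploads, so that comparison is an assumption to verify for your upload tool:

```python
import hashlib

def md5_hex(path, chunk_size=1 << 20):
    """Streaming MD5 of a file, read in 1 MiB chunks so even large
    sstables don't need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```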
Re: Cassandra backup via snapshots in production
“Truncate does trigger snapshot creation though” Doesn’t it? With “auto_snapshot: true” it should. Jens

On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan doanduy...@gmail.com wrote: True. Delete in CQL just creates a tombstone, so from the storage engine's point of view it's just adding some physical columns. Truncate does trigger snapshot creation though.

On 21 Nov 2014 19:29, Robert Coli rc...@eventbrite.com wrote: On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil jens.ran...@tink.se wrote: The main purpose is to protect us from human errors (eg. unexpected manipulations: delete, drop tables, …). If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you. OP includes delete in their list of unexpected manipulations, and auto_snapshot: true will not protect you in any way from DELETE. =Rob http://twitter.com/rcolidba
Cassandra schema migrator
Hi, Is anyone using, or could anyone recommend, a tool for versioning/migrating schemas in Cassandra? My list of requirements:

* Support for adding tables.
* Support for versioning of table properties. All our tables are to default to LeveledCompactionStrategy.
* Support for adding non-existing columns.
* Optional: support for removing columns.
* Optional: support for removing tables.

We are primarily a Java shop, but could potentially integrate something non-Java. I understand I could write a tool that makes these decisions using system.schema_columnfamilies and system.schema_columns, but as always, reusing a proven tool would be preferable. So far I only know of Spring Data Cassandra, which handles creating tables and adding columns. However, it does not handle table properties in any way. Thanks, Jens
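For reference, the core of the homegrown approach described above (diff a desired schema against what the cluster reports and emit ALTERs) fits in a few lines. All names here are hypothetical; in practice `actual` would be populated from system.schema_columns on C* 2.x:

```python
def missing_column_statements(table, desired, actual):
    """Emit ALTER TABLE ... ADD statements for columns present in the
    desired schema but absent from the live one. `desired` and `actual`
    map column name to CQL type."""
    stmts = []
    for name, cql_type in sorted(desired.items()):
        if name not in actual:
            stmts.append("ALTER TABLE %s ADD %s %s;" % (table, name, cql_type))
    return stmts
```

Handling table properties and column/table removal (the harder requirements in the list) would need the same diff pattern against system.schema_columnfamilies, plus explicit operator confirmation for destructive changes.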
Re: Problem with performance, memory consumption, and RLIMIT_MEMLOCK
Hi Dmitri, I have not used the CPP driver, but maybe you have forgotten to set the equivalent of the Java driver's fetch size to something sensible? Just an idea, Jens — Sent from Mailbox

On Sun, Nov 16, 2014 at 6:09 PM, Dmitri Dmitrienko ddmit...@gmail.com wrote: Hi, I have a very simple table in Cassandra that contains only three columns: id, time and a blob with data. I added 1M rows of data and now the database is about 12GB on disk. 1M is only part of the data I want to store in the database; it's necessary to synchronize this table with an external source. In order to do this, I have to read the id and time columns of all the rows, compare them with what I see in the external source, and insert/update/delete the rows where I see a difference. So I'm trying to fetch the id and time columns from Cassandra. In 100% of my attempts, the server hangs for ~1 minute while loading 100% CPU, then abnormally terminates with an error saying I have to run Cassandra as root or increase RLIMIT_MEMLOCK. I increased RLIMIT_MEMLOCK to 1GB and it seems that is still not sufficient. It seems Cassandra tries to read and lock the whole table in memory, ignoring the fact that I need only two tiny columns (~12MB of data). This is how it works when I use the latest cpp-driver. With cqlsh it works differently: it shows the first page of data almost immediately, without any sensible delay. Is there a way to have the cpp-driver work like cqlsh? I'd like to have data sent to the client immediately upon availability, without any attempts to lock huge chunks of virtual memory. My platform is 64-bit Linux (CentOS) with all necessary updates installed, OpenJDK. I also tried Mac OS X with Oracle JDK. In that case I don't get the RLIMIT_MEMLOCK error, but a regular out-of-memory error in system.log, although I gave the server a sufficiently large heap, as recommended: 8GB.
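cqlsh shows the first page quickly because it pages its queries rather than pulling the whole result set at once; the cpp-driver has a paging-size knob as well (cass_statement_set_paging_size, if memory serves). The general client-side pattern, sketched in Python against a stub session rather than any real driver API (the execute signature below is an assumption for illustration):

```python
def iter_rows(session, query, fetch_size=5000):
    """Page through a large result set fetch_size rows at a time.
    `session.execute(query, fetch_size, paging_state)` is assumed to
    return (rows, next_paging_state), with next_paging_state None on
    the last page -- a stand-in for the real driver's paging API."""
    paging_state = None
    while True:
        rows, paging_state = session.execute(query, fetch_size, paging_state)
        for row in rows:
            yield row
        if paging_state is None:
            return
```

Since `iter_rows` is a generator, the caller starts consuming the first page as soon as it arrives, instead of waiting for all 1M rows.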
Re: Cassandra backup via snapshots in production
The main purpose is to protect us from human errors (eg. unexpected manipulations: delete, drop tables, …). If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you. Regarding backup, I have a small script that creates a named snapshot and, for each sstable, encrypts it, uploads it to S3 and deletes the snapshotted sstable. It took me an hour to write and roll out to all our nodes. The whole process is currently logged, but eventually I will also send an e-mail if a backup fails. Jens

On Tue, Nov 18, 2014 at 3:52 PM, Ngoc Minh VO ngocminh...@bnpparibas.com wrote: Hello all, We are looking for a solution to back up data in our C* cluster (v2.0.x, 16 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters). The main purpose is to protect us from human errors (eg. unexpected manipulations: delete, drop tables, …). We are thinking of: - Backup: add a 2TB HDD on each node for C* daily/weekly snapshots. - Restore: load the most recent snapshots or the latest “non-corrupted” ones and replay missing data imports from the other data source. We would like to know if somebody is using Cassandra’s backup feature in production and could share their experience with us. Your help would be greatly appreciated. Best regards, Minh