RE: Version Upgrade

2018-05-03 Thread Ken Hancock
A related question:

Is nodetool upgradesstables only necessary before you’re going to move major 
versions?  I was under the impression that Cassandra N+1 could read the table 
format of Cassandra N.

In other words, if I am running Cassandra 1.2.x and upgrading to 2.0.x, 2.0.x 
will continue to read all the old Cassandra 1.2.x tables. However, if I then 
want to upgrade to Cassandra 2.1.x, I'd better make sure all SSTables have been 
upgraded to the 2.0.x format before making the next upgrade.



From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Wednesday, April 25, 2018 6:47 PM
To: user@cassandra.apache.org
Subject: Re: Version Upgrade

There's no harm in running it during any upgrade, and I always recommend doing 
it just to be in the habit.

My 2 cents.

On Wed, Apr 25, 2018 at 3:39 PM Christophe Schmitz wrote:
Hi Pranay,

You only need to upgrade your SSTables when you perform a major Cassandra 
version upgrade, so you don't need to run it when upgrading within the 3.x.x 
series. One way to check which storage version your SSTables are using is to 
look at the SSTable file name, which is structured as 
<version>-<generation>-<component>.db. The version is a string that identifies 
the SSTable storage format version; it is "mc" in the 3.x.x series.
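One quick way to check this is to scan a table's data directory for the version prefixes of its Data files. This is only a sketch: it assumes the 3.x-era filename layout (e.g. mc-1-big-Data.db, leading token = format version); older Cassandra versions name SSTable files differently.

```python
import os
import re

def sstable_versions(data_dir):
    """Collect the storage-format version prefix of each SSTable Data file.

    Assumes the 3.x filename layout, e.g. mc-42-big-Data.db, where the
    leading two-letter token is the format version ("mc" for 3.x-era tables).
    """
    versions = set()
    pattern = re.compile(r"^([a-z]{2})-\d+-\w+-Data\.db$")
    for name in os.listdir(data_dir):
        m = pattern.match(name)
        if m:
            versions.add(m.group(1))
    return versions
```

If every table's directory reports only the current version, upgradesstables has nothing left to rewrite.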

Cheers,
Christophe



On 26 April 2018 at 06:06, Pranay akula wrote:
When is it necessary to upgrade SSTables? For a minor upgrade, do we need to 
run upgradesstables?

I know that when we are doing a major upgrade we have to run upgradesstables 
so that SSTables are re-written in the newer format with additional metadata.

But do we need to run upgrade sstables for upgrading from let's say 3.0.15 to 
3.0.16 or 3.0.y to 3.11.y??


Thanks
Pranay



--

Christophe Schmitz - VP Consulting

AU: +61 4 03751980 / FR: +33 7 82022899

 
This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and 
Instaclustr Inc (USA). This email and any attachments may contain confidential 
and legally privileged information.  If you are not the intended recipient, do 
not copy or disclose its content, but please reply to this email immediately 
and highlight the error to the sender and then immediately delete the message.


Re: how to start an embedded cassandra instance?

2016-07-13 Thread Ken Hancock
Do either cassandra-unit or Achilles fork Cassandra to a separate JVM?
Guava libraries create a dependency hell with our current use of Hector's
embedded server.  We're starting to migrate to the Datastax Java driver
with yet another guava version.  I know Farsandra supports forking, so that
was where I was thinking of going first.
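The forking approach can be sketched generically: launch the server as a child process and poll until its port accepts connections. Running the server out-of-process keeps its dependency tree (e.g. its Guava version, for a JVM server) entirely out of the test process. This is only the pattern, not Farsandra's actual API; the function name and arguments below are illustrative.

```python
import socket
import subprocess
import time

def start_forked_server(cmd, port, timeout=15.0):
    """Launch a server as a child process and wait until its TCP port
    accepts connections. Raises if the process dies or the port never opens.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if proc.poll() is not None:
            raise RuntimeError("server exited early with %s" % proc.returncode)
        try:
            # A successful connect means the server is listening.
            with socket.create_connection(("127.0.0.1", port), timeout=1):
                return proc
        except OSError:
            time.sleep(0.2)
    proc.kill()
    raise TimeoutError("server did not open port %d in time" % port)
```

A real harness would pass something like a `java -jar ...` launch command and probe the CQL port, then kill the child in test teardown.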




On Tue, Jul 12, 2016 at 9:37 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> If you're looking something similar to cassandra-unit with Apache 2
> licence, there is a module in Achilles project that provides the same
> thing:
> https://github.com/doanduyhai/Achilles/wiki/CQL-embedded-cassandra-server
>
> On Tue, Jul 12, 2016 at 12:56 PM, Peddi, Praveen <pe...@amazon.com> wrote:
>
>> We do something similar by starting CassandraDaemon class directly (you
>> would need to provide a yaml file though). You can start and stop
>> CassandraDaemon class from your unit test (typically @BeforeClass).
>>
>> Praveen
>>
>> On Jul 12, 2016, at 3:30 AM, Stone Fang <cnstonef...@gmail.com> wrote:
>>
>> Hi,
>> How do you start an embedded Cassandra instance, so that we can run unit
>> tests locally without needing to start a Cassandra server?
>>
>> https://github.com/jsevellec/cassandra-unit is a good project, but the
>> license is not suitable.
>> How do you achieve this?
>>
>> thanks in advance
>>
>> stone
>>
>>
>


-- 
*Ken Hancock* | System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
Office: +1 (978) 889-3329 | Skype: hancockks | Yahoo IM: hancockks
LinkedIn: http://www.linkedin.com/in/kenhancock

This e-mail and any attachments may contain information which is SeaChange
International confidential. The information enclosed is intended only for
the addressees herein and may not be copied or forwarded without permission
from SeaChange International.


Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-02-02 Thread Ken Hancock
So this rings odd to me.  If you can accomplish the same thing by using a
CAS operation, why not fix CREATE TABLE IF NOT EXISTS so that an application
that creates its tables on startup is safe to run on multiple nodes, with CAS
safeguarding against multiple concurrent creations?
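The at-most-once behaviour such a CAS guard would give can be sketched with a toy in-memory registry. This is a simplification: real Cassandra would use a Paxos-backed lightweight transaction (the semantics of INSERT ... IF NOT EXISTS), not a local lock; the lock here just stands in for that atomicity.

```python
import threading

class CasRegistry:
    """Toy stand-in for a CAS-guarded schema registry: create_if_not_exists
    succeeds for exactly one caller per name, mimicking the at-most-once
    guarantee Paxos would give a CAS-backed CREATE TABLE IF NOT EXISTS.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._tables = set()

    def create_if_not_exists(self, name):
        # The compare (membership test) and the set (insert) happen
        # atomically, so concurrent callers cannot both "win".
        with self._lock:
            if name in self._tables:
                return False  # applied=False, like a rejected LWT
            self._tables.add(name)
            return True

def race(registry, name, n_clients=8):
    """Have n_clients try to create the same table concurrently."""
    results = []
    def worker():
        results.append(registry.create_if_not_exists(name))
    threads = [threading.Thread(target=worker) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

However many application nodes race on startup, exactly one creation "applies"; the rest see applied=False and move on, with no schema disagreement.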


On Tue, Jan 26, 2016 at 12:32 PM, Eric Stevens <migh...@gmail.com> wrote:

> There's still a race condition there, because two clients could SELECT at
> the same time as each other, then both INSERT.
>
> You'd be better served with a CAS operation, and let Paxos guarantee
> at-most-once execution.
>
> On Tue, Jan 26, 2016 at 9:06 AM Francisco Reyes <li...@natserv.net> wrote:
>
>> On 01/22/2016 10:29 PM, Kevin Burton wrote:
>>
>> I sort of agree.. but we are also considering migrating to hourly
>> tables.. and what if the single script doesn't run.
>>
>> I like having N nodes make changes like this because in my experience
>> that central / single box will usually fail at the wrong time :-/
>>
>>
>>
>> On Fri, Jan 22, 2016 at 6:47 PM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>>> Instead of using ZK, why not solve your concurrency problem by removing
>>> it?  By that, I mean simply have 1 process that creates all your tables
>>> instead of creating a race condition intentionally?
>>>
>>> On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton <bur...@spinn3r.com> wrote:
>>>
>>>> Not sure if this is a bug or not or kind of a *fuzzy* area.
>>>>
>>>> In 2.0 this worked fine.
>>>>
>>>> We have a bunch of automated scripts that go through and create
>>>> tables... one per day.
>>>>
>>>> at midnight UTC our entire CQL went offline.. .took down our whole app.
>>>>  ;-/
>>>>
>>>> The resolution was a full CQL shut down and then a drop table to remove
>>>> the bad tables...
>>>>
>>>> pretty sure the issue was with schema disagreement.
>>>>
>>>> All our CREATE TABLE use IF NOT EXISTS but I think the IF NOT
>>>> EXISTS only checks locally?
>>>>
>>>> My work around is going to be to use zookeeper to create a mutex lock
>>>> during this operation.
>>>>
>>>> Any other things I should avoid?
>>>>
>>>>
>>>> --
>>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>>> Engineers!
>>>>
>>>> Founder/CEO Spinn3r.com
>>>> Location: *San Francisco, CA*
>>>> blog:  <http://burtonator.wordpress.com>http://burtonator.wordpress.com
>>>> … or check out my Google+ profile
>>>> <https://plus.google.com/102718274791889610666/posts>
>>>>
>>>>
>>
>>
>> --
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog:  <http://burtonator.wordpress.com>http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>>
>>
>> One way to accomplish both, a single process doing the work and having
>> multiple machines be able to do it, is to have a control table.
>>
>> You can have a table that lists what tables have been created and force
>> concistency all. In this table you list the names of tables created. If a
>> table name is in there, it doesn't need to be created again.
>>
>




Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-02-02 Thread Ken Hancock
Just to close the loop on this, but am I correct that the IF NOT EXISTS
isn't the real problem?  Wouldn't even multiple plain CREATE TABLE calls cause
the same schema mismatch if done concurrently?  Normally, a CREATE TABLE call
will return an exception that the table already exists.

On Tue, Feb 2, 2016 at 11:06 AM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> And CASSANDRA-10699  seems to be the sub-issue of CASSANDRA-9424 to do
> that:
> https://issues.apache.org/jira/browse/CASSANDRA-10699
>
>
> -- Jack Krupansky
>
> On Tue, Feb 2, 2016 at 9:59 AM, Sebastian Estevez <
> sebastian.este...@datastax.com> wrote:
>
>> Hi Ken,
>>
>> Earlier in this thread I posted a link to
>> https://issues.apache.org/jira/browse/CASSANDRA-9424
>>
>> That is the fix for these schema disagreement issues and, as commented
>> there, the plan is to use CAS. Until then we have to treat schema changes
>> delicately.
>>
>> all the best,
>>
>> Sebastián
>> On Feb 2, 2016 9:48 AM, "Ken Hancock" <ken.hanc...@schange.com> wrote:
>>
>>> So this rings odd to me.  If you can accomplish the same thing by using
>>> a CAS operation, why not fix create table if not exist so that if your are
>>> writing an application that creates the table on startup, that the
>>> application is safe to run on multiple nodes and uses CAS to safeguard
>>> multiple concurrent creations?
>>>
>>>
>>> On Tue, Jan 26, 2016 at 12:32 PM, Eric Stevens <migh...@gmail.com>
>>> wrote:
>>>
>>>> There's still a race condition there, because two clients could SELECT
>>>> at the same time as each other, then both INSERT.
>>>>
>>>> You'd be better served with a CAS operation, and let Paxos guarantee
>>>> at-most-once execution.
>>>>
>>>> On Tue, Jan 26, 2016 at 9:06 AM Francisco Reyes <li...@natserv.net>
>>>> wrote:
>>>>
>>>>> On 01/22/2016 10:29 PM, Kevin Burton wrote:
>>>>>
>>>>> I sort of agree.. but we are also considering migrating to hourly
>>>>> tables.. and what if the single script doesn't run.
>>>>>
>>>>> I like having N nodes make changes like this because in my experience
>>>>> that central / single box will usually fail at the wrong time :-/
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jan 22, 2016 at 6:47 PM, Jonathan Haddad <j...@jonhaddad.com>
>>>>> wrote:
>>>>>
>>>>>> Instead of using ZK, why not solve your concurrency problem by
>>>>>> removing it?  By that, I mean simply have 1 process that creates all your
>>>>>> tables instead of creating a race condition intentionally?
>>>>>>
>>>>>> On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton <bur...@spinn3r.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Not sure if this is a bug or not or kind of a *fuzzy* area.
>>>>>>>
>>>>>>> In 2.0 this worked fine.
>>>>>>>
>>>>>>> We have a bunch of automated scripts that go through and create
>>>>>>> tables... one per day.
>>>>>>>
>>>>>>> at midnight UTC our entire CQL went offline.. .took down our whole
>>>>>>> app.  ;-/
>>>>>>>
>>>>>>> The resolution was a full CQL shut down and then a drop table to
>>>>>>> remove the bad tables...
>>>>>>>
>>>>>>> pretty sure the issue was with schema disagreement.
>>>>>>>
>>>>>>> All our CREATE TABLE use IF NOT EXISTS but I think the IF NOT
>>>>>>> EXISTS only checks locally?
>>>>>>>
>>>>>>> My work around is going to be to use zookeeper to create a mutex
>>>>>>> lock during this operation.
>>>>>>>
>>>>>>> Any other things I should avoid?
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> We’re hiring if you know of any awesome Java Devops or Linux
>>>>>>> Operations Engineers!
>>>>>>>
>>>>>>> Founder/CEO Spinn3r.com
>>>>>>> Location: *San Francisco, CA*
>>>>>>> blog:  <http://burtonator.wordpress.com>
>>>>>>> http://burtonator.wordpress.com

Cassandra 1.2 & Compressed Data

2016-01-11 Thread Ken Hancock
We were running a contrived system test last week trying to measure the
effect that compaction was having on our I/O and read performance.  As a
test, we set compaction throughput to 1MB/sec.

As expected, we fell greatly behind and the number of SSTables grew.
Unexpectedly, we went OOM.

One of my CFs had 1127 SSTables and those SSTables had a Retained Heap of
almost 1GB.  This was after stopping both compaction and all reads and
writes as well as a executing a full GC.

Here's the heap dump summary for a single CF:

Class Name                                                     | Objects | Shallow Heap | Retained Heap
org.apache.cassandra.io.sstable.SSTableReader                  |   1,127 |      117,208 | >= 985,675,936
|- org.apache.cassandra.io.sstable.SSTableMetadata             |   1,127 |       63,112 |   >= 7,284,600
|- java.util.concurrent.atomic.AtomicLong                      |   2,254 |       54,096 |      >= 54,096
|- org.apache.cassandra.db.DecoratedKey                        |   2,254 |       54,096 |     >= 378,672
|- org.apache.cassandra.io.sstable.SSTableDeletingTask         |   1,127 |       45,080 |      >= 45,080
|- org.apache.cassandra.io.util.CompressedPoolingSegmentedFile |   1,127 |       45,080 | >= 969,094,776
|- org.apache.cassandra.io.sstable.Descriptor                  |   1,127 |       45,080 |     >= 483,696
|- org.apache.cassandra.io.sstable.BloomFilterTracker          |   1,127 |       45,080 |      >= 99,176
|- org.apache.cassandra.io.util.MmappedSegmentedFile           |   1,127 |       45,080 |     >= 360,640
|- java.util.concurrent.atomic.AtomicBoolean                   |   2,254 |       36,064 |      >= 36,064
|- org.apache.cassandra.utils.Murmur3BloomFilter               |   1,127 |       27,048 |      >= 81,144
|- org.apache.cassandra.io.sstable.IndexSummary                |   1,127 |       27,048 |   >= 7,896,104
|- java.util.concurrent.CopyOnWriteArraySet                    |   1,127 |       18,032 |     >= 153,272
|- java.util.concurrent.atomic.AtomicInteger                   |   1,127 |       18,032 |      >= 18,032
|- org.apache.cassandra.config.CFMetaData                      |       1 |          120 |          1,608
|- org.apache.cassandra.cache.AutoSavingCache                  |       1 |           40 |             56
|- java.lang.Class                                             |       1 |           16 |             16
|- org.apache.cassandra.dht.Murmur3Partitioner                 |       1 |           16 |             32


The retained heap is all in io.util.CompressedPoolingSegmentedFile.
Specifically, it is all used by each
io.compress.CompressedRandomAccessReader's compressed ByteBuffer.

I'm not familiar with the cassandra source code, but here's how I'm reading
it.  A SSTable is segmented and a ConcurrentLinkedQueue (appears unbounded)
is created which will contain a Reader for each segment.  Since this table
is compressed, each segment has a
io.compress.CompressedRandomAccessReader.  CompressedRandomAccessReader
allocates an on-heap ByteBuffer, buffer, to receive decompressed data.

It appears this buffer is only released when the SSTable is closed, i.e.
when it's compacted away or Cassandra shuts down.

In our case, we had a contrived test where compaction was essentially
disabled.  However, if I have a huge table which will not get compacted
for weeks (STCS), it seems that Cassandra will allocate a
CompressedRandomAccessReader, with a 65K decompression buffer, for each
segment that is read; those readers never get freed and their number is
unbounded.  My reading is that the memory requirements in Cassandra 1.2.18
for compressed data are unbounded and can consume as much heap space as
compressed data is read.
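Back-of-envelope arithmetic on the heap dump above is consistent with this reading. The 64 KiB figure is the default compression chunk size and is treated here as an assumption:

```python
# How many pooled 64 KiB decompression buffers would account for the
# retained heap observed under CompressedPoolingSegmentedFile?
RETAINED_BYTES = 969_094_776   # from the heap dump above
SSTABLES = 1_127               # SSTableReader count from the heap dump
CHUNK_BYTES = 64 * 1024        # default compression chunk size (assumed)

buffers_total = RETAINED_BYTES / CHUNK_BYTES
buffers_per_sstable = buffers_total / SSTABLES

print(round(buffers_total))        # ~14,787 pooled buffers
print(round(buffers_per_sstable))  # ~13 retained readers per SSTable
```

Roughly 13 pooled readers per SSTable across 1,127 SSTables matches "the worst case simultaneous RAR we had open for this file, forever" described in the ticket.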

Seaching Jira, I found https://issues.apache.org/jira/browse/CASSANDRA-5661
which sounds like the fix effectively orphaned Cassandra 1.2:

"Reader pooling was introduced in CASSANDRA-4942 but pooled
RandomAccessReaders are never cleaned up until the SSTableReader is closed.
So memory use is "the worst case simultaneous RAR we had open for this
file, forever."

We should introduce a global limit on how much memory to use for RAR, and
evict old ones."

I'm not clear how the "simultaneous" comment above applies.  If I'm reading
this correctly, STCS and compressed data is a ticking timebomb for
Cassandra 1.2.
Hopefully someone with more knowledge of the source code can let me know if
my analysis is correct.


Re: compaction_throughput_mb_per_sec

2016-01-05 Thread Ken Hancock
As to why I think it's cluster-wide, here's what the documentation says:

https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html
compaction_throughput_mb_per_sec
<https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__compaction_throughput_mb_per_sec>
(Default: 16) Throttles compaction to the specified total throughput
across the entire system. The faster you insert data, the faster you need
to compact in order to keep the SSTable count down. The recommended value
is 16 to 32 times the rate of write throughput (in MB/second). Setting the
value to 0 disables compaction throttling.

Perhaps "across the entire system" means "across all keyspaces for this
Cassandra node"?

Compare the above documentation with the subsequent one which specifically
calls out "a node":

concurrent_compactors
<https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__concurrent_compactors>
(Default: 1 per CPU core) Sets the number of concurrent compaction
processes allowed to run simultaneously on a node, not including validation
compactions for anti-entropy repair. Simultaneous compactions help preserve
read performance in a mixed read-write workload by mitigating the tendency
of small SSTables to accumulate during a single long-running compaction. If
compactions run too slowly or too fast, change
compaction_throughput_mb_per_sec
<https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__compaction_throughput_mb_per_sec>
first.

I always thought it was per-node, and I'm guessing this is a documentation
lack-of-clarity issue.
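The 16x-32x guidance quoted above is easy to turn into a quick sizing helper; the multipliers come straight from the doc text, and the per-node write rate is whatever you measure on your own cluster:

```python
def recommended_compaction_throughput(write_mb_per_sec):
    """Per the 1.2 docs quoted above: throttle compaction to 16-32 times
    the write throughput (MB/s). Returns the (low, high) MB/s range."""
    return (16 * write_mb_per_sec, 32 * write_mb_per_sec)

# e.g. a node sustaining 2 MB/s of writes:
print(recommended_compaction_throughput(2))  # (32, 64)
```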

On Mon, Jan 4, 2016 at 5:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> Why do you think it’s cluster wide? That param is per-node, and you can
> change it at runtime with nodetool (or via the JMX interface using jconsole
> to ip:7199 )
>
>
>
> From: Ken Hancock
> Reply-To: "user@cassandra.apache.org"
> Date: Monday, January 4, 2016 at 12:59 PM
> To: "user@cassandra.apache.org"
> Subject: compaction_throughput_mb_per_sec
>
> I was surprised the other day to discover that this was a cluster-wide
> setting.   Why does that make sense?
>
> In a heterogeneous cassandra deployment, say I have some old servers
> running spinning disks and I'm bringing on more nodes that perhaps utilize
> SSD.  I want to have different compaction throttling  on different nodes to
> minimize read impact times.
>
> I can already balance data ownership through either token allocation or
> vnode counts.
>
> Also, as I increase my node count, I technically also have to increase my
> compaction_throughput which would require a rolling restart across the
> cluster.
>
>
>


Re: compaction_throughput_mb_per_sec

2016-01-05 Thread Ken Hancock
Will do.  I searched the doc for additional usage of the term "system"

commitlog_segment_size_in_mb refers to "every table in the system"
concurrent_writes talks about CPU cores "in your system"

That's it for "system" other than the compaction_throughput_mb_per_sec
which refers to "across the entire system".

node is the predominant term in the yaml configuration, though I can
certainly see potential confusion with vnodes.



On Tue, Jan 5, 2016 at 2:26 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Jan 5, 2016 at 6:50 AM, Ken Hancock <ken.hanc...@schange.com>
> wrote:
>
>> As to why I think it's cluster-wide, here's what the documentation says:
>>
>
> Do you see "system" used in place of "cluster" anywhere else in the docs?
>
> I think you are correct that the docs should standardize on "system"
> instead of "node", because node to me includes vnodes. "system" or "host"
> is what I think of as "the entire cassandra process".
>
> If I were you, I'd email docs AT datastaxdotcom with your feedback. :D
>
> =Rob
>
>




Re: memtable flush size with LCS

2015-10-29 Thread Ken Hancock
Or if you're doing a high volume of writes, then your flushed file size may
be completely determined by other CFs that have consumed the commitlog
space, forcing any memtables whose commitlog segments are being deleted to
be flushed to disk.

On Wed, Oct 28, 2015 at 2:51 PM, Jeff Jirsa 
wrote:

> It’s worth mentioning that initial flushed file size is typically
> determined by memtable_cleanup_threshold and the memtable space options
> (memtable_heap_space_in_mb, memtable_offheap_space_in_mb, depending on
> memtable_allocation_type)
>
>
>
> From: Nate McCall
> Reply-To: "user@cassandra.apache.org"
> Date: Wednesday, October 28, 2015 at 11:45 AM
> To: Cassandra Users
> Subject: Re: memtable flush size with LCS
>
>
>  do you mean that this property is ignored at memtable flush time, and so
>> memtables are already allowed to be much larger than sstable_size_in_mb?
>>
>
> Yes, 'sstable_size_in_mb' plays no part in the flush process. Flushing is
> based on solely on runtime activity and the file size is determined by
> whatever was in the memtable at that time.
>
>
>
> --
> -
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
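A rough sketch of the arithmetic implied by the settings Jeff lists, under the assumption that a flush triggers once live memtable data exceeds memtable_cleanup_threshold (whose default is 1 / (memtable_flush_writers + 1)) of the configured memtable space; the parameter names mirror the yaml options but the formula is a simplification:

```python
def expected_flush_trigger_mb(heap_space_mb, offheap_space_mb, flush_writers):
    """Approximate memtable size (MB) at which a flush is triggered.

    Assumes memtable_cleanup_threshold defaults to 1/(flush_writers + 1)
    and applies to the combined heap + offheap memtable space. Actual
    flushed file size is smaller still, since on-heap overhead and
    serialization shrink the data on the way to disk.
    """
    cleanup_threshold = 1.0 / (flush_writers + 1)
    return (heap_space_mb + offheap_space_mb) * cleanup_threshold

# e.g. 2048 MB of memtable heap space and one flush writer:
print(expected_flush_trigger_mb(2048, 0, 1))  # 1024.0
```

As Nate and Ken note, commitlog pressure from other CFs can force a flush far below this threshold, so treat it as an upper bound, not a prediction.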


Re: How to remove huge files with all expired data sooner?

2015-09-28 Thread Ken Hancock
On Mon, Sep 28, 2015 at 2:59 AM, Erick Ramirez  wrote:

> have many tables like this, and I'd like to reclaim those spaces sooner.
> What would be the best way to do it? Should I run "nodetool compact" when I
> see two large files that are 2 weeks old? Is there configuration parameters
> I can tune to achieve the same effect? I looked through all the CQL
> Compaction Subproperties for STCS, but I am not sure how they can help
> here. Any suggestion is welcome.


You can use the JMX org.apache.cassandra.db:type=StorageService
forceTableCompaction to compact a single table.

Last time this came up, Robert Coli also indicated he thought nodetool
cleanup would trigger the same thing, but I never got a chance to confirm
that as I'd already done something with forceTableCompaction.  If you have
the data and try a cleanup, please report back your findings.


Re: How to run any application on Cassandra cluster in high availability mode

2015-08-18 Thread Ken Hancock
Off-topic to the Cassandra list, but corosync/pacemaker comes to mind for
automatic service switchover between nodes.

For monitoring and alerting, there's almost too many to mention...





On Tue, Aug 18, 2015 at 2:45 PM, Vikram Kone vikramk...@gmail.com wrote:

 Hi John,
 I have posted the same Q on azkaban google group but there is no response
 so far :(
 If i want to do the old school way of monitor, alert and start the process
 somewhere else..how can I do this? Are there some ready made tools to do
 this kind of general purpose monitoring and alerting for services on linux?

 On Sun, Aug 16, 2015 at 9:38 AM, Prem Yadav ipremya...@gmail.com wrote:

 The MySQL is there just to save the state of things. I suppose it's very
 lightweight. Why not just install mysql on one of the nodes or a VM
 somewhere.


 On Sun, Aug 16, 2015 at 3:39 PM, John Wong gokoproj...@gmail.com wrote:

 Sorry i meant integration with Cassandra (based on the docs by default
 it suggests MySQL)


 On Sunday, August 16, 2015, John Wong gokoproj...@gmail.com wrote:

 There is no leader in cassandra. I suggest you ask the Azkaban community
 about integration with Azkaban and Azkaban HA.

 On Sunday, August 16, 2015, Vikram Kone vikramk...@gmail.com wrote:

 Can't we use zoo keeper for leader election in Cassandra and based on
 who is leader ..run azkaban or any app instance for that matter on that
 Cassandra server. I'm thinking that I can copy the application folder to
 all nodes and then determine which one to run using zookeeper. Is that
 possible ?

 Sent from Outlook http://aka.ms/Ox5hz3




 On Sun, Aug 16, 2015 at 6:47 AM -0700, John Wong 
 gokoproj...@gmail.com wrote:

 Hi

 I am not familiar with Azkaban and probably a better question to the
 Azkaban community IMO. But there seems to be two modes (
 http://azkaban.github.io/azkaban/docs/2.5/) one is solo and one is
 two-server mode, but either way I think still SPOF? If there is no
 election, just based on process, my 2 cents would be monitor, alert, and
 start the process somewhere else. Better yet, don't install the process on a
 Cassandra node. Keep your instance for one purpose only. If you run a cloud
 like AWS you will be able to autoscale min=1 max=1 easily.


 Note: In peer-to-peer architecture, there is simply no concept of
 master. You can start with some seed nodes for discovery. It depends how
 you design discovery.

 On Sat, Aug 15, 2015 at 11:49 AM, Vikram Kone vikramk...@gmail.com
 wrote:

 Hi,
 We are planning to install Azkaban in solo server mode on a 24
 node cassandra cluster to be able to schedule spark jobs with intricate
 dependency chain. The problem is, since Cassandra has a no-SPOF
 architecture (i.e. any node can become the master for the cluster), it
 creates a problem for the Azkaban master, since Azkaban is not a
 peer-to-peer architecture where any node can become the master: only a
 single node can be master at any given time.

 What are our options here? Are there any frameworks or tools out
 there that would allow any application to run on a cluster of machines with
 high availability?
 high availablity?
 Should I be looking at something like zookeeper for this ? Or Mesos
 may be?




 --
 Sent from Jeff Dean's printf() mobile console



 --
 Sent from Jeff Dean's printf() mobile console






Re: nodetool getendpoints options

2015-07-20 Thread Ken Hancock
There is no difference.

In #2, I'm guessing you're confusing some of the column names with keys.  You
could also do getendpoints MyKeyspace MyTable 'foo' or getendpoints
MyKeyspace MyTable 'bar'.

getendpoints does not require any data in your column family to function;
it only requires a schema for the column family so it can convert the 3rd
argument into the native representation in order to hash it.  That hash and
the token/vnode ownership range for each node in the cluster determines
which node owns that specific key.  The replication factor (from the
keyspace argument) along with the topology determines which additional
nodes own replicas of that key.

Since each getendpoints query returned two nodes, that indicates your
keyspace has RF=2 (the fact that the sum of your ring shows 200% ownership
also indicates RF=2).
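The hash-then-walk-the-ring logic described above can be sketched with a toy partitioner. This is deliberately simplified: Cassandra 1.2+ actually hashes keys with Murmur3 and applies snitch/topology rules for replica placement; md5 here is only a deterministic stand-in (much like the old RandomPartitioner), and the node names are made up.

```python
import bisect
import hashlib

def token(key):
    # Toy partitioner: hash the key to a large integer token.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def endpoints(key, ring, rf):
    """ring: sorted list of (token, node) pairs. Walk clockwise from the
    key's token and take the next `rf` distinct nodes -- the essence of
    what `nodetool getendpoints` reports (topology awareness omitted).
    Requires at least `rf` distinct nodes in the ring.
    """
    tokens = [t for t, _ in ring]
    i = bisect.bisect_right(tokens, token(key)) % len(ring)
    owners = []
    while len(owners) < rf:
        node = ring[i][1]
        if node not in owners:
            owners.append(node)
        i = (i + 1) % len(ring)
    return owners
```

Note that no row data is consulted anywhere: the key's hash, the ring, and RF fully determine the owning nodes, which is why getendpoints works on empty tables.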



On Mon, Jul 20, 2015 at 10:38 AM, Thouraya TH thouray...@gmail.com wrote:

 Please, what is the difference between 1) and 2)
 http://pastie.org/private/ydziir2376ru58ywrjq


 Bests.

 2015-07-20 12:55 GMT+01:00 Thouraya TH thouray...@gmail.com:

 Hi all,

 Please, is that possible to change this command:

 nodetool getendpoints MyKeyspace MyTable text
 10.147.243.4
 10.147.243.5

 to

 nodetool getendpoints MyKeyspace MyTable text=Hello  ?

 is there a solution to get more details about my rows?

 Thanks a lot.
 Best Regards.








Re: Hundreds of sstables after every Repair

2015-06-10 Thread Ken Hancock
Perhaps running sstable2json on some of the small SSTables would shed some
light.  I was going to suggest the anticompaction feature of C* 2.1 (which
I'm not familiar with), but you're on 2.0.

On Tue, Jun 9, 2015 at 11:11 AM, Anuj Wadehra anujw_2...@yahoo.co.in
wrote:

 We were facing dropped mutations earlier and we increased flush writers.
 Now there are no dropped mutations in tpstats. To repair the damaged vnodes
 / inconsistent data we executed repair -pr on all nodes. Still, we see the
 same problem.

 When we analyze repair logs we see 2 strange things:

 1. Out of sync ranges for CFs which are not actively being
 written/updated while the repair is going on. When we repaired all data by
 repair -pr on all nodes, why is data still out of sync?

 2. For some CFs, repair logs show that all ranges are consistent. Still
 we get so many sstables created during repair. When everything is in sync,
 why does repair create tiny sstables to repair data?

 Thanks
 Anuj Wadehra

 Sent from Yahoo Mail on Android
 https://overview.mail.yahoo.com/mobile/?.src=Android
 --
   *From*:Ken Hancock ken.hanc...@schange.com
 *Date*:Tue, 9 Jun, 2015 at 8:24 pm
 *Subject*:Re: Hundreds of sstables after every Repair

 I think this came up recently in another thread.  If you're getting large
 numbers of SSTables after repairs, that means that your nodes are diverging
 from the keys that they're supposed to be having.  Likely you're dropping
 mutations.  Do a nodetool tpstats on each of your nodes and look at the
 mutation droppped counters.  If you're seeing dropped message, my money you
 have a non-zero FlushWriter All time blocked stat which is causing
 mutations to be dropped.



 On Tue, Jun 9, 2015 at 10:35 AM, Anuj Wadehra anujw_2...@yahoo.co.in
 wrote:

 Any suggestions or comments on this one?

 Thanks
 Anuj Wadehra

 Sent from Yahoo Mail on Android
 https://overview.mail.yahoo.com/mobile/?.src=Android
 --
   *From*:Anuj Wadehra anujw_2...@yahoo.co.in
 *Date*:Sun, 7 Jun, 2015 at 1:54 am
 *Subject*:Hundreds of sstables after every Repair

 Hi,

 We are using 2.0.3 and vnodes. After every repair -pr operation, 50+ tiny
 sstables (< 10 KB) get created. And these sstables never get compacted due to
 the coldness issue. I have raised
 https://issues.apache.org/jira/browse/CASSANDRA-9146 for this issue but
 I have been told to upgrade. Till we upgrade to latest 2.0.x , we are
 stuck. Upgrade takes time, testing and planning in Production systems :(

 I have observed that even if vnodes are NOT damaged, hundreds of tiny
 sstables are created during repair for a wide row CF. This is beyond my
 understanding. If everything is consistent, and for the entire repair
 process Cassandra is saying "Endpoints /x.x.x.x and /x.x.x.y are consistent
 for CF", what's the need of creating sstables?

 Is there any alternative to regular major compaction to deal with
 situation?


 Thanks
 Anuj Wadehra










Re: Hundreds of sstables after every Repair

2015-06-09 Thread Ken Hancock
I think this came up recently in another thread.  If you're getting large
numbers of SSTables after repairs, that means that your nodes are diverging
from the keys that they're supposed to be having.  Likely you're dropping
mutations.  Do a nodetool tpstats on each of your nodes and look at the
mutations dropped counters.  If you're seeing dropped messages, my money says
you have a non-zero FlushWriter "All time blocked" stat, which is causing
mutations to be dropped.



On Tue, Jun 9, 2015 at 10:35 AM, Anuj Wadehra anujw_2...@yahoo.co.in
wrote:

 Any suggestions or comments on this one?

 Thanks
 Anuj Wadehra

 Sent from Yahoo Mail on Android
 https://overview.mail.yahoo.com/mobile/?.src=Android
 --
   *From*:Anuj Wadehra anujw_2...@yahoo.co.in
 *Date*:Sun, 7 Jun, 2015 at 1:54 am
 *Subject*:Hundreds of sstables after every Repair

 Hi,

 We are using 2.0.3 and vnodes. After every repair -pr operation, 50+ tiny
 sstables (<10K) get created, and these sstables never get compacted due to
 the coldness issue. I have raised
 https://issues.apache.org/jira/browse/CASSANDRA-9146 for this issue but I
 have been told to upgrade. Till we upgrade to the latest 2.0.x, we are
 stuck. Upgrades take time, testing and planning in production systems :(

 I have observed that even if vnodes are NOT damaged, hundreds of tiny
 sstables are created during repair for a wide-row CF. This is beyond my
 understanding. If everything is consistent, and for the entire repair
 process Cassandra is saying "Endpoints /x.x.x.x and /x.x.x.y are consistent
 for CF", what's the need of creating sstables?

 Is there any alternative to regular major compaction to deal with
 situation?


 Thanks
 Anuj Wadehra




Re: Multiple cassandra instances per physical node

2015-05-26 Thread Ken Hancock
I had the exact same question, but I think this is what Nate was thinking:

If you're running multiple nodes on a single server, vnodes give you no
control over which instance has which key (whereas you can assign initial
tokens).  Therefore you could have two of your three replicas on the same
physical server which, if it goes down, you can't read or write at quorum.

However, can't you use the topology snitch to put both nodes in the same
rack?  Won't that prevent the issue and still allow you to maintain quorum
if a single server goes down?  If I have a 20-node cluster with 2 nodes on
each physical server, can I use 10 racks to properly segment my partitions?



On Sun, May 24, 2015 at 5:38 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 What impact would vnodes have on strong consistency?  I think the problem
 you're describing exists with or without them.

 On Sat, May 23, 2015 at 2:30 PM Nate McCall n...@thelastpickle.com
 wrote:


 So my question is: suppose I take a 12 disk JBOD and run 2 Cassandra
 nodes (each with 5 data disks, 1 commit log disk) and either give each its
 own container & IP or change the listen ports. Will this work? What are the
 risks? Will/should Cassandra support this better in the future?


 Don't use vnodes if any operations need strong consistency (reading or
 writing at quorum). Otherwise, at RF=3, if you lose a single node you will
 only have 1 replica left for some portion of the ring.



 --
 -
 Nate McCall
 Austin, TX
 @zznate

 Co-Founder & Sr. Technical Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com




-- 
*Ken Hancock *| System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
http://www.schange.com/en-US/Company/InvestorRelations.aspx
Office: +1 (978) 889-3329 | [image: Google Talk:]
ken.hanc...@schange.com | [image:
Skype:]hancockks | [image: Yahoo IM:]hancockks[image: LinkedIn]
http://www.linkedin.com/in/kenhancock

[image: SeaChange International]
http://www.schange.com/This e-mail and any attachments may contain
information which is SeaChange International confidential. The information
enclosed is intended only for the addressees herein and may not be copied
or forwarded without permission from SeaChange International.


Re: Drop/Create table with same CF Name

2015-05-26 Thread Ken Hancock
Nate, how does this get around the issue?  I'm guessing that just extends
the timeout, but if I had a server failure such that the server was down
for a couple hours, truncate would still have issues?


On Sat, May 23, 2015 at 5:46 PM, Nate McCall n...@thelastpickle.com wrote:

 
  Truncate would have been the tool of choice, however my understanding is
 truncate fails unless all nodes are up and running which makes it a
 non-workable choice since we can't determine when failures will occur.
 

 You can get around this via:
 - in cassandra.yaml, turning up truncate_request_timeout_in_ms to 10
 minutes
 - stopping all compactions: nodetool stop
 [compaction|validation|cleanup|scrub|index_build] (that's 5 commands total)
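Those five stop commands lend themselves to scripting. A hedged sketch that only builds the command lines (the compaction-type names are taken from Nate's list; check `nodetool help stop` on your build before running them):

```python
def truncate_prep_commands(host="localhost", jmx_port=7199):
    """Build the five 'nodetool stop' command lines to halt running
    compactions before raising truncate_request_timeout_in_ms and
    issuing a long TRUNCATE."""
    kinds = ["COMPACTION", "VALIDATION", "CLEANUP", "SCRUB", "INDEX_BUILD"]
    return ["nodetool -h %s -p %d stop %s" % (host, jmx_port, k)
            for k in kinds]

for cmd in truncate_prep_commands():
    print(cmd)
```

Feed the resulting lines to your shell or orchestration tool of choice.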



 --
 -
 Nate McCall
 Austin, TX
 @zznate

 Co-Founder & Sr. Technical Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com



Re: Drop/Create table with same CF Name

2015-05-22 Thread Ken Hancock
I hadn't run into that one, but I did run into the old (deleted) SSTables
somehow getting injected back into the dropped column family, resulting in
an exception and the Thrift connection closing down.

On Fri, May 22, 2015 at 7:53 AM, Walsh, Stephen stephen.wa...@aspect.com
wrote:

  Can someone share the content on this link please, I’m aware of issues
 where recreating key spaces can cause inconsistency in 2.0.13 if memTables
 are not flushed beforehand , is this the issues that is resolved?





 *From:* Ken Hancock [mailto:ken.hanc...@schange.com]
 *Sent:* 21 May 2015 17:13
 *To:* user@cassandra.apache.org
 *Subject:* Re: Drop/Create table with same CF Name



 Thanks Mark (though that article doesn't appear publicly accessible for
 others).

 Truncate would have been the tool of choice, however my understanding is
 truncate fails unless all nodes are up and running which makes it a
 non-workable choice since we can't determine when failures will occur.

 Ken



 On Thu, May 21, 2015 at 11:00 AM, Mark Reddy mark.l.re...@gmail.com
 wrote:

  Yes, it's a known issue. For more information on the topic see this
 support post from DataStax:




 https://support.datastax.com/hc/en-us/articles/204226339-How-to-drop-and-recreate-a-table-in-Cassandra-versions-older-than-2-1


   Mark



 On 21 May 2015 at 15:31, Ken Hancock ken.hanc...@schange.com wrote:



 We've been running into the reused key cache issue (CASSANDRA-5202) with
 dropping and recreating the same table in Cassandra 1.2.18 so we've been
 testing with key caches disabled which does not seem to solve the issue.
 In the latest logs it seems that old SSTables metadata gets read after the
 tables have been deleted by the previous drop, eventually causing an
 exception and the Thrift interface shut down.

 At this point is it a known issue that one CANNOT reuse a table name prior
 to Cassandra 2.1 ?






   This email (including any attachments) is proprietary to Aspect
 Software, Inc. and may contain information that is confidential. If you
 have received this message in error, please do not read, copy or forward
 this message. Please notify the sender immediately, delete it from your
 system and destroy any copies. You may not further disclose or distribute
 this email or its attachments.



Re: Drop/Create table with same CF Name

2015-05-22 Thread Ken Hancock
This issue really needs to be strongly highlighted in the documentation.
Imagine someone noticing similarities between SQL and CQL and assuming that
one could actually drop a table and recreate the table as a method of
deleting all the data...totally crazy, I know...

On Fri, May 22, 2015 at 11:06 AM, Walsh, Stephen stephen.wa...@aspect.com
wrote:

  Thanks for the link,



 I don’t think your link is what I had in mind – considering it's mentioned
 as fixed in 2.0.13



 I was referring to this “won’t fix” issue

 https://issues.apache.org/jira/browse/CASSANDRA-4857



 We’ve seen this a few times, where we drop a keyspace and re-create it
 and get inconsistency issues.

 It even happened to me mid-thread on these boards.



 http://www.mail-archive.com/user%40cassandra.apache.org/msg42139.html







 *From:* Sebastian Estevez [mailto:sebastian.este...@datastax.com]
 *Sent:* 22 May 2015 14:46

 *To:* user@cassandra.apache.org
 *Subject:* Re: Drop/Create table with same CF Name



  I’m aware of issues where recreating key spaces can cause inconsistency
 in 2.0.13 if memTables are not flushed beforehand , is this the issues that
 is resolved?



 Yep, that's https://issues.apache.org/jira/browse/CASSANDRA-7511


All the best,



 *[image: datastax_logo.png] http://www.datastax.com/*

 Sebastián Estévez

 Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

 [image: linkedin.png] https://www.linkedin.com/company/datastax[image:
 facebook.png] https://www.facebook.com/datastax[image: twitter.png]
 https://twitter.com/datastax[image: g+.png]
 https://plus.google.com/+Datastax/about
 http://feeds.feedburner.com/datastax


  http://cassandrasummit-datastax.com/



 DataStax is the fastest, most scalable distributed database technology,
 delivering Apache Cassandra to the world’s most innovative enterprises.
 Datastax is built to be agile, always-on, and predictably scalable to any
 size. With more than 500 customers in 45 countries, DataStax is the
 database technology and transactional backbone of choice for the worlds
 most innovative companies such as Netflix, Adobe, Intuit, and eBay.



 On Fri, May 22, 2015 at 7:53 AM, Walsh, Stephen stephen.wa...@aspect.com
 wrote:

  Can someone share the content on this link please, I’m aware of issues
 where recreating key spaces can cause inconsistency in 2.0.13 if memTables
 are not flushed beforehand , is this the issues that is resolved?





 *From:* Ken Hancock [mailto:ken.hanc...@schange.com]
 *Sent:* 21 May 2015 17:13
 *To:* user@cassandra.apache.org
 *Subject:* Re: Drop/Create table with same CF Name



 Thanks Mark (though that article doesn't appear publicly accessible for
 others).

 Truncate would have been the tool of choice, however my understanding is
 truncate fails unless all nodes are up and running which makes it a
 non-workable choice since we can't determine when failures will occur.

 Ken



 On Thu, May 21, 2015 at 11:00 AM, Mark Reddy mark.l.re...@gmail.com
 wrote:

  Yes, it's a known issue. For more information on the topic see this
 support post from DataStax:




 https://support.datastax.com/hc/en-us/articles/204226339-How-to-drop-and-recreate-a-table-in-Cassandra-versions-older-than-2-1


   Mark



 On 21 May 2015 at 15:31, Ken Hancock ken.hanc...@schange.com wrote:



 We've been running into the reused key cache issue (CASSANDRA-5202) with
 dropping and recreating the same table in Cassandra 1.2.18 so we've been
 testing with key caches disabled which does not seem to solve the issue.
 In the latest logs it seems that old SSTables metadata gets read after the
 tables have been deleted by the previous drop, eventually causing an
 exception and the Thrift interface shut down.

 At this point is it a known issue that one CANNOT reuse a table name prior
 to Cassandra 2.1 ?










Re: Disabling auto snapshots

2015-05-21 Thread Ken Hancock
Is there any method to disable this programmatically on a table-by-table
basis?

I'm running into an issue regarding drop table which I'll post in a
separate thread.

On Thu, May 21, 2015 at 3:34 AM, Mark Reddy mark.l.re...@gmail.com wrote:

 To disable auto snapshots, set the property auto_snapshot: false in your
 cassandra.yaml file.

 Mark

 On 21 May 2015 at 08:30, Ali Akhtar ali.rac...@gmail.com wrote:

 Is there a config setting where automatic snapshots can be disabled? I
 have a use case where a table is truncated quite often, and would like to
 not have snapshots. I can't find anything on google.

 Thanks.





Drop/Create table with same CF Name

2015-05-21 Thread Ken Hancock
We've been running into the reused key cache issue (CASSANDRA-5202) with
dropping and recreating the same table in Cassandra 1.2.18 so we've been
testing with key caches disabled which does not seem to solve the issue.
In the latest logs it seems that old SSTables metadata gets read after the
tables have been deleted by the previous drop, eventually causing an
exception and the Thrift interface shut down.

At this point is it a known issue that one CANNOT reuse a table name prior
to Cassandra 2.1 ?


Re: Drop/Create table with same CF Name

2015-05-21 Thread Ken Hancock
Thanks Mark (though that article doesn't appear publicly accessible for
others).

Truncate would have been the tool of choice, however my understanding is
truncate fails unless all nodes are up and running which makes it a
non-workable choice since we can't determine when failures will occur.

Ken


On Thu, May 21, 2015 at 11:00 AM, Mark Reddy mark.l.re...@gmail.com wrote:

 Yes, it's a known issue. For more information on the topic see this
 support post from DataStax:


 https://support.datastax.com/hc/en-us/articles/204226339-How-to-drop-and-recreate-a-table-in-Cassandra-versions-older-than-2-1

 Mark

 On 21 May 2015 at 15:31, Ken Hancock ken.hanc...@schange.com wrote:


 We've been running into the reused key cache issue (CASSANDRA-5202) with
 dropping and recreating the same table in Cassandra 1.2.18 so we've been
 testing with key caches disabled which does not seem to solve the issue.
 In the latest logs it seems that old SSTables metadata gets read after the
 tables have been deleted by the previous drop, eventually causing an
 exception and the Thrift interface shut down.

 At this point is it a known issue that one CANNOT reuse a table name
 prior to Cassandra 2.1 ?





Re: Updating only modified records (where lastModified current date)

2015-05-13 Thread Ken Hancock
While updates don't create tombstones, overwrites create a similar
performance penalty at the read phase.  That key will need to be fetched
from every SSTable where it resides so the most recent column can be
returned.
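A toy illustration of that read-path cost, with each SSTable reduced to a dict of key -> (timestamp, value): every SSTable holding the key must be consulted before the newest value can be returned.

```python
def read_column(sstables, key):
    """Toy last-write-wins read: reconcile a key across every SSTable
    that contains it and return the newest (timestamp, value) pair plus
    the number of SSTables that had to be consulted."""
    newest, consulted = None, 0
    for table in sstables:
        if key in table:
            consulted += 1
            ts, value = table[key]
            if newest is None or ts > newest[0]:
                newest = (ts, value)
    return newest, consulted

# The same key overwritten three times, each overwrite in its own SSTable.
sstables = [
    {"k1": (100, "old")},
    {"k1": (200, "newer")},
    {"k1": (300, "newest"), "k2": (50, "x")},
]
print(read_column(sstables, "k1"))  # ((300, 'newest'), 3)
print(read_column(sstables, "k2"))  # ((50, 'x'), 1)
```

The heavily overwritten key costs three SSTable reads; the write-once key costs one.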



On Wed, May 13, 2015 at 6:38 AM, Peer, Oded oded.p...@rsa.com wrote:

  You can use the “last modified” value as the TIMESTAMP for your UPDATE
 operation.

 This way the values will only be updated if the lastModified date > the
 lastModified you have in the DB.



 Updates to values don’t create tombstones. Only deletes (either by
 executing delete, inserting a null value or by setting a TTL) create
 tombstones.
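A toy model of the reconciliation rule this relies on: Cassandra keeps the cell with the highest write timestamp, so an UPDATE issued USING TIMESTAMP <lastModified> is silently superseded whenever a newer lastModified has already been written, and no per-record SELECT is needed.

```python
class Cell:
    """Toy model of cell reconciliation: the write carrying the highest
    timestamp wins, regardless of arrival order."""
    def __init__(self):
        self.timestamp, self.value = -1, None

    def write(self, value, timestamp):
        if timestamp > self.timestamp:   # last-write-wins rule
            self.timestamp, self.value = timestamp, value

cell = Cell()
# Write each record with its lastModified (in ms) as the cell timestamp.
cell.write("run-1 snapshot", timestamp=1431500000000)
cell.write("run-2 snapshot", timestamp=1431600000000)  # newer record wins
cell.write("stale replay",   timestamp=1431400000000)  # older: ignored
print(cell.value)  # run-2 snapshot
```

With this scheme the ETL job can just write everything unconditionally, letting the timestamps discard stale records.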





 *From:* Ali Akhtar [mailto:ali.rac...@gmail.com]
 *Sent:* Wednesday, May 13, 2015 1:27 PM
 *To:* user@cassandra.apache.org
 *Subject:* Updating only modified records (where lastModified  current
 date)



 I'm running some ETL jobs, where the pattern is the following:



 1- Get some records from an external API,



 2- For each record, see if its lastModified date > the lastModified I have
 in the db (or if I don't have that record in the db)



 3- If lastModified <= dbLastModified, the item wasn't changed; ignore it.
 Otherwise, run an update query and update that record.



 (It is rare for existing records to get updated, so I'm not that concerned
 about tombstones).



 The problem however is, since I have to query each record's lastModified,
 one at a time, that's adding a major bottleneck to my job.



 E.g if I have 6k records, I have to run a total of 6k 'select lastModified
 from myTable where id = ?' queries.



 Is there a better way, am I doing anything wrong, etc? Any suggestions
 would be appreciated.



 Thanks.



Re: Why select returns tombstoned results?

2015-03-31 Thread Ken Hancock
Have you checked time sync across all servers?  The fact that you've
changed consistency levels and you're getting different results may
indicate something inherently wrong with the cluster such as writes being
dropped or time differences between the nodes.

A brute-force approach to better understand what's going on (especially if
you have an example of the wrong data being returned) is to do a
sstable2json on all your tables and simply grep for an example key.

On Mon, Mar 30, 2015 at 4:39 PM, Benyi Wang bewang.t...@gmail.com wrote:

 Thanks for replying.

 In cqlsh, if I change to QUORUM (CONSISTENCY QUORUM), sometimes the select
 returns the deleted row, sometimes not.

 I have two virtual data centers: service (3 nodes) and analytics (4 nodes
 co-located with Hadoop data nodes). The table has 3 replicas in service and
 2 in analytics. When I write, I write into analytics using local_one. So I
 guess the data may not have replicated to all nodes yet.

 I will try to use strong consistency for write.



 On Mon, Mar 30, 2015 at 11:59 AM, Prem Yadav ipremya...@gmail.com wrote:

 Increase the read CL to quorum and you should get correct results.
 How many nodes do you have in the cluster and what is the replication
 factor for the keyspace?

 On Mon, Mar 30, 2015 at 7:41 PM, Benyi Wang bewang.t...@gmail.com
 wrote:

 Create table tomb_test (
    guid text,
    content text,
    range text,
    rank int,
    id text,
    cnt int,
    primary key (guid, content, range, rank)
 )

 Sometimes I delete rows using the Cassandra Java driver with this query

 DELETE FROM tomb_test WHERE guid=? and content=? and range=?

 in an UNLOGGED batch statement. The consistency level is local_one.

 But if I run

 SELECT * FROM tomb_test WHERE guid='guid-1' and content='content-1' and
 range='week'
 or
 SELECT * FROM tomb_test WHERE guid='guid-1' and content='content-1' and
 range='week' and rank = 1

 The result shows the deleted rows.

 If I run this select, the deleted rows are not shown

 SELECT * FROM tomb_test WHERE guid='guid-1' and content='content-1'

 If I run delete statement in cqlsh, the deleted rows won't show up.

 How can I fix this?








Re: Are Triggers in Cassandra 2.1.2 performace Hog??

2015-01-07 Thread Ken Hancock
When last I looked at Datastax Enterprise (DSE 3.0ish), it exhibits the
same problem that you highlight, no different than your good idea of
asynchronously pushing to ES.

Each Cassandra write was indexed independently by each server in the
replication group.  If a node timed out or a mutation was dropped, that
Solr node would have an out-of-sync index.  Doing a Solr query such as
count(*) on users could return inconsistent results depending on which node
you hit, since Solr didn't support Cassandra consistency levels.

I haven't seen any blog posts or docs as to whether this intrinsic mismatch
between how Cassandra handles eventual consistency and Solr has ever been
resolved.

Ken


On Wed, Jan 7, 2015 at 9:05 AM, DuyHai Doan doanduy...@gmail.com wrote:

 Be very, very careful not to perform blocking calls to ElasticSearch in
 your trigger, otherwise you will kill C* performance. The biggest danger of
 triggers in their current state is that they are on the write path.

 In your trigger, you can try to push the mutation asynchronously to ES, but
 that will mean managing a thread pool and all related issues.

 Not even mentioning atomicity issues like: what happens if the update to ES
 fails or the connection times out? etc.

 As an alternative, instead of implementing yourself the integration with
 ES, you can have a look at Datastax Enterprise integration of Cassandra
 with Apache Solr (not free) or some open-source alternatives like Stratio
 or TupleJump fork of Cassandra with Lucene integration.

 On Wed, Jan 7, 2015 at 2:40 PM, Asit KAUSHIK asitkaushikno...@gmail.com
 wrote:

 Hi All,

 We are trying to integrate Elasticsearch with Cassandra, and as the river
 plugin uses select * from any table it seems to be a bad performance
 choice. So I was thinking of inserting into Elasticsearch using a Cassandra
 trigger, and I wanted your view: does a Cassandra trigger impact the
 read/write performance of Cassandra?

 Also, if you achieve this any other way, please guide me. I am stuck on
 this.

 Regards
 Asit





Re: 答复:

2015-01-05 Thread Ken Hancock
Better yet, if you're using a client where you can pass the time in, you
can validate it is indeed clock skew.  Do all your writes with timestamp =
0, all your deletes with timestamp = 1.

On Wed, Dec 24, 2014 at 7:47 AM, Ryan Svihla rsvi...@datastax.com wrote:

 Every time I've heard this but once it has been clock skew (and that one
 was swallowed exceptions); however, it can just be that you have a test
 that is prone to race conditions (a delete followed by an immediate select
 at a low consistency level). Without more detail it's hard to say.

 I'd check the nodes for time skew by running ntpdate on each node, and
 make sure ntpd is pointing to the same servers.
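The ntpdate check is easy to run across the cluster. A sketch that parses captured `ntpdate -q` output and reports the worst clock offset (the line format is assumed; adjust the regex to what your ntpdate actually prints):

```python
import re

def max_offset_ms(ntpdate_q_output):
    """Return the largest absolute clock offset, in milliseconds, found
    in captured 'ntpdate -q' output. Assumed line format:
      server 10.0.0.1, stratum 2, offset 0.004512, delay 0.02565"""
    offsets = [abs(float(m)) for m in
               re.findall(r"offset\s+(-?\d+\.\d+)", ntpdate_q_output)]
    return max(offsets) * 1000.0 if offsets else 0.0

sample = ("server 10.0.0.1, stratum 2, offset 0.004512, delay 0.02565\n"
          "server 10.0.0.2, stratum 2, offset -0.120000, delay 0.02772\n")
print(max_offset_ms(sample))  # roughly 120 ms of skew: worth fixing
```

Anything beyond a few milliseconds between nodes is enough to reorder writes and deletes issued close together.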

 On Wed, Dec 24, 2014 at 2:53 AM, 鄢来琼 laiqiong@gtafe.com wrote:

 Yeah, I also have this question.

 My solution is not to delete the row, but to insert the right row into a
 new table.



 Thanks & Regards,

 *Peter YAN*



 *From:* Sávio S. Teles de Oliveira [mailto:savio.te...@cuia.com.br]
 *Sent:* 26 August 2014 4:25
 *To:* user@cassandra.apache.org
 *Subject:*



 We're using cassandra 2.0.9 with datastax java cassandra driver 2.0.0 in
 a cluster of eight nodes.

 We're doing an insert and after a delete like:

 delete from *column_family_name* where *id* = value

 Immediately select to check whether the DELETE was successful. Sometimes
 the value is still there!!



 Any suggestions?

 --

 Atenciosamente,
 Sávio S. Teles de Oliveira

 voice: +55 62 9136 6996
 http://br.linkedin.com/in/savioteles

 Mestrando em Ciências da Computação - UFG
 Arquiteto de Software

 CUIA Internet Brasil




 --

 [image: datastax_logo.png] http://www.datastax.com/

 Ryan Svihla

 Solution Architect

 [image: twitter.png] https://twitter.com/foundev [image: linkedin.png]
 http://www.linkedin.com/pub/ryan-svihla/12/621/727/







Re: Practical use of counters in the industry

2014-12-18 Thread Ken Hancock
Here's one from Twitter...

http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011


On Thu, Dec 18, 2014 at 6:08 PM, Rajath Subramanyam rajat...@gmail.com
wrote:

 Hi Folks,

 Have any of you come across blogs that describe how companies in the
 industry are using Cassandra counters practically.

 Thanks in advance.

 Regards,
 Rajath
 
 Rajath Subramanyam





Re: opscenter with community cassandra

2014-10-28 Thread Ken Hancock
Your criteria for what is appropriate for production may differ from
others, but it's equally incorrect of you to make a blanket statement that
OpsCenter isn't suitable for production.  A number of people use it in
production.



On Tue, Oct 28, 2014 at 11:48 AM, Colin co...@clark.ws wrote:

 No, actually, you can't, Tyler.

 If you mean the useless information it provides outside of a licence, fine;
 if you mean the components outside, then same argument.

 Last time I checked, this forum was about Apache and not about DataStax.
 Maybe a separate group should be dedicated to provider-specific offerings.

 --
 *Colin Clark*
 +1-320-221-9531


 On Oct 28, 2014, at 10:41 AM, Tyler Hobbs ty...@datastax.com wrote:


 On Tue, Oct 28, 2014 at 10:08 AM, Colin colpcl...@gmail.com wrote:

 It is a mistake to call a proprietary piece of software community when
 you can't use it in production.


 You can use OpsCenter community in production (however you'd like).


 --
 Tyler Hobbs
 DataStax http://datastax.com/






Re: disk space issue

2014-10-01 Thread Ken Hancock
Major compaction is bad if you're using size-tiered, especially if you're
already having capacity issues.  Once you have one huge table, with default
settings, you'll need 4x that huge table worth of storage in order for it
to compact again to ever reclaim your TTL'd data.

If you're running into space issues that are ultimately going to get your
system wedged and you're using columns with TTL, I'd recommend using the
jmx operation to compact individual tables.  This will free the TTL'd data
assuming that you've exceeded your gc_grace_seconds.  This can probably be
scripted up in a relatively easy manner with a nice,
shellshocked-vulnerable bash script and jmxterm.
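A sketch of that script's core: building jmxterm input that invokes the per-keyspace compaction operation table by table (the bean and operation names are assumed from 2.0-era JMX, so verify them in jconsole on your build first):

```python
def jmxterm_compact_script(keyspace, tables, host="localhost", jmx_port=7199):
    """Build jmxterm input that triggers a major compaction per table via
    the StorageService forceKeyspaceCompaction JMX operation."""
    lines = ["open %s:%d" % (host, jmx_port),
             "bean org.apache.cassandra.db:type=StorageService"]
    for table in tables:
        # One compaction per table, so TTL'd data past gc_grace is purged
        # without major-compacting the whole keyspace at once.
        lines.append("run forceKeyspaceCompaction %s %s" % (keyspace, table))
    lines.append("close")
    return "\n".join(lines)

# Hypothetical keyspace/table names for illustration.
print(jmxterm_compact_script("metrics_ks", ["events_by_day", "rollups"]))
```

Pipe the generated script into `java -jar jmxterm.jar -n` on each node.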


On Wed, Oct 1, 2014 at 2:43 AM, Nikolay Mihaylov n...@nmmm.nu wrote:

 my 2 cents:

 try a major compaction on the column family with TTLs - for sure it will
 be faster than a full rebuild.

 also try non-Cassandra things, such as checking for and removing old log
 files, backups, etc.

 On Wed, Oct 1, 2014 at 9:34 AM, Sumod Pawgi spa...@gmail.com wrote:

 In the past in such scenarios it has helped us to check the partition
 where cassandra is installed and allocate more space for the partition.
 Maybe it is a disk space issue but it is good to check if it is related to
 the space allocation for the partition issue. My 2 cents.

 Sent from my iPhone

 On 01-Oct-2014, at 11:53 am, Dominic Letz dominicl...@exosite.com
 wrote:

 This is a shot in the dark, but you could check whether you have too
 many snapshots lying around that you actually don't need. You can get rid
 of those with a quick nodetool clearsnapshot.

 On Wed, Oct 1, 2014 at 5:49 AM, cem cayiro...@gmail.com wrote:

 Hi All,

 I have a 7-node cluster. One node ran out of disk space and the others are
 around 80% disk utilization.
 The data has a 10-day TTL but I think compaction wasn't fast enough to
 clean up the expired data. The gc_grace value is left at the default. I
 have a replication factor of 3. Do you think it may help if I delete all
 data for that node and run repair? Does repair check the TTL value before
 retrieving data from other nodes? Do you have any other suggestions?

 Best Regards,
 Cem.




 --
 Dominic Letz
 Director of R&D
 Exosite http://exosite.com







Re: Repair taking long time

2014-09-29 Thread Ken Hancock
On Mon, Sep 29, 2014 at 2:29 PM, Robert Coli rc...@eventbrite.com wrote:


 As an aside, you just lose with vnodes in clusters of this size. I
 presume you plan to grow beyond approx 9 nodes per DC, in which case you
 probably do want vnodes enabled.


I typically only see discussion of vnodes vs. non-vnodes, but it seems to
me that it might be more important to discuss the number of vnodes per
node. A small cluster having 256 vnodes/node is unwise given some of the
sequential operations that are still done.  Even if those operations were
done in parallel, a 256x increase in parallelization seems an equally bad
choice.

I've never seen any discussion on how many vnodes per node might be an
appropriate answer based a planned cluster size -- does such a thing exist?

Ken


Re: Quickly loading C* dataset into memory (row cache)

2014-09-12 Thread Ken Hancock
+1 for Redis.

It's really nice, good primitives, and then you can do some really cool
stuff chaining multiple atomic operations to create larger atomics through
the lua scripting.

On Thu, Sep 11, 2014 at 12:26 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 11, 2014 at 8:30 AM, Danny Chan tofuda...@gmail.com wrote:

 What are you referring to when you say memory store?

 RAM disk? memcached?


 In 2014, probably Redis?

 =Rob







Re: Counters: consistency, atomic batch

2014-09-04 Thread Ken Hancock
Counters are way more complicated than what you're illustrating. DataStax
did a good blog post on this:

http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters



On Thu, Sep 4, 2014 at 6:34 AM, Eugene Voytitsky viy@gmail.com wrote:

 Hi all,

 I am using Cassandra 2.0.x. and Astyanax 1.56.x (2.0.1 shows the same
 results) driver via Thrift protocol.


 Questions about counters:

 1. Consistency.
 Consider simplest case when we update value of single counter.

 1.1. Is there any difference between updating counter with ONE or QUORUM
 level? Yes, I understand that ONE may affect reading - readers may see old
 value. It's ok, eventual consistency for the reader is ok.

 I am asking whether writing a counter with ONE may lead to totally broken
 data. I will explain.

 * Host A stores the most recent value, 100; host B stores the old value, 99
 (not yet replicated).
 * I increment counter with ONE. Request is sent to host B.
 * B sees 99. Adds 1. Saves 100, and this 100 becomes newer than the 100
 stored on host A. Later it will be replicated to A.
 * Result: we lost 1 increment, because the value should actually be 101, not 100.

 As I understand it, this scenario isn't possible with either QUORUM or ONE,
 because Cassandra actually stores the counter value in a shard structure.
 So I can safely update a counter value with ONE.
 Am I right?
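The shard structure mentioned above can be sketched as a toy model (my simplification, not the actual pre-2.1 storage format): each replica only ever increments its own shard, the visible value is the sum of all shards, and replication merges shard-wise by keeping the highest count per shard, so the stale-read scenario above cannot lose the increment.

```python
def merge(c1, c2):
    # Replication merges counter shards by keeping, per shard, the highest
    # count seen -- safe because only the shard's owner ever increments it.
    return {host: max(c1.get(host, 0), c2.get(host, 0))
            for host in set(c1) | set(c2)}

a = {"A": 60, "B": 40}   # host A's view: total 100 (most recent)
b = {"A": 59, "B": 40}   # host B is stale about A's shard: total 99
b["B"] += 1              # increment with ONE lands on B, on B's OWN shard
merged = merge(a, b)
print(sum(merged.values()))  # 101 -- the increment is not lost
```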

 1.2. If I update a counter at QUORUM level, does Cassandra also read the old
 value at QUORUM level? Or does the local-shard design make it possible to
 read only the value stored on the host doing the writing?

 1.3. How 1.1 and 1.2 behavior will change in Cassandra 2.1 and 3.0?
 I read that in Cassandra 2.1 counters are totally reimplemented.
 And in 3.0 will be too again.

 2. Atomicity.
 I need to log 1 event as increments to several tables (yes, we use data
 duplication for different select queries).
 I use a single batch mutation for all the increments.

 Can Cassandra execute batch of counters increments in atomic manner?
 Here: http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
 I see the following:
 1.2 also introduces a separate BEGIN COUNTER BATCH for batched counter
 updates. Unlike other writes, counter updates are not idempotent, so
 replaying them automatically from the batchlog is not safe. Counter batches
 are thus strictly for improved performance when updating multiple counters
 in the same partition.

 The text isn't 100% clear.
 Does it mean that Cassandra can't guarantee an atomic batch for counters even
 with BEGIN COUNTER BATCH?

 If it can't, in which Cassandra version will atomic batches for counters
 work? And what is the difference between 'BEGIN COUNTER BATCH' and 'BEGIN
 BATCH'?

 If it can, do you know which driver supports BEGIN COUNTER BATCH? I
 searched the whole source of Astyanax 2.0.1 and it seems that it doesn't
 support it currently.

 Thanks in advance!

 PS. Do you know how to communicate with Astyanax team?
 I wrote several questions to google groups email
 astyanax-cassandra-cli...@googlegroups.com but didn't receive any answers.

 --
 Best regards,
 Eugene Voytitsky






Re: Migration from Cassandra 1.2.5 to Cassandra 2.0.8 with changed partitioner settings

2014-08-07 Thread Ken Hancock
Did the JMX path work?


On Thu, Jul 31, 2014 at 11:16 AM, thorsten.s...@t-systems.com wrote:

 Well, we ran StorageService.bulkLoad via JMX.
 According to http://www.datastax.com/dev/blog/bulk-loading  this should
 have the same effect and can be done on the same machine:

 Because the sstableloader uses gossip to communicate with other nodes, if
 launched on the same machine as a given Cassandra node, it will need to
 use a different network interface than the Cassandra node. But if you want
 to load data from a Cassandra node, there is a simpler solution: you can
 use the JMX-StorageService-bulkload() call from said node.


 -----Original Message-----
 From: Rahul Neelakantan [mailto:ra...@rahul.be]
 Sent: Thursday, 31 July 2014 11:40
 To: user@cassandra.apache.org
 Cc: cassandra-u...@incubator.apache.org
 Subject: Re: Migration from Cassandra 1.2.5 to Cassandra 2.0.8 with
 changed partitioner settings

 You said you tried restoring a snapshot via the bulk loader; did you
 actually run sstableloader?

 Rahul Neelakantan

  On Jul 31, 2014, at 2:54 AM, tsi thorsten.s...@t-systems.com wrote:
 
  Well, the new Cassandra cluster is already setup with the different
  partitioner settings and there are already other applications running on
 it.
  So the task is to migrate our application data to this new cluster to
  avoid setting up a dedicated Cassandra cluster just for our application.
 
 
 
  --
  View this message in context:
  http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Migra
  tion-from-Cassandra-1-2-5-to-Cassandra-2-0-8-with-changed-partitioner-
  settings-tp7596019p7596062.html Sent from the
  cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.






Re: Decommissioning a datacenter deletes the data (on decommissioned datacenter)

2014-08-07 Thread Ken Hancock
My reading is it didn't forget the schema.  It lost the data.

My reading is decommissioning worked fine.  Possibly when you changed the
replication on a keyspace to include a second data center, the data didn't
get replicated.

When you ADD a datacenter, you need to do a nodetool rebuild to get the
data streamed to the new data center.  When you alter a keyspace to include
another datacenter in its replication schema, a nodetool repair is required
-- was this done?
http://www.datastax.com/documentation/cql/3.0/cql/cql_using/update_ks_rf_t.html

When you use nodetool decommission, you're effectively deleting the
partitioning token from the cluster.  The node being decommissioned will
stream its data to the new owners of its original token range.  This
streaming in no way should affect any other datacenter, because you have not
changed the tokens or data ownership for any datacenter but the one in
which you are decommissioning a node.

When you eventually decommission the last node in the datacenter, all data
is gone as there are no tokens in that datacenter to own any data.
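The token-deletion view of decommission can be sketched on a toy single-DC ring (hypothetical node names and tokens; each node owns the range from the previous token up to its own):

```python
def ranges(tokens):
    """tokens: {node: token}; each node owns (previous_token, its_token]."""
    ordered = sorted(tokens.items(), key=lambda kv: kv[1])
    return {node: (ordered[i - 1][1], tok)
            for i, (node, tok) in enumerate(ordered)}

ring = {"dc1-a": 0, "dc1-b": 100, "dc1-c": 200}
print(ranges(ring))   # dc1-b owns (0, 100]

# Decommissioning dc1-b deletes its token; the next node clockwise absorbs
# (0, 100], and token ownership in other datacenters is untouched.
del ring["dc1-b"]
print(ranges(ring))   # dc1-c now owns (0, 200]
```

When the last token in a datacenter is deleted, the dict is empty and no range in that DC has an owner, which is the "all data is gone" case described above.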

If you had a keyspace that was only replicated within that datacenter, that
data is gone (though you could probably add nodes back in and resurrect it).

If you had a keyspace where you changed the replication to include another
datacenter, if that datacenter had never received the data, then it may
have the schema but would have none of the data (other than new data that
was written AFTER you changed the replication).




On Thu, Aug 7, 2014 at 2:11 PM, srmore comom...@gmail.com wrote:




 On Thu, Aug 7, 2014 at 12:27 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Aug 7, 2014 at 10:04 AM, srmore comom...@gmail.com wrote:

 Sorry for being ambiguous.  By deletes I mean that after running
 decommission I can no longer see any keyspaces owned by this node or
 replicated by other nodes using the cfstats command. I am also seeing the
 same behavior when I remove a single node from a cluster (without
 datacenters).


 I'm still not fully parsing you, but clusters should never forget
 schema as a result of decommission.

 Is that what you are saying is happening?


 Yes, this is what is happening.



 (In fact, even the decommissioned node itself does not forget its schema,
 which I personally consider a bug.)



 Ok, so I am assuming this is not normal behavior and possibly a bug --
 is this correct?



 =Rob







Re: data type is object when metric instrument using Gauge?

2014-08-05 Thread Ken Hancock
If you look at the VisualVM metadata, it'll show that what's returned is
java.lang.Object, which is different from Meters or Counters.

Looking at the source for metrics-core, it seems that this is a feature
of Gauges: unlike Meters or Counters, Gauges can be of various types
(long, double, etc.).  Cassandra's source sets them up as longs; however, the
JMXReporter class in metrics-core always exposes them as Objects.




On Mon, Aug 4, 2014 at 7:32 PM, Patricia Gorla patri...@thelastpickle.com
wrote:

 Mike,

 What metrics reporter are you using? How are you attempting to access the
 metric?



 On Sat, Aug 2, 2014 at 7:30 AM, mike maomao...@gmail.com wrote:

 Dear All

   We are trying to monitor Cassandra using JMX. The monitoring tool we
 are using works fine for meters. However, if the metrics are collected
 using a gauge, the data type is Object, and our tool treats it as a string
 instead of a double. For example:

 org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Capacity

 The Type of Attribute (Value) is java.lang.Object

 Is it possible to implement the data type of a gauge as a numeric type
 instead of Object, or to work around it another way, for example using a
 metrics reporter, etc.?

 Thanks a lot for any suggestion!

 Best Regard!
   Mike






 --
 Patricia Gorla
 @patriciagorla

 Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com http://thelastpickle.com






Re: Multi-column range scans

2014-07-14 Thread Ken Hancock
I don't think your query is doing what he wants.  Your query will correctly
set the starting point, but will also return larger interval_ids with
lower skill_levels:

cqlsh:test> select * from skill_count where skill='Complaints' and
(interval_id, skill_level) >= (140235930, 5);

 skill  | interval_id   | skill_level | skill_count
+---+-+-
 Complaints | 140235930 |   5 |  20
 Complaints | 140235930 |   8 |  30
 Complaints | 140235930 |  10 |   1
 Complaints | 140235940 |   2 |  10
 Complaints | 140235940 |   8 |  30

(5 rows)

cqlsh:test> select * from skill_count where skill='Complaints' and
(interval_id, skill_level) >= (140235930, 5) and (interval_id) <
(140235990);

 skill  | interval_id   | skill_level | skill_count
+---+-+-
 Complaints | 140235930 |   5 |  20  - desired
 Complaints | 140235930 |   8 |  30  - desired
 Complaints | 140235930 |  10 |   1  - desired
 Complaints | 140235940 |   2 |  10  - SKIP
 Complaints | 140235940 |   8 |  30  - desired

The query results in a discontinuous range slice, so it isn't supported.
Essentially, the client will have to read the entire range and perform
client-side filtering.  Whether this is efficient depends on the
cardinality of skill_level.
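A sketch of that client-side filtering, using hypothetical rows shaped like the result set above (the server restricts skill and the interval_id range; the client drops rows below the wanted skill_level):

```python
def filter_skill_counts(rows, min_skill_level):
    # rows: (skill, interval_id, skill_level, skill_count) tuples, already
    # restricted server-side to the skill and the interval_id range.
    return [r for r in rows if r[2] >= min_skill_level]

rows = [
    ("Complaints", 140235930, 5, 20),
    ("Complaints", 140235930, 8, 30),
    ("Complaints", 140235930, 10, 1),
    ("Complaints", 140235940, 2, 10),   # dropped client-side
    ("Complaints", 140235940, 8, 30),
]
wanted = filter_skill_counts(rows, min_skill_level=5)
print(len(wanted))  # 4
```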

I tried playing with the ALLOW FILTERING CQL clause, but it would appear
from the documentation that it's very restrictive...





On Mon, Jul 14, 2014 at 7:44 AM, DuyHai Doan doanduy...@gmail.com wrote:

 or :


 select * from skill_count where skill='Complaints'
  and (interval_id,skill_level) >= (140235930,5)
  and (interval_id) < (140235990)

  Strangely enough, once you start using the tuple notation you'll need to
  stick to it even if there is only one element in the tuple.


 On Mon, Jul 14, 2014 at 1:40 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Sorry, I've just checked, the correct query should be:

 select * from skill_count where skill='Complaints' and
  (interval_id,skill_level) >= (140235930,5) and
  (interval_id,skill_level) < (140235990,11)


 On Mon, Jul 14, 2014 at 9:45 AM, DuyHai Doan doanduy...@gmail.com
 wrote:

 Hello Mathew

  Since Cassandra 2.0.6 it is possible to query over composites:
 https://issues.apache.org/jira/browse/CASSANDRA-4851

 For your example:

 select * from skill_count where skill='Complaints' and
  (interval_id,skill_level) >= (140235930,5) and interval_id <
  140235990;


 On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen 
 matthew.j.al...@gmail.com wrote:

 Hi,

 We have a roll-up table that as follows.

 CREATE TABLE SKILL_COUNT (
   skill text,
   interval_id bigint,
   skill_level int,
   skill_count int,
   PRIMARY KEY (skill, interval_id, skill_level));

 Essentially,
   skill = a names skill i.e. Complaints
   interval_id = a rounded epoch time (15 minute intervals)
   skill_level = a number/rating from 1-10
   skill_count = the number of people with the specified skill, with the
 specified skill level, logged in at the interval_id

 We'd like to run the following query against it

  select * from skill_count where skill='Complaints' and interval_id >=
  140235930 and interval_id < 140235990 and skill_level >= 5;

 to get a count of people with the relevant skill and level at the
 appropriate time.  However I am getting the following message.

 Bad Request: PRIMARY KEY part skill_level cannot be restricted
 (preceding part interval_id is either not restricted or by a non-EQ
 relation)

 Looking at how the data is stored ...

 ---
 RowKey: Complaints
 = (name=140235930:2:, value=, timestamp=1405308260403000)
 = (name=140235930:2:skill_count, value=000a,
 timestamp=1405308260403000)
 = (name=140235930:5:, value=, timestamp=1405308260403001)
 = (name=140235930:5:skill_count, value=0014,
 timestamp=1405308260403001)
 = (name=140235930:8:, value=, timestamp=1405308260419000)
 = (name=140235930:8:skill_count, value=001e,
 timestamp=1405308260419000)
 = (name=140235930:10:, value=, timestamp=1405308260419001)
 = (name=140235930:10:skill_count, value=0001,
 timestamp=1405308260419001)

  Should Cassandra be able to allow for an extra level of filtering? Or
  is this something that should be performed within the application?

 We have a solution working in Oracle, but would like to store this data
 in Cassandra, as all the other data that this solution relies on already
 sits within Cassandra.

 Appreciate any guidance on this matter.

 Matt








Re: Controlling system.log rotation

2014-07-07 Thread Ken Hancock
I think this essentially boils down to this issue:

https://issues.apache.org/bugzilla/show_bug.cgi?id=40407

Seems the best way would be to change the umask for user cassandra:

http://stackoverflow.com/questions/7893511/permissions-on-log-files-created-by-log4j-rollingfileappender
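A quick way to see the umask arithmetic (a POSIX sketch; the JVM's file creation passes through the same kernel masking, so a umask of 027 turns a requested mode of 666 into 640):

```python
import os
import stat
import tempfile

os.umask(0o027)   # clear group-write and all world bits on created files
path = os.path.join(tempfile.mkdtemp(), "system.log")
# log4j asks for default permissions (666); the umask masks that to 640
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
os.close(fd)
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))  # 0o640 on a POSIX system
```

Setting that umask in the cassandra user's environment (or the init script) makes every rotated log come out group-readable without any post-rotation chmod.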


Ken

On Mon, Jul 7, 2014 at 9:50 AM, Xavier Fustero xav...@rightscale.com
wrote:

 Hi,

 I used to have system.log writing directly to syslog and configure a
 rsyslog server to get all logs from my cassandra boxes. However, the java
 stack traces are a headache on my server and I read on rsyslog forums to
 change application to write to a file and let rsyslog to read from that
 file (using imfile module).

  This is what I have done. However, the /var/log/cassandra/system.log is
  created as cassandra:cassandra 600. I would like to change the group
  ownership to syslog and have permissions like 640. I can do that, but
  whenever the file is rotated it starts again as cassandra:cassandra 600.

  I can't find much information on controlling that file beyond changing the
  rotation and the size, e.g.:

 log4j.appender.R.maxFileSize=50MB
 log4j.appender.R.maxBackupIndex=50

 Is there a way to control this?

 Thanks,
 Xavi



Re: Large SSTable not compacted with size tiered compaction

2014-07-07 Thread Ken Hancock
What are the timestamps on those SSTables?  Do those tables use TTL?

To answer your last question, I've seen that scenario happen under load
testing with column families that use TTL.  Large loads within the TTL window
cause normal compaction to build up larger and larger SSTables.  When the
load falls off, there are a couple of very large tables, and under normal load
small tables get TTL'd before any table gets large enough to hit
min_threshold, so data that's months old and should have been TTL'd will
never get the chance.

There's a nice blog that covers bucket_high and the algorithm -- yes, I
believe setting bucket_high large enough will cause the one large table to
be grouped with the others -- however, if you're not using TTL, I don't
think there's an issue -- small tables simply need to build up to produce
another three medium-sized (1.7 GB) tables, which then need to build up to
produce four larger (8 GB) tables.

http://shrikantbang.wordpress.com/2014/04/22/size-tiered-compaction-strategy-in-apache-cassandra/
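A greatly simplified sketch of that bucketing (it ignores min_sstable_size and other details; the sizes are the ones from John's message, and the thresholds are the STCS defaults): a table joins a bucket only when it is within [bucket_low, bucket_high] of the bucket's average size, and a bucket compacts once it holds min_threshold tables, so the 8.61 GB table sits alone.

```python
def bucket(sizes_gb, bucket_low=0.5, bucket_high=1.5):
    """Greatly simplified STCS bucketing: walk tables smallest-first and
    group each with an existing bucket if it falls within
    [bucket_low, bucket_high] x the bucket's running average size."""
    buckets = []
    for size in sorted(sizes_gb):
        for b in buckets:
            avg = sum(b) / len(b)
            if bucket_low * avg <= size <= bucket_high * avg:
                b.append(size)
                break
        else:
            buckets.append([size])
    return buckets

sizes = [1.70, 0.18, 0.16, 0.05, 8.61]
for b in bucket(sizes):
    status = "compacts" if len(b) >= 4 else "waits"   # min_threshold = 4
    print(b, "->", status)
```

Raising bucket_high far enough lands 8.61 GB in the 1.70 GB bucket in this sketch, but that bucket still needs min_threshold members before anything compacts.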


On Mon, Jul 7, 2014 at 12:13 PM, John Sanda john.sa...@gmail.com wrote:

 I have a write-heavy table that is using size tiered compaction. I am
 running C* 1.2.9. There is an SSTable that is not getting compacted. It is
  disproportionately larger than the other SSTables. The data file sizes are:

 1.70 GB
 0.18 GB
 0.16 GB
 0.05 GB
 8.61 GB

  If I set the bucket_high compaction property on the table to a
  sufficiently large value, will the 8.61 GB table get compacted? What
  drawbacks, if any, are there to increasing the bucket_high property?

  In what scenarios could I wind up with such a disproportionately large
  SSTable? One thing that comes to mind is major compactions, but I have
  not run one.

 - John






Re: nodetool repair -snapshot option?

2014-07-01 Thread Ken Hancock
I also expanded on a script originally written by Matt Stump @ DataStax.
The README has the reasoning behind requiring sub-range repairs.

https://github.com/hancockks/cassandra_range_repair
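The heart of subrange repair is just carving a node's range into small contiguous slices; a sketch (assumes a non-wrapping range; Murmur3-style signed 64-bit tokens):

```python
def split_range(start, end, parts):
    # Split the token range (start, end] into `parts` contiguous subranges.
    width = (end - start) // parts
    edges = [start + i * width for i in range(parts)] + [end]
    return list(zip(edges, edges[1:]))

# e.g. one eighth of the Murmur3 token space, repaired in 8 slices
for st, et in split_range(-2**63, -2**63 + 2**61, 8):
    print(st, et)
```

Each (start, end) pair can then be handed to nodetool repair's start/end token options (-st/-et), keeping each Merkle-tree computation and each streaming session small.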




On Mon, Jun 30, 2014 at 10:20 PM, Phil Burress philburress...@gmail.com
wrote:

 @Paulo, this is very cool! Thanks very much for the link!


 On Mon, Jun 30, 2014 at 9:37 PM, Paulo Ricardo Motta Gomes 
 paulo.mo...@chaordicsystems.com wrote:

 If you find it useful, I created a tool where you input the node IP,
 keyspace, column family, and optionally the number of partitions (default:
 32K), and it outputs the list of subranges for that node, CF, partition
 size: https://github.com/pauloricardomg/cassandra-list-subranges

 So you can basically iterate over the output of that and do subrange
 repair for each node and cf, maybe in parallel. :)


 On Mon, Jun 30, 2014 at 10:26 PM, Phil Burress philburress...@gmail.com
 wrote:

 One last question. Any tips on scripting a subrange repair?


 On Mon, Jun 30, 2014 at 7:12 PM, Phil Burress philburress...@gmail.com
 wrote:

 We are running repair -pr. We've tried subrange manually and that seems
 to work ok. I guess we'll go with that going forward. Thanks for all the
 info!


 On Mon, Jun 30, 2014 at 6:52 PM, Jaydeep Chovatia 
 chovatia.jayd...@gmail.com wrote:

  Are you running a full repair or on a subset? If you are running a full
  repair, then try running on a subset of ranges, which means less data to
  worry about during repair, and that helps the Java heap in general. You
  will have to do multiple iterations to cover the entire range, but at
  least it will work.

 -jaydeep


 On Mon, Jun 30, 2014 at 3:22 PM, Robert Coli rc...@eventbrite.com
 wrote:

 On Mon, Jun 30, 2014 at 3:08 PM, Yuki Morishita mor.y...@gmail.com
 wrote:

 Repair uses snapshot option by default since 2.0.2 (see NEWS.txt).


 As a general meta comment, the process by which operationally
 important defaults change in Cassandra seems ad-hoc and sub-optimal.

 For the record, my view was that this change, which makes repair even
 slower than it previously was, was probably overly optimistic.

 It's also weird in that it changes default behavior which has been
 unchanged since the start of Cassandra time and is therefore probably
 automated against. Why was it so critically important to switch to 
 snapshot
 repair that it needed to be shotgunned as a new default in 2.0.2?

 =Rob








 --
 *Paulo Motta*

 Chaordic | *Platform*
 *www.chaordic.com.br http://www.chaordic.com.br/*
 +55 48 3232.3200







Re: RPC timeout paging secondary index query results

2014-07-01 Thread Ken Hancock
You didn't post any timings, only when it started failing, so it's unclear
whether performance is dropping off or scaling in some sort of linear or
non-linear fashion. I second the recommendation to do some traces, which
should be much more telling.


On Fri, Jun 13, 2014 at 3:34 AM, Phil Luckhurst 
phil.luckhu...@powerassure.com wrote:

 But would you expect performance to drop off so quickly? At 250,000 records
 we can still page through the query with LIMIT 5 but when adding an
 additional 50,000 records we can't page past the first 10,000 records even
 if we drop to LIMIT 10.

 What about the case where we add 100,000 records for each indexed value?
 When we do this for 2 values, i.e. 200,000 records with 2 indexed values,
 we
 can query all 100,000 records for one of the values using LIMIT 10. If
 we add a third indexed value with another 100,000 records then we can't
 page
 through any of the indexed values even though the original 2 that worked
 previously have not changed.

 Phil



 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RPC-timeout-paging-secondary-index-query-results-tp7595078p7595126.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Re: tracing thrift queries?

2014-07-01 Thread Ken Hancock
A couple options:

1. You could use nodetool settraceprobability.  If lots of queries are slow,
you can then go to system_traces.sessions and pick out some slow ones.

2. You could update the metadata (temporarily) on your table and use
tracing ad-hoc using cqlsh.

Using #1, picking out exact queries will be incredibly painful
unless your cluster is quiet.




On Sun, Jun 29, 2014 at 7:41 PM, Kevin Burton bur...@spinn3r.com wrote:

 Is it possible to trace thrift queries?

 We're using KairosDB and *most* of the queries are amazingly slow… even
 repeated queries which should be in cache.

 Tracing them should help iron down whats happening.

 Kevin

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com






Re: Minimum Cluster size to accommodate a single node failure

2014-06-18 Thread Ken Hancock
Another nice resource...

http://www.ecyrd.com/cassandracalculator/


Re: Configuring all nodes as seeds

2014-06-18 Thread Ken Hancock
Amen.  I believe the whole seed-node/bootstrapping confusion goes against
the "Why Cassandra" pitch, quoted from
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra

*Operational simplicity* – with all nodes in a cluster being the same,
there is no complex configuration to manage so administration duties are
greatly simplified.

Watching the mailing list, this is one area that stands out as clearly
causing plenty of confusion.



On Wed, Jun 18, 2014 at 2:42 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jun 18, 2014 at 4:56 AM, Jonathan Lacefield 
 jlacefi...@datastax.com wrote:

   What Artur is alluding to is that seed nodes do not bootstrap.
  Replacing seed nodes requires a slightly different approach for node
 replacement compared to non seed nodes.  See here for more details:
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_seed_node.html


 Better, OP could comment on the below JIRA ticket emphasizing the need for
 fewer than 3 alternative and logically ambiguous ways to do the common
 operation of replacing a seed node.

 Especially because the options say nonsensical things like :

 
 Add the replacement seed node's IP to all the node's seed lists.
 You do not need to restart the nodes.
 

 If you do not restart the nodes, they do not know the replacement node is
 a seed. This act seemingly cannot have meaning until the future time at
 which all nodes are restarted. As detailed in the ticket, I assert that no
 one understands what it actually means to be a seed, and why seeds should
 or should not be able to bootstrap.

 https://issues.apache.org/jira/browse/CASSANDRA-5836

 =Rob




Re: Cassanda JVM GC defaults question

2014-04-24 Thread Ken Hancock
Wouldn't I want flush_largest_memtables_at to be larger than my
CMSInitiatingOccupancyFraction?  I want GC to kick in before I have to dump
my memtables, not after.


On Wed, Apr 23, 2014 at 10:12 AM, Ruchir Jha ruchir@gmail.com wrote:

 Lowering CMSInitiatingOccupancyFraction to less than 0.75 will lead to
 more GC interference and will impact write performance. If you're not
 sensitive to this impact, your expectation is correct, however make
 sure your flush_largest_memtables_at is always set to less than or
 equal to the occupancy fraction.

 On 4/23/14, Ken Hancock ken.hanc...@schange.com wrote:
  I'm in the process of trying to tune the GC and I'm far from an expert in
  this area, so hoping someone can tell me I'm either out in left field or
  on-track.
 
  Cassandra's default GC settings are (abbreviated):
  +UseConcMarkSweepGC
   +CMSInitiatingOccupancyFraction=75
  +UseCMSInitiatingOccupancyOnly
 
  Also in cassandra.yaml:
  flush_largest_memtables_at: 0.75
 
   Since the new heap is relatively small, if I'm understanding this correctly
   CMS will normally not kick in until it's at roughly 75% of the heap (75%
   of size-new, new being relatively small compared to the overall heap).
   These two settings being very close would seem that both trigger at nearly
   the same point which might be undesirable as the flushing would also create
   more GC pressure (in addition to FlushWriter blocking if multiple tables
   are queued for flushing because of this).
 
  Clearly more heap will give us more peak running room, but would also
  lowering the CMSInitiatingOccupancyFraction help at the expense of some
  added CPU for more frequent, smaller collections?
 
   Mikio Braun's blog had some interesting tests in this area
  http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html
 






Cassanda JVM GC defaults question

2014-04-23 Thread Ken Hancock
I'm in the process of trying to tune the GC and I'm far from an expert in
this area, so hoping someone can tell me I'm either out in left field or
on-track.

Cassandra's default GC settings are (abbreviated):
+UseConcMarkSweepGC
+CMSInitiatingOccupancyFraction=75
+UseCMSInitiatingOccupancyOnly

Also in cassandra.yaml:
flush_largest_memtables_at: 0.75

Since the new heap is relatively small, if I'm understanding this correctly,
CMS will normally not kick in until it's at roughly 75% of the heap (75% of
heap size minus new, with new being relatively small compared to the overall
heap).  These
two settings being very close would seem that both trigger at nearly the
same point which might be undesirable as the flushing would also create
more GC pressure (in addition to FlushWriter blocking if multiple tables
are queued for flushing because of this).

Clearly more heap will give us more peak running room, but would also
lowering the CMSInitiatingOccupancyFraction help at the expense of some
added CPU for more frequent, smaller collections?
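As a back-of-the-envelope sketch of how close the two thresholds sit (the heap and new-gen sizes below are hypothetical examples, not measured values):

```python
# Sketch: compare the CMS old-gen trigger with the memtable flush
# threshold.  Heap and new-gen sizes are hypothetical examples.
heap_mb = 8 * 1024          # total heap
new_gen_mb = 800            # young generation ("size-new")

old_gen_mb = heap_mb - new_gen_mb           # CMS watches old-gen occupancy
cms_trigger_mb = 0.75 * old_gen_mb          # CMSInitiatingOccupancyFraction=75
flush_trigger_mb = 0.75 * heap_mb           # flush_largest_memtables_at: 0.75

print(cms_trigger_mb, flush_trigger_mb)     # 5544.0 6144.0
# Only ~10% apart: a memtable flush and a CMS cycle tend to fire at
# nearly the same heap occupancy, compounding GC pressure.
```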

Mikio Braun's blog had some interesting tests in this area:
http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html


Re: Bootstrap Timing

2014-04-16 Thread Ken Hancock
Seed nodes don't bootstrap.

https://issues.apache.org/jira/browse/CASSANDRA-5836




On Wed, Apr 16, 2014 at 2:17 PM, Phil Burress philburress...@gmail.com wrote:

 Also, one more quick question. For the new nodes, do I add all three
 existing nodes as seeds? Or just add one?


 On Wed, Apr 16, 2014 at 2:16 PM, Phil Burress philburress...@gmail.com wrote:

 Thanks very much for the response. I'm not using vnodes, does that matter?


 On Wed, Apr 16, 2014 at 2:13 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Apr 16, 2014 at 11:10 AM, Phil Burress philburress...@gmail.com
  wrote:

 How long does bootstrapping typically take? I have 3 existing nodes in
 our cluster with about 40GB each. I've added three new nodes to the
 cluster. They have been in bootstrap mode for a little over 3 days now.
 Should I be concerned? Is there a way to tell how long it will take to
 finish?


 Adding more than one node at a time to a cluster (especially with
 vnodes) is Not Supported. If I were you, I would stop all 3 bootstraps and
 then do one at a time.

  =Rob







-- 
*Ken Hancock *| System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com |
NASDAQ:SEAChttp://www.schange.com/en-US/Company/InvestorRelations.aspx

Office: +1 (978) 889-3329 | [image: Google Talk:]
ken.hanc...@schange.com | [image:
Skype:]hancockks | [image: Yahoo IM:]hancockks [image:
LinkedIn]http://www.linkedin.com/in/kenhancock

[image: SeaChange International]
 http://www.schange.com/This e-mail and any attachments may contain
information which is SeaChange International confidential. The information
enclosed is intended only for the addressees herein and may not be copied
or forwarded without permission from SeaChange International.


Re: Replication Factor question

2014-04-15 Thread Ken Hancock
Keep in mind if you lose the wrong two, you can't satisfy quorum.  In a
5-node cluster with RF=3, it would be impossible to lose 2 nodes without
affecting quorum for at least some of your data. In a 6 node cluster, once
you've lost one node, if you were to lose another, you only have a 1-in-5
chance of not affecting quorum for some of your data.

In much larger clusters, it becomes less probable that you will lose
multiple nodes within an RF group.
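The 5-node vs. 6-node intuition above can be checked with a tiny simulation (a sketch assuming simple contiguous replica placement on the ring, no vnodes):

```python
from itertools import combinations

def breaks_quorum(n, down, rf=3):
    """True if some replica set has fewer than a quorum of live nodes."""
    for i in range(n):
        replicas = {(i + k) % n for k in range(rf)}   # contiguous placement
        if len(replicas & down) > rf // 2:            # quorum = 2 of 3 here
            return True
    return False

for n in (5, 6):
    pairs = [set(p) for p in combinations(range(n), 2)]
    bad = sum(breaks_quorum(n, p) for p in pairs)
    print(n, bad, len(pairs))
# 5 10 10  -> every 2-node failure breaks quorum somewhere
# 6 12 15  -> with one node already down, 4 of the 5 remaining choices
#             also break it: the 1-in-5 chance of surviving
```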





On Tue, Apr 15, 2014 at 4:37 AM, Markus Jais markus.j...@yahoo.de wrote:

 Hi all,

 thanks for your answers. Very helpful. We plan to use enough nodes so that
 the failure of 1 or 2 machines is no problem. E.g. for a workload that can be
 handled by 3 nodes all the time, we would use at least 5, better 6 nodes to
 survive the failure of at least 2 nodes, even when the 2 nodes fail at the
 same time. This should allow the cluster to rebuild the missing nodes and
 still serve all requests with RF=3 and quorum reads.

 All the best,

 Markus





   Tupshin Harper tups...@tupshin.com wrote on Monday, 14 April 2014 at
 21:23:

 tl;dr make sure you have enough capacity in the event of node failure. For
 light workloads, that can be fulfilled with nodes=rf.
 -Tupshin
 On Apr 14, 2014 2:35 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Apr 14, 2014 at 2:25 AM, Markus Jais markus.j...@yahoo.de wrote:

 It is generally not recommended to set a replication factor of 3 if you
 have fewer than six nodes in a data center.


 I have a detailed post about this somewhere in the archives of this list
 (which I can't seem to find right now..) but briefly, the 6-for-3 advice
 relates to the percentage of capacity you have remaining when you have a
 node down. It has become slightly less accurate over time because vnodes
 reduce bootstrap time and there have been other improvements to node
 startup time.

 If you have fewer than 6 nodes with RF=3, you lose 1/6th of capacity when
 you lose a single node, which is a significant percentage of total cluster
 capacity. You then lose another meaningful percentage of your capacity when
 your existing nodes participate in rebuilding the missing node. If you are
 then unlucky enough to lose another node, you are missing a very
 significant percentage of your cluster capacity and have to use a
 relatively small fraction of it to rebuild the now two down nodes.

 I wouldn't generalize the rule of thumb as don't run under N=RF*2, but
 rather as probably don't run RF=3 under about 6 nodes. IOW, in my view,
 the most operationally sane initial number of nodes for RF=3 is likely
 closer to 6 than 3.

 =Rob






Re: Intermittent long application pauses on nodes

2014-04-14 Thread Ken Hancock
Searching my list archives shows this thread evaporated.  Was a root
cause ever found?  Very curious.




On Mon, Feb 3, 2014 at 11:52 AM, Benedict Elliott Smith 
belliottsm...@datastax.com wrote:

 Hi Frank,

 The 9391 under RevokeBias is the number of milliseconds spent
 synchronising on the safepoint prior to the VM operation, i.e. the time it
 took to ensure all application threads were stopped. So this is the
 culprit. Notice that the time spent spinning/blocking for the threads we
 are supposed to be waiting on is very low; it looks to me that this is time
 spent waiting for CMS threads to yield, though it is very hard to say with
 absolute certainty. It doesn't look like the issue is actually the
 RevokeBias itself, anyway.

 I think we should take this off list. It definitely warrants a ticket,
 though I expect this will be difficult to pin down, so you will have to be
 willing to experiment a bit with us, but we would be very grateful for the
 help. If you can pin down and share a specific workload that triggers this
 we may be able to do it without you though!

 It's possible that this is a JVM issue, but if so there may be some
 remedial action we can take anyway. There are some more flags we should
 add, but we can discuss that once you open a ticket. If you could include
 the strange JMX error as well, that might be helpful.

 Thanks,

 Benedict


 On 3 February 2014 15:34, Frank Ng fnt...@gmail.com wrote:

 I was able to send SafePointStatistics to another log file via the
 additional JVM flags and recently noticed a pause of 9.3936600 seconds.
 Here are the log entries:

 GC Log file:
 ---
 2014-01-31T07:49:14.755-0500: 137460.842: Total time for which
 application threads were stopped: 0.1095540 seconds
 2014-01-31T07:51:01.870-0500: 137567.957: Total time for which
 application threads were stopped: 9.3936600 seconds
 2014-01-31T07:51:02.537-0500: 137568.623: Total time for which
 application threads were stopped: 0.1207440 seconds

 JVM Stdout Log File:
 ---
  vmop                      [threads: total initially_running wait_to_block]
                            [time: spin block sync cleanup vmop]  page_trap_count
 137460.734: GenCollectForAllocation  [ 421  00 ][ 0 0 23  0 84 ]  0
 137558.562: RevokeBias               [ 462  29 ][13 0   9391  1  0 ]  0
 writer thread='47436187662656'/
 dependency_failed type='leaf_type'
 ctxk='javax/management/ObjectName$Property'
 witness='javax/management/ObjectName$PatternProperty' stamp='137568.503'/
 writer thread='47436033530176'/
 137568.500: Deoptimize               [ 481  15 ][ 0 0118  0  1 ]  0
 137569.625: no vm operation          [ 483  01 ][ 0 0 18  0  0 ]  0
 137571.641: no vm operation          [ 483  01 ][ 0 0 42  1  0 ]  0
 137575.703: no vm operation          [ 483  01 ][ 0 0 25  1  0 ]  0

 If SafepointStatistics are printed before the Application Stop times,
 then it seems that the RevokeBias was the cause of the pause.
 If SafepointStatistics are printed after the Application Stop times, then
 it seems that the Deoptimize was the cause of the pause.
 In addition, I see a strange dependency failed error relating to JMX in
 the JVM stdout log file.

 thanks


 On Wed, Jan 29, 2014 at 4:44 PM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

 Add some more flags: -XX:+UnlockDiagnosticVMOptions -XX:LogFile=${path}
 -XX:+LogVMOutput

 I never figured out what kills stdout for C*. It's a library we depend
 on, didn't try too hard to figure out which one.


 On 29 January 2014 21:07, Frank Ng fnt...@gmail.com wrote:

 Benedict,
 Thanks for the advice.  I've tried turning on
 PrintSafepointStatistics.  However, that info is only sent to the STDOUT
 console.  The cassandra startup script closes the STDOUT when it finishes,
 so nothing is shown for safepoint statistics once it's done starting up.
 Do you know how to startup cassandra and send all stdout to a log file and
 tell cassandra not to close stdout?

 Also, we 

Re: Update SSTable fragmentation

2014-04-09 Thread Ken Hancock
I don't believe so.  Cassandra still needs to hit the bloom filters for
each SSTable and then reconcile all versions and all tombstones for any
row.  That's why overwrites have a similar performance impact to tombstones;
overwrites just happen to be less common.



On Wed, Apr 9, 2014 at 2:42 PM, Wayne Schroeder 
wschroe...@pinsightmedia.com wrote:

 I've been doing a lot of reading on SSTable fragmentation due to updates
 and the costs associated with reconstructing the end data from multiple
 SSTables that have been created over time and not yet compacted.  One
 question is stuck in my head: If you re-insert entire rows instead of
 updating one column, will cassandra end up flushing that entire row into one
 SSTable on disk and then end up finding a non-fragmented entire row
 quickly on reads instead of potential reconstruction across multiple
 SSTables?  Obviously this has implications for space as a trade off.

 Wayne






Re: Installing Datastax Cassandra 1.2.15 Using Yum (Java Issue)

2014-03-27 Thread Ken Hancock
On Thu, Mar 27, 2014 at 4:53 PM, Jon Forrest jon.forr...@xoom.com wrote:

 It would be great to know the origin of this issue.


See http://www.rudder-project.org/redmine/issues/2941 for the mess that has
been created regarding java JRE dependencies.

Ken


Re: Cassandra DSC 2.0.5 not starting - * could not access pidfile for Cassandra

2014-03-11 Thread Ken Hancock
I was always kind of annoyed that the datastax RPMs made me force-install
other java distros to satisfy RPM dependencies.  Then I did some research
and while I'm still annoyed, at least I'm sympathetic.

See http://www.rudder-project.org/redmine/issues/2941 for the mess that has
been created regarding java JRE dependencies.




On Mon, Mar 10, 2014 at 7:03 PM, Michael Shuler mich...@pbandjelly.org wrote:

 On 03/10/2014 05:15 PM, Michael Shuler wrote:

 I did find a bug with the same behavior you described. I have no idea
 why someone *removed* the dependency on a functional JRE from the
 cassandra package - this is *not* the same Depends: line as the
 upstream OSS cassandra package


 Quick follow-up on why this package deviates from the upstream deb
 package: this is done to prevent users from being force-installed the
 OpenJDK packages, since most users install the Oracle JRE from tar [0].
 This installation of OpenJDK via package steps on the hand-installed java
 alternatives symlinks. I had assumed, incorrectly, that the cassandra
 package in the DSC repository was the identical package as in the Apache
 repository.

 [0] http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/
 installJreDeb.html

 --
 Kind regards,
 Michael






Re: Weird timeouts

2014-03-07 Thread Ken Hancock
Are you on Cassandra 1.2 and can utilize the trace functionality?  Might be
an informative route.

Ken



On Fri, Mar 7, 2014 at 9:22 AM, Joel Samuelsson
samuelsson.j...@gmail.com wrote:

 I try to fetch all the row keys from a column family (there should only be
 a couple of hundred in that CF) in several different ways but I get
 timeouts whichever way I try:

 Through the cassandra cli:
 Fetching 45 rows is fine:
 list cf limit 45 columns 0;
 .
 .
 .
 45 Rows Returned.
 Elapsed time: 298 msec(s).

 Fetching 46 rows however gives me a timeout after a minute or so:
 list cf limit 46 columns 0;
 null
 TimedOutException()...

 Through pycassa:
 keys = cf.get_range(column_count = 1, buffer_size = 2)

 for key, val in keys:
  print key

 This prints some keys and then gets stuck at the same place each time and
 then timeouts.

 The columns (column names + value) in the rows should be less than 100
 bytes each, though there may be a lot of them on a particular row.

 To me it seems like one of the rows takes too long to fetch but I
 don't know why since I am limiting the number of columns to 0. Without
 seeing the row, I have a hard time knowing what could be wrong. Do you have
 any ideas?







Re: Compaction does not remove tombstones if column has higher TTL

2014-03-07 Thread Ken Hancock
I agree, that's totally unintuitive.  I would have had the same expectation
that tombstone eligibility is evaluated per row/column pair instead of
simply at the row level.
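A minimal sketch of the consequence described below (timestamps are hypothetical, and gc_grace_seconds is omitted for simplicity): under the row-level check, an expired short-TTL column is only purgeable once the row's maximum TTL has also elapsed.

```python
DAY = 86400
now = 100 * DAY                      # hypothetical "now", in seconds

row = {                              # column -> (write_time, ttl)
    "weekly":  (10 * DAY, 7 * DAY),
    "archive": (10 * DAY, 180 * DAY),
}

max_ttl = max(ttl for _, ttl in row.values())
for name, (write_time, ttl) in row.items():
    per_column = now > write_time + ttl     # what you might expect
    per_row = now > write_time + max_ttl    # the 1.2.x behavior described
    print(name, per_column, per_row)
# weekly True False   -> expired for months, yet still not purgeable
# archive False False
```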


On Fri, Feb 28, 2014 at 11:44 AM, Keith Wright kwri...@nanigans.com wrote:

 FYI - I recently filed
 https://issues.apache.org/jira/browse/CASSANDRA-6654 and wanted to let
 everyone know the result as it was not what I expected.   I am using C*
 1.2.12 and found that my droppable tombstone ratio kept increasing on an
 LCS table (currently  .3).  Documentation states that compactions should
 be triggered when that gets above .2 to help cleanup tombstones and in my
 case compactions are definitely not running behind.

 I am setting different TTLs on different columns (this capability was one
 of the things I love about Cassandra) and the result of the ticket is that
 a column will NOT be removed until its write time plus the MAX TTL of the
 columns within the row has passed.  In my case, I was setting some columns
 to 6 months and others to 7 days, so this meant that the 7 day data will in
 fact NOT be removed for 6 months!  This results in MUCH wider rows than I
 expected.

 It appears that this was likely fixed in 2.1 but obviously people will not
 be deploying that to production anytime soon.  It appears that I will just
 have to no longer set the 6 month TTL and instead leave it as forever to
 ensure that the smaller TTLs are respected.  This is an acceptable tradeoff
 for me since the 7 day columns are the ones that get much larger (against a
 map column type).

 So be warned, mixing TTLs in a row does not appear to result in the data
 being compacted away.

 Thanks






Cassandra-cli and Composite columns

2014-02-25 Thread Ken Hancock
I've been trying to do some simple data modeling and since we're currently
using Hector have been doing that modeling with cassandra-cli and running
into issues with CompositeType columns.

If I do a help set, I see:

The help for create column family shows:

create column family UseComposites
   with comparator = 'CompositeType(UTF8Type, Int32Type)'
   and caching='ALL';

The help for set shows:

set UseComposites[utf8('testkey')]['CompositeType(utf8(first),int(4))'] =
utf8('inserts this string into a column with name first:4');

This doesn't seem to be correct since it's casting the entire value
into a utf8, as shown by this:

set UseComposites[utf8('testkey')]['CompositeType(utf8(first),int(4))'] =
utf8('inserts this string into a column with name first:4');
set UseComposites[utf8('testkey')]['CompositeType(utf8(first),int(34))'] =
utf8('inserts this string into a column with name first:4');

list UseComposites;

RowKey: 746573746b6579
= (name=CompositeType(utf8(first),int(34)), value=inserts this string into
a column with name first:4, timestamp=1393337249636000)
= (name=CompositeType(utf8(first),int(4)), value=inserts this string into
a column with name first:4, timestamp=1393337170861000)

I know I can use the set[foo][composite1:composite2] notation and this
appears to work for ascii or int types.

I'm trying to model with DateType and LongType and can't seem to coerce
anything into those components.

set long3[key2][long(1):long(1):long(1)] = utf8('1,1,1');
Syntax error at position 23: mismatched input ':' expecting ']'

Composites have been around for a long time... is this not supported, did
it get broken along the way, or is the documentation just out of date?

Thanks!

Ken


Re: Opscenter tabs

2014-01-23 Thread Ken Hancock
Multiple DCs are still a single cluster in OpsCenter.  If you go to
Physical View, you should see one column for each data center.

Also, the Community edition of OpsCenter, last I saw, only supported a
single cluster.



On Thu, Jan 23, 2014 at 12:06 PM, Daniel Curry daniel.cu...@arrayent.com wrote:

   I am unable to find any references on whether the tabs that monitor
 multiple DCs can be configured to show the DC location. I do not want to
 change the cluster name itself.  Right now I see three tabs, all with the
 same cluster_name:  test.  I'd like to keep the current cluster name test,
 but change the opscenter tabs to DC1, DC2, and DC3.

Is this documented somewhere?

 --
 Daniel Curry
 Sr Linux Systems Administrator
 Arrayent, Inc.
 2317 Broadway Street, Suite 20
 Redwood City, CA 94063
 dan...@arrayent.com






Re: MUTATION messages dropped

2013-12-20 Thread Ken Hancock
I ended up changing memtable_flush_queue_size to be large enough to contain
the biggest flood I saw.

I monitored tpstats over time using a collection script and an analysis
script that I wrote to figure out what my largest peaks were.  In my case,
all my mutation drops correlated with hitting the maximum
memtable_flush_queue_size and then mutations drops stopped as soon as the
queue size dropped below the max.

I threw the scripts up on github in case they're useful...

https://github.com/hancockks/tpstats
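For reference, a minimal sketch of the kind of parsing such a script does. The sample output and its column layout are assumed from 1.x-era `nodetool tpstats`; treat both as illustrative rather than authoritative:

```python
# Pull the "All time blocked" counter for a pool out of tpstats output.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
FlushWriter                       1         3         184069         0               214
MutationStage                     0         0      124503141         0                 0
"""

def all_time_blocked(tpstats_text, pool="FlushWriter"):
    for line in tpstats_text.splitlines():
        fields = line.split()
        if fields and fields[0] == pool:
            return int(fields[-1])    # last column is "All time blocked"
    return None                       # pool not present in this output

print(all_time_blocked(SAMPLE))       # 214
```

Sampling this counter over time (as the linked scripts do) shows whether mutation drops correlate with the queue hitting its limit.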




On Fri, Dec 20, 2013 at 1:08 AM, Alexander Shutyaev shuty...@gmail.com wrote:

 Thanks for you answers.

 *srmore*,

 We are using v2.0.0. As for GC I guess it does not correlate in our case,
 because we had cassandra running 9 days under production load and no
 dropped messages and I guess that during this time there were a lot of GCs.

 *Ken*,

 I've checked the values you indicated. Here they are:

 node1 6498
 node2 6476
 node3 6642

 I guess this is not good :) What can we do to fix this problem?


 2013/12/19 Ken Hancock ken.hanc...@schange.com

 We had issues where the flushes of several column families would align
 and then block writes for a very brief period. If that happened
 when a bunch of writes came in, we'd see a spike in Mutation drops.

 Check nodetool tpstats for FlushWriter all time blocked.


 On Thu, Dec 19, 2013 at 7:12 AM, Alexander Shutyaev 
 shuty...@gmail.comwrote:

 Hi all!

 We've had a problem with cassandra recently. We had 2 one-minute periods
 when we got a lot of timeouts on the client side (the only timeouts during
 9 days we are using cassandra in production). In the logs we've found
 corresponding messages saying something about MUTATION messages dropped.

 Now, the official faq [1] says that this is an indicator that the load
 is too high. We've checked our monitoring and found out that 1-minute
 average cpu load had a local peak at the time of the problem, but it was
 like 0.8 against 0.2 usual which I guess is nothing for a 2 core virtual
 machine. We've also checked java threads - there was no peak there and
 their count was reasonable ~240-250.

 Can anyone give us a hint - what should we monitor to see this high
 load and what should we tune to make it acceptable?

 Thanks in advance,
 Alexander

 [1] http://wiki.apache.org/cassandra/FAQ#dropped_messages











Re: Rowcache and quorum reads cassandra

2013-10-10 Thread Ken Hancock
If you're hitting 3/5 nodes, it sounds like you've set your replication
factor to 5. Is that what you're doing so you can tolerate a 2-node outage?

For a 5-node cluster, RF=5, each node will have 100% of your data (a second
DC is just a clone), so with a 3GB off-heap it means that 3GB / total data
size in GB total would be cacheable in the row cache.

On the other hand, if you're doing RF=3, each node will have 60% of your
data instead of 100%, so the effective percentage of rows that are cached
goes up by 66%.

Great quick & dirty calculator: http://www.ecyrd.com/cassandracalculator/
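The sizing arithmetic above, as a sketch (the data and cache sizes are hypothetical examples):

```python
def cacheable_fraction(total_data_gb, cache_gb, nodes, rf):
    """Fraction of its local data a node can hold in the row cache."""
    data_per_node = total_data_gb * rf / nodes    # replicas spread evenly
    return min(1.0, cache_gb / data_per_node)

# 5 nodes in the DC, 3 GB off-heap row cache each, 100 GB of unique data:
print(cacheable_fraction(100, 3, nodes=5, rf=5))  # 0.03 (node holds 100%)
print(cacheable_fraction(100, 3, nodes=5, rf=3))  # 0.05 (node holds 60%)
# 0.05 / 0.03 ~= 1.66: with RF=3 each node caches a ~66% larger share.
```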



On Thu, Oct 10, 2013 at 6:40 AM, Artur Kronenberg 
artur.kronenb...@openmarket.com wrote:

  I was reading through configuration tips for cassandra and decided to
 use row-cache in order to optimize the read performance on my cluster.

  I have a cluster of 10 nodes, each of them operating with 3 GB off-heap
 using cassandra 2.4.1. I am doing local quorum reads, which means that I
 will hit 3 nodes out of 5 because I split my 10 nodes into two data-centres.

 I was under the impression that since each node gets a certain range of
 reads my total amount of off-heap would be 10 * 3 GB = 30 GB. However is
 this still correct with quorum reads? How does cassandra handle row-cache
 hits in combination with quorum reads?

 Thanks!
 -- artur






Re: Rowcache and quorum reads cassandra

2013-10-10 Thread Ken Hancock
Reads still need to satisfy quorum when you've specified quorum --
otherwise you have no consistency control.

Each read goes out to each node that has a replica of key (in your case
all) and then independently each node consults its row cache and either
returns cached data or has to go through the normal key cache and SST
tables to return the data.

Once the coordinator node has received quorum matching results, the read
returns.

To answer your question, I think you need a larger cache (if measurements
actually show your response times as insufficient).  For performance, you
should determine if you really need QUORUM or whether you could read at
CL=ONE.

Someone else more familiar with the row cache implementation may want to
weigh in here, but I'd conjecture that with a 5-node cluster, RF=5,
essentially the row caches should have the same keys across all nodes
(ignoring things like cluster restarts, dropped mutations, etc.).





On Thu, Oct 10, 2013 at 10:03 AM, Artur Kronenberg 
artur.kronenb...@openmarket.com wrote:

  Hi.

 That is basically our set up. We'll be holding all data on all nodes.

 My problem was more on how the cache would behave. I thought it might go
 this way:

 1. No cache hit

 Read from 3 nodes to verify results are correct and then return. Write
 result into RowCache.

 2. Cache hit

 Read from Cache directly and return.

 If now the value gets updated it would be found in the RowCache and either
 invalidated (hence case 1 on next read) or updated (hence case 2 on next
 read). However I couldn't find any information on this.

 If this was the case it would mean that each node would only have to hold
 1/5 of my data in Cache (you're right about the DC clone so 1/5 of data
 instead of 1/10). If however 3 nodes have to be read each time and all 3
 fill up the row cache with the same data that would make my cache
 requirements bigger.

 Thanks!

 Artur

 On 10/10/13 14:06, Ken Hancock wrote:

   If you're hitting 3/5 nodes, it sounds like you've set your replication
 factor to 5. Is that what you're doing so you can tolerate a 2-node outage?

  For a 5-node cluster, RF=5, each node will have 100% of your data (a
 second DC is just a clone), so with a 3GB off-heap it means that 3GB /
 total data size in GB total would be cacheable in the row cache.

 On the other hand, if you're doing RF=3, each node will have 60% of your
 data instead of 100%, so the effective percentage of rows that are cached
 goes up by 66%.

  Great quick & dirty calculator: http://www.ecyrd.com/cassandracalculator/



 On Thu, Oct 10, 2013 at 6:40 AM, Artur Kronenberg 
 artur.kronenb...@openmarket.com wrote:

  I was reading through configuration tips for cassandra and decided to
 use row-cache in order to optimize the read performance on my cluster.

  I have a cluster of 10 nodes, each of them operating with 3 GB off-heap
 using cassandra 2.4.1. I am doing local quorum reads, which means that I
 will hit 3 nodes out of 5 because I split my 10 nodes into two data-centres.

 I was under the impression that since each node gets a certain range of
 reads my total amount of off-heap would be 10 * 3 GB = 30 GB. However is
 this still correct with quorum reads? How does cassandra handle row-cache
 hits in combination with quorum reads?

 Thanks!
 -- artur











Re: Among Datastax community Cassandra debian package, which to choose for production install ?

2013-09-30 Thread Ken Hancock
OpsCenter should be a separate package as you would only install it on a
single node, not necessarily even one that is running Cassandra.




On Sat, Sep 28, 2013 at 2:12 PM, Ertio Lew ertio...@gmail.com wrote:

 I think both provide the same thing except Datastax Community also
 provides some extras like OpsCenter, etc. But I cannot find OpsCenter
 installed when I installed DSC on Ubuntu. Although on the Windows
 installation, I saw OpsCenter & JRE as well, so I think for DSC there is
 no such prerequisite for Oracle JRE as required for the Cassandra debian
 package, is it so?

 Btw which is usually preferred for production installs ?

 I may need to use Opscenter but just *occasionally*.






Re: Why Solandra stores Solr data in Cassandra ? Isn't solr complete solution ?

2013-09-30 Thread Ken Hancock
To clarify, solr indexes are not distributed in the same way that Cassandra
data is stored.

With Cassandra, each node receives a fraction of the keyspace (based on
your replication factor and token assignment).  With DSE Search, writes to
Cassandra are hooked and each node independently indexes its data and keeps
this index on the local file system.  If you have a keyspace with RF=3, then
three nodes will index each document. Unlike stock Solr, the indexes store
only the docids; the actual field values are stored in Cassandra.

When it comes to search, DSE splits up the query so that, in the example
above, only one of those RF=3 nodes is queried for a particular token
range; the results are then unioned across all the nodes covering the
different token ranges.
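The fan-out described above can be sketched roughly as follows (hypothetical
data structures and node names; this is not the DSE implementation, just the
shape of the idea):

```python
def plan_search(token_ranges, replicas_for):
    """Pick one replica per token range; the ranges together cover the ring."""
    plan = {}
    for rng in token_ranges:
        # Any single replica suffices for this range; here we just take
        # the first (a real coordinator would balance/score choices).
        plan[rng] = replicas_for(rng)[0]
    return plan

ranges = ["(0,100]", "(100,200]", "(200,300]"]
replicas = {"(0,100]":   ["n1", "n2", "n3"],
            "(100,200]": ["n2", "n3", "n4"],
            "(200,300]": ["n3", "n4", "n5"]}

# One node queried per range; per-range results are unioned by the caller.
print(plan_search(ranges, lambda r: replicas[r]))
```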

Not sure about Solandra, but you do need to be aware that there are a number
of Solr search options that are not supported in distributed searches / DSE
Search.

http://wiki.apache.org/solr/DistributedSearch
http://wiki.apache.org/solr/FieldCollapsing

Also, be aware that while Cassandra has knobs to allow you to get
consistent read results (CL=QUORUM), DSE Search does not. If a node drops
messages for whatever reason (outage, dropped mutations, etc.), its Solr
indexes will be inconsistent with other nodes in its replication group.



On Mon, Sep 30, 2013 at 1:06 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Sep 30, 2013 at 8:50 AM, Ertio Lew ertio...@gmail.com wrote:

 Solr's data is stored on the file system as a set of index files[
 http://stackoverflow.com/a/7685579/530153]. Then why do we need anything
 like Solandra or DataStax Enterprise Search? Isn't Solr complete solution
 in itself ?  What do we need to integrate with Cassandra ?


 Solr's index sitting on a single machine, even if that single machine can
 vertically scale, is a single point of failure.

 The value add of DES is that the index has the same availability
 characteristics as the underlying data, because it is stored in the same
 cluster.

 =Rob







Re: Why Solandra stores Solr data in Cassandra ? Isn't solr complete solution ?

2013-09-30 Thread Ken Hancock
Yes.


On Mon, Sep 30, 2013 at 1:57 PM, Andrey Ilinykh ailin...@gmail.com wrote:


 Also, be aware that while Cassandra has knobs to allow you to get
 consistent read results (CL=QUORUM), DSE Search does not. If a node drops
 messages for whatever reason (outage, dropped mutations, etc.), its Solr
 indexes will be inconsistent with other nodes in its replication group.

 Will repair fix it?






Re: Memtable flush blocking writes

2013-09-24 Thread Ken Hancock
This is on Cassandra 1.2.9, though packaged into DSE, which I suspect may
come into play here.  I didn't really get to the bottom of it other than to
raise the flush queue size to 32, which is about the number of CFs I have.
After that, mutation drops disappeared and the FlushWriter blocks went away.


On Mon, Sep 23, 2013 at 6:03 PM, Robert Coli rc...@eventbrite.com wrote:

 On Fri, Aug 23, 2013 at 10:35 AM, Ken Hancock ken.hanc...@schange.comwrote:

 I appear to have a problem illustrated by
 https://issues.apache.org/jira/browse/CASSANDRA-1955. At low data
 rates, I'm seeing mutation messages dropped because writers are
 blocked as I get a storm of memtables being flushed. OpsCenter
 memtables seem to also contribute to this:

 ...

 Now I can increase memtable_flush_queue_size, but it seems based on
 the above that in order to solve the problem, I need to set this to
 count(CF). What's the downside of this approach? It seems a backwards
 solution to the real problem...


 What version of Cassandra? Did you ever get to the bottom of this?

 =Rob






Re: cassandra just gone..no heap dump, no log info

2013-09-18 Thread Ken Hancock
We ran into this while tuning heap sizes.  With Cassandra 1.2 making use of
off-heap memory, if we made our JVM too large relative to the server
memory, the system would just bail.  We found for our app that the limit of
the JVM size relative to server memory was about 50%.
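That ~50% rule of thumb was empirical for our workload, not a Cassandra
guarantee, but it is trivial to encode as a sanity check (the helper name
and example sizes below are mine):

```python
def heap_within_limit(heap_gb, system_ram_gb, max_fraction=0.5):
    """True if the JVM heap leaves enough headroom for off-heap structures,
    page cache, and the OS on a box with system_ram_gb of RAM."""
    return heap_gb <= system_ram_gb * max_fraction

print(heap_within_limit(8, 16))   # 8 GB heap on a 16 GB box: OK
print(heap_within_limit(12, 16))  # 12 GB heap: too little off-heap room
```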


On Wed, Sep 18, 2013 at 8:57 AM, Juan Manuel Formoso jform...@gmail.comwrote:

 This shouldn't happen if you have swap active in the server

 On Wednesday, September 18, 2013, Franc Carter wrote:


 A random guess - possibly an OOM (Out of Memory) kill, where Linux will
 kill a process to recover memory when it is desperately low on memory. Have
 a look in either your syslog output or the output of dmesg.

 cheers


 On Wed, Sep 18, 2013 at 10:21 PM, Hiller, Dean dean.hil...@nrel.govwrote:

 Anyone know how to debug cassandra processes just exiting?  There is no
 info in the cassandra logs and there is no heap dump file (which in the
 past has shown up in the /opt/cassandra/bin directory for me).

 This occurs when running a map/reduce job that puts severe load on the
 system.  The logs look completely fine.  I find it odd:

  1.  No logs of why it exited at all
  2.  No heap dump which would imply there would be no logs as it crashed

 Is there any other way a process can die and linux would log it somehow?
  (like running out of memory)

 Thanks,
 Dean




 --

 *Franc Carter* | Systems architect | Sirca Ltd

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 8355 2514

 Level 4, 55 Harrington St, The Rocks NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215




 --
 *Juan Manuel Formoso
 *Senior Geek
 http://twitter.com/juanformoso
 http://seniorgeek.com.ar
 LLAP






Re: Flush writer all time blocked

2013-08-29 Thread Ken Hancock
On Thu, Aug 29, 2013 at 1:57 PM, Robert Coli rc...@eventbrite.com wrote:
 On Thu, Aug 29, 2013 at 10:49 AM, S C as...@outlook.com wrote:
 I see a high All time blocked count for Flush Writer in nodetool tpstats.

 Is it how many blocked ever since the server was online? Can somebody
 explain to me what it is? I really appreciate it.


 Yes.

 Flush Writer thread pool is the thread pool responsible for the part of 
 memtable flush that actually writes to disk.
 If you see it with a non-zero blocked number, you have at some time written 
 to memory significantly faster than you
 could flush to disk.

I don't think this is strictly true?  There's also the periodic flush
that can cause a storm of flushes if you have multiple column
families.  I sent out a query to the list last week on this topic but
didn't get any responses -- I'm very interested in this topic as I've
had to set my queue size fairly large to avoid this issue.

Ken


Re: Flush writer all time blocked

2013-08-29 Thread Ken Hancock
# the number of full memtables to allow pending flush, that is,
# waiting for a writer thread.  At a minimum, this should be set to
# the maximum number of secondary indexes created on a single CF.
memtable_flush_queue_size: 4
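Per the discussion in this thread, if a periodic flush can enqueue every
memtable at once, the queue needs roughly one slot per column family (in
addition to the documented secondary-index minimum). A back-of-envelope
sizing helper (the function is mine, not a Cassandra API; 4 is the shipped
default shown above):

```python
def flush_queue_size(column_families, max_secondary_indexes_per_cf=0):
    """Suggested memtable_flush_queue_size: never below the shipped default,
    and large enough for a simultaneous flush of every CF (or the documented
    secondary-index minimum, whichever is bigger)."""
    default = 4  # cassandra.yaml default
    needed = max(column_families, max_secondary_indexes_per_cf)
    return max(default, needed)

print(flush_queue_size(32))  # ~32 CFs -> 32, the value Ken settled on
print(flush_queue_size(2))   # few CFs -> keep the default of 4
```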

There was an interesting thread a while back:

http://mail-archives.apache.org/mod_mbox/cassandra-user/201307.mbox/%3c17c39fe466076c46b6e83f129c7b19ce2e7ec...@hkxprd0310mb352.apcprd03.prod.outlook.com%3E


On Thu, Aug 29, 2013 at 2:23 PM, S C as...@outlook.com wrote:
 Ken,

 What queue size are you referring to?

 Thanks,
 SC


 From: ken.hanc...@schange.com
 Date: Thu, 29 Aug 2013 14:21:04 -0400
 Subject: Re: Flush writer all time blocked
 To: user@cassandra.apache.org

 On Thu, Aug 29, 2013 at 1:57 PM, Robert Coli rc...@eventbrite.com wrote:
  On Thu, Aug 29, 2013 at 10:49 AM, S C as...@outlook.com wrote:
  I see a high count All time blocked for Flush Writer on nodetool
  tpstats.
 
  Is it how many blocked ever since the server was online? Can somebody
  explain me what it is? I really appreciate it.
 
 
  Yes.
 
  Flush Writer thread pool is the thread pool responsible for the part of
  memtable flush that actually writes to disk.
  If you see it with a non-zero blocked number, you have at some time
  written to memory significantly faster than you
  could flush to disk.

 I don't think this is strictly true? There's also the periodic flush
 that can cause a storm of flushes if you have multiple column
 families. I sent out a query to the list last week on this topic but
 didn't get any responses -- I'm very interested in this topic as I've
 had to set my queue size fairly large to avoid this issue.

 Ken





Memtable flush blocking writes

2013-08-23 Thread Ken Hancock
I appear to have a problem illustrated by
https://issues.apache.org/jira/browse/CASSANDRA-1955. At low data
rates, I'm seeing mutation messages dropped because writers are
blocked as I get a storm of memtables being flushed. OpsCenter
memtables seem to also contribute to this:

INFO [OptionalTasks:1] 2013-08-23 01:53:58,522 ColumnFamilyStore.java
(line 630) Enqueuing flush of
Memtable-runratecountforiczone@1281182121(14976/120803 serialized/live
bytes, 360 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,523 ColumnFamilyStore.java
(line 630) Enqueuing flush of
Memtable-runratecountforchannel@705923070(278200/1048576
serialized/live bytes, 6832 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,525 ColumnFamilyStore.java
(line 630) Enqueuing flush of
Memtable-solr_resources@1615459594(66362/66362 serialized/live bytes,
4 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,525 ColumnFamilyStore.java
(line 630) Enqueuing flush of
Memtable-scheduleddaychannelie@393647337(33203968/36700160
serialized/live bytes, 865620 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,530 ColumnFamilyStore.java
(line 630) Enqueuing flush of
Memtable-failediecountfornetwork@1781160199(8680/124903
serialized/live bytes, 273 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,530 ColumnFamilyStore.java
(line 630) Enqueuing flush of
Memtable-rollups7200@37425413(6504/23 serialized/live bytes, 271
ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,531 ColumnFamilyStore.java
(line 630) Enqueuing flush of
Memtable-rollups60@1943691367(638176/1048576 serialized/live bytes,
39894 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,531 ColumnFamilyStore.java
(line 630) Enqueuing flush of Memtable-events@99567005(1133/1133
serialized/live bytes, 39 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,532 ColumnFamilyStore.java
(line 630) Enqueuing flush of
Memtable-rollups300@532892022(184296/1048576 serialized/live bytes,
7679 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,532 ColumnFamilyStore.java
(line 630) Enqueuing flush of
Memtable-ie@1309405764(457390051/152043520 serialized/live bytes,
16956160 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,823 ColumnFamilyStore.java
(line 630) Enqueuing flush of
Memtable-videoexpectedformat@1530999508(684/24557 serialized/live
bytes, 12453 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,929 ColumnFamilyStore.java
(line 630) Enqueuing flush of
Memtable-failediecountforzone@411870848(9200/95294 serialized/live
bytes, 284 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:59,012 ColumnFamilyStore.java
(line 630) Enqueuing flush of Memtable-rollups86400@744253892(456/456
serialized/live bytes, 19 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:59,364 ColumnFamilyStore.java
(line 630) Enqueuing flush of Memtable-peers@2024878954(2006/40629
serialized/live bytes, 452 ops)

I had a tpstats running across all the nodes in my cluster every 5
seconds or so (columns: Active, Pending, Completed, Blocked, All time
blocked) and observed the following:

2013-08-23T01:53:47 192.168.131.227 FlushWriter 0 0 33 0 0
2013-08-23T01:53:55 192.168.131.227 FlushWriter 0 0 33 0 0
2013-08-23T01:54:00 192.168.131.227 FlushWriter 2 10 37 1 5
2013-08-23T01:54:07 192.168.131.227 FlushWriter 1 1 53 0 11
2013-08-23T01:54:12 192.168.131.227 FlushWriter 1 1 53 0 11

Now I can increase memtable_flush_queue_size, but it seems based on
the above that in order to solve the problem, I need to set this to
count(CF). What's the downside of this approach? It seems a backwards
solution to the real problem...