Re: TWCS High Disk Space Usage Troubleshooting - Looking for suggestions

2017-08-08 Thread Jeff Jirsa
The most likely cause is read repair triggered by reads at a consistency
level above ONE (digest mismatch). The only way to actually eliminate read
repair is to read with CL:ONE, which almost nobody does (at least in
time-series use cases, because it implies you probably write with ALL, or
run repair, which, as you've noted, often isn't necessary in TTL-only use
cases).
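
For illustration only, with hypothetical keyspace/table names (myks.mytable),
the two knobs involved look roughly like this:

# Chance-based read repair is a per-table option:
cqlsh -e "ALTER TABLE myks.mytable WITH read_repair_chance = 0 AND dclocal_read_repair_chance = 0;"

# Foreground (digest-mismatch) read repair only disappears if the read itself uses CL ONE:
cqlsh -e "CONSISTENCY ONE; SELECT * FROM myks.mytable LIMIT 5;"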

I can't see the image, but more tools for understanding sstable state are
never a bad thing (as long as they're generally useful and maintainable).

For what it's worth, there are tickets in flight for being more aggressive
about dropping overlaps, but there are also companies whose tooling stops
the cluster, uses sstablemetadata to identify sstables known to be fully
expired, and manually removes them (/bin/rm) before starting Cassandra
again. It works reasonably well IF (and only if) you write all data with
TTLs, and you can identify fully expired sstables based on their maximum
timestamps.
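
A very rough sketch of that manual cleanup, using the TTL (15 days) and GC
grace (1 hour) from the original post, hypothetical paths, and microsecond
write timestamps; sstablemetadata output labels vary by version, so verify
before deleting anything:

# Run only while cassandra is stopped on this node; print candidates, don't rm blindly.
CUTOFF_US=$(( ( $(date +%s) - (15*86400 + 3600) ) * 1000000 ))   # now minus (TTL + gc_grace), in microseconds
for data in /var/lib/cassandra/data/myks/mytable-*/*-Data.db; do
  max_ts=$(sstablemetadata "$data" | awk '/Maximum timestamp/ {print $NF}')
  if [ -n "$max_ts" ] && [ "$max_ts" -lt "$CUTOFF_US" ]; then
    echo "fully expired candidate: $data"   # every component of this sstable could go
  fi
done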






TWCS High Disk Space Usage Troubleshooting - Looking for suggestions

2017-08-08 Thread Sumanth Pasupuleti
Hi,

We use TWCS in a few of the column families that hold TTL-based time-series
data, and no explicit deletes are issued. Over time, we have observed the
disk usage increasing beyond the expected levels.

The data directory on a particular node shows SSTables that are more than
16 days old, while the bucket size is configured at 12 hours, the TTL at
15 days, and GC grace at 1 hour.
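
(For reference, a table definition matching those numbers would look roughly
like the following, with hypothetical names; the compaction class and option
names shown are the 3.x ones and may differ for the TWCS build used on 2.1.x:)

cqlsh <<'CQL'
CREATE TABLE myks.mytable (
  id text, ts timestamp, value blob,
  PRIMARY KEY (id, ts)
) WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'HOURS',
                     'compaction_window_size': '12'}
  AND default_time_to_live = 1296000   -- 15 days
  AND gc_grace_seconds = 3600          -- 1 hour
  AND read_repair_chance = 0
  AND dclocal_read_repair_chance = 0;
CQL
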
Upon running sstableexpiredblockers, we got quite a few sets of blocking
and blocked SSTables. The SSTable metadata shown in the output indicates
an overlap in the min/max timestamp range between the blocking SSTable and
the blocked SSTables, which is preventing the older SSTables from being
dropped/deleted.
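
(The checks above amount to roughly the following; the paths and generation
numbers are illustrative, and the metadata labels vary a little between
versions:)

# Which newer SSTables are blocking older ones from being dropped:
sstableexpiredblockers myks mytable

# Compare the timestamp ranges of a blocking vs. a blocked SSTable:
sstablemetadata /var/lib/cassandra/data/myks/mytable-*/myks-mytable-ka-1234-Data.db | egrep 'Minimum timestamp|Maximum timestamp'
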
Following are the possible root causes we considered (a short sketch of the
relevant knobs follows the list):

1. Hints - old hints getting replayed from a coordinator node. We ruled
   this out since hints live for no more than 1 day in our configuration.
2. External compactions - no external compactions were run that could have
   merged SSTables across TWCS buckets.
3. Read repairs - ruled out as well, since we never ran external repairs,
   and read_repair_chance on the TWCS column families is set to 0.
4. The application writing data with older timestamps (into newer
   SSTables).
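
(Roughly, the knobs behind causes 1, 3, and 4, with hypothetical names:)

# 1. The window during which hints are collected for an unreachable replica (ms):
grep max_hint_window_in_ms /etc/cassandra/cassandra.yaml

# 3. Confirm chance-based read repair really is off for the table:
cqlsh -e "DESCRIBE TABLE myks.mytable" | grep read_repair_chance

# 4. A client could still write into an old time window explicitly, e.g.:
cqlsh -e "INSERT INTO myks.mytable (id, ts, value) VALUES ('k1', '2017-07-20', 0x00) USING TIMESTAMP 1500508800000000 AND TTL 1296000;"
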
To investigate further:

1. We wanted to identify the specific row keys with older timestamps in the
   blocking SSTable that could be causing this issue. We considered using
   sstablekeys/sstable2json; however, since both tools dump the entire
   contents/keys of the SSTable in key order, they were not helpful here.
2. Since we wanted data on the few oldest cells by timestamp, we created a
   tool, based largely on sstable2json, called sstableslicer, which outputs
   the 'n' top/bottom cells of an SSTable ordered by either writetime or
   localDeletionTime. This helped us identify the specific cells in new
   SSTables that carry older timestamps, which in turn helped debugging on
   the application end. From the application team's perspective, however,
   writing data with old timestamps is not a possible scenario.
3. Below is a sample output of sstableslicer:
[image: Inline image 2]
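
(A complementary CQL-level spot check once a suspect partition key is known,
with hypothetical names; writetime() reports microseconds since the epoch:)

# Unexpectedly old writetime() values in recent data point at the same cells sstableslicer surfaces.
cqlsh -e "SELECT id, ts, writetime(value) FROM myks.mytable WHERE id = 'suspect-key' LIMIT 20;"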


Looking for suggestions, especially around the following two things:

1. Did we miss any other case in TWCS that could be causing such overlap?
2. Does sstableslicer seem valuable enough to be included in Apache C*? If
   yes, I shall create a JIRA and submit a PR/patch for review.

The C* version we use is 2.1.17.

Thanks,
Sumanth


Re: Cassandra Stress Tool - Insert data in a pace

2017-08-08 Thread Lucas Benevides
Thank you Michael Shuler,

The "-rate throttle" option is not in the documentation, so I didn't find
it. It replaced the old "-rate limit" in version 3.8 (as noted in NEWS.txt).
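
For anyone finding this in the archives, the two spellings are roughly as
below (the pre-3.8 form is from memory, so check `cassandra-stress help
-rate` on your version):

# 3.8+/trunk syntax, as in the quoted example further down:
cassandra-stress write duration=10m -rate threads=2 throttle=100/s
# older releases used "limit" for the same knob:
cassandra-stress write duration=10m -rate threads=2 limit=100/s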

I am running the trunk version, so it is already available to me.

Thank you very much,
Lucas Benevides

2017-08-07 13:20 GMT-03:00 Michael Shuler :

> On 08/04/2017 01:16 PM, Lucas Benevides wrote:
> > I want to test, in a lab, the behavior of the compaction strategies. I
> > found the Cassandra Stress Tool very useful, but it does not have an
> > option to insert data at a chosen pace, i.e. insert data over a period,
> > with just a certain amount of data in each interval (e.g. per minute).
> > I suggest creating that improvement.
>
> I think these options exist in recent versions. Looking at the following
> help output in trunk, it appears I'm able to construct something that
> emulates a set rate over a given duration:
>
> `cassandra-stress help`
> `cassandra-stress help write`
> `cassandra-stress help -rate`
>
> Run 2 threads at 100 inserts per second for 10 minutes:
>
> `cassandra-stress write duration=10m -rate threads=2 throttle=100/s`
>
> You did not mention what version you were running. The throttle= option
> doesn't appear to be available in 3.0, so use 3.11 or trunk (4.0)
> cassandra-stress.
>
> --
> Kind regards,
> Michael


rebuild constantly fails, 3.11

2017-08-08 Thread Micha
Hi,

It seems I'm not able to add a 3-node DC to a 3-node DC. After starting
the rebuild on a new node, nodetool netstats shows it will receive 1200
files from node-1 and 5000 from node-2. The stream from node-1 completes,
but the stream from node-2 always fails after sending about 4000 files.

After restarting the rebuild, it again starts to send the 5000 files.
The whole cluster is connected via a single switch, with no firewall in
between, and the network shows no errors.
The machines have 8 cores, 32 GB RAM, and two 1 TB disks as RAID 0.
The logs show no errors. The size of the data is about 1 TB.
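
(For context, the commands involved are roughly the following; the DC name
and the netstats filter are illustrative:)

# Kick off the rebuild on the new node, streaming from the existing DC:
nodetool rebuild -- existing_dc
# Watch stream progress; the session from node-2 dies after about 4000 of its 5000 files:
watch -n 30 'nodetool netstats | grep -v "100%"'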


Any help is really welcome,

cheers
 Michael






The error is:

Cassandra has shutdown.
error: null
-- StackTrace --
java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:267)
    at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:222)
    at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161)
    at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
    at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source)
    at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1020)
    at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:298)
    at com.sun.proxy.$Proxy7.rebuild(Unknown Source)
    at org.apache.cassandra.tools.NodeProbe.rebuild(NodeProbe.java:1190)
    at org.apache.cassandra.tools.nodetool.Rebuild.execute(Rebuild.java:58)
    at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:254)
    at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:168)
