Thanks Nate. We do not have monitoring set up yet, but I should be able to get the deployment updated with a metrics reporter. I'll update the thread with my findings.
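For anyone following along, here is a rough sketch of what I have in mind, using the metrics-reporter-config support that Cassandra can load via `-Dcassandra.metricsReporterConfigFile=...` in cassandra-env.sh. The Graphite host, port, and prefix below are placeholders for our environment, and the exact yaml shape follows the metrics-reporter-config-sample.yaml that ships with Cassandra, so treat it as a starting point rather than a verified config:

```yaml
# Placeholder reporter config; scopes reporting to the CommitLog metrics
# Nate mentioned. Host/port/prefix are environment-specific placeholders.
graphite:
  - period: 60
    timeunit: 'SECONDS'
    prefix: 'cassandra-qa'
    hosts:
      - host: 'graphite.example.com'
        port: 2003
    predicate:
      color: 'white'
      useQualifiedName: true
      patterns:
        - '^org.apache.cassandra.metrics.CommitLog.+'
```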
On Tue, Sep 20, 2016 at 10:30 PM, Nate McCall <n...@thelastpickle.com> wrote:

> If you can get to them in the test env, you want to look in
> o.a.c.metrics.CommitLog for:
>
> - TotalCommitlogSize: if this hovers near commitlog_total_space_in_mb and
>   never goes down, you are thrashing on segment allocation
> - WaitingOnCommit: the time spent waiting on calls to sync; it will start
>   to climb very fast if you can't sync within the sync interval
> - WaitingOnSegmentAllocation: how long it took to allocate a new commit
>   log segment; if it is all over the place, you are IO bound
>
> Try turning all the commit log settings way down for low-IO test
> infrastructure like this. Maybe a total commit log size of 32 MB with
> 4 MB segments (or even lower, depending on test data volume) so they
> basically flush constantly and don't try to hold any tables open. Also
> lower concurrent_writes substantially while you are at it to add some
> write throttling.
>
> On Wed, Sep 21, 2016 at 2:14 PM, John Sanda <john.sa...@gmail.com> wrote:
>
>> I have seen in various threads on the list that 3.0.x is probably best
>> for prod. Just wondering, though, if there is anything in particular in
>> 3.7 to be wary of.
>>
>> I need to check with one of our QA engineers to get specifics on the
>> storage. Here is what I do know. We have a blade center running lots of
>> virtual machines for various testing. Some of those VMs are running
>> Cassandra and the Java web apps I previously mentioned via Docker
>> containers. The storage is shared. Beyond that I don't have any more
>> specific details at the moment. I can also tell you that the storage
>> can be quite slow.
>>
>> I have come across different threads that talk to one degree or another
>> about the flush queue getting full. I have been looking at the code in
>> ColumnFamilyStore.java. Is perDiskFlushExecutors the thread pool I
>> should be interested in? It uses an unbounded queue, so I am not really
>> sure what it means for it to get full. Is there anything I can check or
>> look for to see if writes are getting blocked?
>>
>> On Tue, Sep 20, 2016 at 8:41 PM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>>> If you haven't yet deployed to prod, I strongly recommend *not* using
>>> 3.7.
>>>
>>> What network storage are you using? Outside of a handful of highly
>>> experienced experts using EBS in very specific ways, it usually ends
>>> in failure.
>>>
>>> On Tue, Sep 20, 2016 at 3:30 PM, John Sanda <john.sa...@gmail.com>
>>> wrote:
>>>
>>>> I am deploying multiple Java web apps that connect to a Cassandra 3.7
>>>> instance. Each app creates its own schema at startup. One of the
>>>> schema changes involves dropping a table. I am seeing frequent
>>>> client-side timeouts reported by the DataStax driver after the DROP
>>>> TABLE statement is executed. I don't see this behavior in all
>>>> environments. I do see it consistently in a QA environment in which
>>>> Cassandra is running in Docker with network storage, so writes are
>>>> pretty slow from the get-go. In my logs I see a lot of tables getting
>>>> flushed, which I guess are all of the dirty column families in the
>>>> respective commit log segment. Then I see a whole bunch of flushes
>>>> getting queued up. Can I reach a point at which so many table flushes
>>>> get queued that writes would be blocked?
>>>>
>>>> --
>>>> - John
>
> --
> -----------------
> Nate McCall
> Wellington, NZ
> @zznate
>
> CTO
> Apache Cassandra Consulting
> http://www.thelastpickle.com

--
- John
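For reference, Nate's suggested low-IO tuning would look roughly like the following cassandra.yaml fragment. The commit log numbers are the ones he gave; the concurrent_writes value is only an illustration of "lower it substantially" (the 3.x default is 32), so adjust to taste:

```yaml
# Low-IO test-environment settings per Nate's suggestion (illustrative).
# Defaults in 3.x are much larger (8192 MB total, 32 MB segments), so
# these force near-constant flushing instead of holding tables open.
commitlog_total_space_in_mb: 32
commitlog_segment_size_in_mb: 4

# Default is 32; lowering this adds back-pressure on writes.
concurrent_writes: 8
```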