Re: Long running compaction on huge hint table.

varun saluja Sun, 21 May 2017 08:10:13 -0700

Thanks a lot Ben.

Really appreciate your suggestions here.


Regards,
Varun Saluja

Sent from my iPhone

> On 21-May-2017, at 5:40 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
> 
> My main suggestion would be to monitor the compaction backlog (pending 
> compactions). If the backlog is growing you need to either throttle writes, 
> add more capacity to your cluster or possibly tune things. There is no simple 
> answer to tuning but several good guides on the internet to help - this is my 
> favourite: https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html. 
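
A minimal sketch of the backlog monitoring suggested above, assuming the
text of `nodetool compactionstats` is captured periodically by some other
means. The helper names `pending_tasks` and `backlog_is_growing` and the
three-sample window are illustrative choices, not part of Cassandra.

```python
import re
from typing import Sequence


def pending_tasks(compactionstats_output: str) -> int:
    """Extract the 'pending tasks: N' count from compactionstats text."""
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def backlog_is_growing(samples: Sequence[int], min_samples: int = 3) -> bool:
    """True if the last `min_samples` readings are strictly increasing."""
    if len(samples) < min_samples:
        return False
    recent = list(samples[-min_samples:])
    return all(a < b for a, b in zip(recent, recent[1:]))


# Output copied from later in this thread, with wrapped lines rejoined.
sample = """pending tasks: 1
   compaction type   keyspace   table     completed          total   unit   progress
        Compaction     system   hints   12152557998   869257869352   bytes      1.40%
Active compaction remaining time :   0h27m14s
"""

print(pending_tasks(sample))
```

Feeding successive readings into `backlog_is_growing` gives a simple signal
for when to throttle writes or revisit capacity; the threshold for acting on
it is a judgment call for each cluster.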
> 
> Unless there is something really badly set up with your cluster then I would 
> guess that if it got in this state trying to handle your write load then 
> you’ll potentially need additional capacity as well as tuning to meet your 
> needs.
> 
> Cheers
> Ben
> 
>> On Sun, 21 May 2017 at 21:47 varun saluja <saluj...@gmail.com> wrote:
>> Hi All,
>>   
>> Can someone please suggest any recommendations for write-intensive jobs?
>> 
>> Regards,
>> Varun Saluja
>> Sent from my iPhone
>> 
>>> On 17-May-2017, at 3:52 PM, varun saluja <saluj...@gmail.com> wrote:
>>> 
>>> Thanks Jeff.
>>> 
>>> I took a backup and manually removed the hints with a rolling restart. 
>>> This brought the cluster back to a stable state.
>>> 
>>> Can you please share some recommendations for a write-intensive job? We 
>>> need to load a dump from Kafka into a 3-node Cassandra cluster. Write TPS 
>>> per node will be around 7k.
>>> 
>>> Can you please suggest any parameter tuning for our use case? We do not 
>>> want to get stuck again with large compactions of the hints table or any 
>>> other table into which we are loading the dump.
>>> 
>>> 
>>> Regards,
>>> Varun
>>> 
>>>> On 17 May 2017 at 09:17, Jeff Jirsa <jji...@gmail.com> wrote:
>>>> You could also try stopping compaction, but that'll probably take a very 
>>>> long time as well
>>>> 
>>>> Manually stopping each node (one at a time) and removing the sstables from 
>>>> only system.hints may be a better option. May want to take a snapshot if 
>>>> you're very concerned with that data.
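
The one-node-at-a-time procedure described above can be sketched as a
dry-run plan that only prints the steps. The default data directory, the
snapshot tag, the node names and the `nodetool drain` step are assumptions
added for illustration and are not stated in the thread; adjust them to the
actual installation before acting on any of it.

```python
from typing import List, Tuple

# Steps for a single node. Only the snapshot and the removal of the
# system.hints sstables come from the thread; the drain step and the
# paths are assumptions.
STEPS = [
    "nodetool snapshot -t {tag} -cf hints system",
    "nodetool drain",
    "stop the Cassandra process",
    "delete the sstable files in {data_dir}/system/hints-*/ "
    "(keep the snapshots/ subdirectory)",
    "start Cassandra and wait until `nodetool status` shows the node as UN",
]


def rolling_hint_removal(
    nodes: List[str],
    data_dir: str = "/var/lib/cassandra/data",  # assumed default location
    tag: str = "pre-hint-removal",              # hypothetical snapshot tag
) -> List[Tuple[str, str]]:
    """Return (node, step) pairs, finishing one node before the next."""
    plan = []
    for node in nodes:
        for step in STEPS:
            plan.append((node, step.format(tag=tag, data_dir=data_dir)))
    return plan


for node, step in rolling_hint_removal(["node1", "node2", "node3"]):
    print(f"[{node}] {step}")
```

Nothing here touches a cluster; it only makes the ordering explicit so that
no two nodes are down at the same time.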
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Jeff Jirsa
>>>> 
>>>> 
>>>>> On May 16, 2017, at 6:53 PM, varun saluja <saluj...@gmail.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>>  
>>>>> truncatehints has now been running on the nodes for more than 7 hours. 
>>>>> Nothing about it appears in the system logs.
>>>>> 
>>>>> And compactionstats reports an increase in the hints total bytes.
>>>>> 
>>>>> pending tasks: 1
>>>>>    compaction type   keyspace   table     completed          total    unit   progress
>>>>>         Compaction     system   hints   12152557998   869257869352   bytes      1.40%
>>>>> Active compaction remaining time :   0h27m14s
>>>>> 
>>>>> Can anything else be checked here? Will manually deleting the system.hints 
>>>>> files and restarting the node fix this?
>>>>> 
>>>>> 
>>>>> 
>>>>> Regards,
>>>>> Varun Saluja
>>>>> 
>>>>>> On 16 May 2017 at 23:29, varun saluja <saluj...@gmail.com> wrote:
>>>>>> Hi Jeff,
>>>>>> 
>>>>>> I ran nodetool truncatehints on all nodes. It has been running for more 
>>>>>> than 30 minutes now. compactionstats still reports the same status.
>>>>>> 
>>>>>> pending tasks: 1
>>>>>>    compaction type   keyspace   table     completed          total    unit   progress
>>>>>>         Compaction     system   hints   11189118129   851658989612   bytes      1.31%
>>>>>> Active compaction remaining time :   0h26m43s
>>>>>> 
>>>>>> Does truncatehints take a long time to complete? I could not see anything 
>>>>>> related to truncatehints in the system logs.
>>>>>> 
>>>>>> Please let me know if anything else can be checked here.
>>>>>> 
>>>>>> Regards,
>>>>>> Varun Saluja 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 16 May 2017 at 20:58, varun saluja <saluj...@gmail.com> wrote:
>>>>>>> Thanks a lot Jeff.
>>>>>>> 
>>>>>>> You have explained this very well. We use LOCAL_QUORUM consistency. 
>>>>>>> We will truncate the hints and run a repair thereafter.
>>>>>>> 
>>>>>>> I hope this brings the cluster back to a stable state.
>>>>>>> 
>>>>>>> Thanks again.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Varun Saluja
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>> > On 16-May-2017, at 8:42 PM, Jeff Jirsa <jji...@apache.org> wrote:
>>>>>>> >
>>>>>>> >
>>>>>>> > In Cassandra versions up to 3.0, hints are stored within a table, 
>>>>>>> > where the partition key is the host ID of the server for which the 
>>>>>>> > hints are stored.
>>>>>>> >
>>>>>>> > In such a data model, accumulating 800GB of hints is almost certain 
>>>>>>> > to cause very wide rows, which will in turn cause GC pressure when 
>>>>>>> > you attempt to read the hints for delivery. This will cause GC 
>>>>>>> > pauses, which will cause hints to fail to be delivered, which will 
>>>>>>> > cause more hints to be stored. This is bad.
>>>>>>> >
>>>>>>> > In 3.0, hints were rewritten to work around this design flaw. In 2.1, 
>>>>>>> > your most likely corrective course is to use 'nodetool truncatehints' 
>>>>>>> > on all servers, followed by 'nodetool repair' to deliver the data you 
>>>>>>> > lost by truncating the hints.
>>>>>>> >
>>>>>>> > NOTE: this is ONLY safe if you wrote with a consistency level 
>>>>>>> > stronger than CL:ANY. If you wrote this data with CL:ANY, you may 
>>>>>>> > lose data if you truncate hints.
>>>>>>> >
>>>>>>> > - Jeff
>>>>>>> >
>>>>>>> >> On 2017-05-16 06:50 (-0700), varun saluja <saluj...@gmail.com> wrote:
>>>>>>> >> Thanks for the update.
>>>>>>> >> I can see a lot of I/O wait. This is causing GC and mutation drops.
>>>>>>> >> But as I mentioned, we do not have high load at the moment. Hint 
>>>>>>> >> replays are creating this high disk I/O.
>>>>>>> >> compactionstats shows very high hint bytes, around 780 GB. Is this 
>>>>>>> >> normal?
>>>>>>> >>
>>>>>>> >> Note that we are using flash disks.
>>>>>>> >>
>>>>>>> >> In that case, if I run truncatehints, will it remove or reduce the 
>>>>>>> >> hint bytes shown in compactionstats? I can trigger a repair 
>>>>>>> >> thereafter.
>>>>>>> >> Please let me know if you have any recommendation on this.
>>>>>>> >>
>>>>>>> >> Also, the table that we loaded from Kafka, which created this many 
>>>>>>> >> hints and pending compactions, was dropped today, because we have 
>>>>>>> >> to reload the table once the cluster is stable.
>>>>>>> >>
>>>>>>> >> Regards,
>>>>>>> >> Varun
>>>>>>> >>
>>>>>>> >> Sent from my iPhone
>>>>>>> >>
>>>>>>> >>> On 16-May-2017, at 6:59 PM, Nitan Kainth <ni...@bamlabs.com> wrote:
>>>>>>> >>>
>>>>>>> >>> Yes but it means data has to be replicated using repair.
>>>>>>> >>>
>>>>>>> >>> Hints are the outcome of unhealthy nodes. Focus on finding why you 
>>>>>>> >>> have mutation drops: is it the node, I/O, the network, etc.? Ideally 
>>>>>>> >>> you shouldn't see hints increasing all the time.
>>>>>>> >>>
>>>>>>> >>> Sent from my iPhone
>>>>>>> >>>
>>>>>>> >>>> On May 16, 2017, at 7:58 AM, varun saluja <saluj...@gmail.com> 
>>>>>>> >>>> wrote:
>>>>>>> >>>>
>>>>>>> >>>> Hi Nitan,
>>>>>>> >>>>
>>>>>>> >>>> Thanks for the response.
>>>>>>> >>>>
>>>>>>> >>>> Yes, I can see mutation drops and an increasing count in 
>>>>>>> >>>> system.hints. Is there any way I can truncate the hints, for 
>>>>>>> >>>> example using nodetool truncatehints?
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> Regards,
>>>>>>> >>>> Varun Saluja
>>>>>>> >>>>
>>>>>>> >>>>> On 16 May 2017 at 17:52, Nitan Kainth <ni...@bamlabs.com> wrote:
>>>>>>> >>>>> Do you see mutation drops?
>>>>>>> >>>>> SELECT count(*) FROM system.hints; is it increasing?
>>>>>>> >>>>>
>>>>>>> >>>>> Sent from my iPhone
>>>>>>> >>>>>
>>>>>>> >>>>>> On May 16, 2017, at 5:52 AM, varun saluja <saluj...@gmail.com> 
>>>>>>> >>>>>> wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>> Hi Experts,
>>>>>>> >>>>>>
>>>>>>> >>>>>> We are facing an issue on our production cluster. Compaction on the 
>>>>>>> >>>>>> system.hints table has been running for the last 2 days.
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> pending tasks: 1
>>>>>>> >>>>>>    compaction type   keyspace   table     completed          total    unit   progress
>>>>>>> >>>>>>         Compaction     system   hints   20623021829   877874092407   bytes      2.35%
>>>>>>> >>>>>> Active compaction remaining time :   0h27m15s
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> Active compaction remaining time shows minutes, but this job 
>>>>>>> >>>>>> seems to be running indefinitely.
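
One possible reading of the "remaining time" figure, offered as an
assumption rather than something stated in the thread: if the estimate is
simply the remaining bytes divided by a configured compaction throughput
cap, then all three readings quoted in this thread are reproduced by a cap
of 500 MiB/s. The cap value and the formula are inferred from the numbers
only. The sketch below redoes that arithmetic.

```python
MIB = 1024 * 1024


def remaining_hms(completed: int, total: int, throughput_mib_per_s: int) -> str:
    """Remaining time if compaction ran at exactly the given throughput."""
    seconds = (total - completed) // (throughput_mib_per_s * MIB)
    return f"{seconds // 3600}h{seconds % 3600 // 60:02d}m{seconds % 60:02d}s"


# (completed, total) byte counts from the compactionstats output in this thread.
readings = [
    (20623021829, 877874092407),  # reported 0h27m15s
    (11189118129, 851658989612),  # reported 0h26m43s
    (12152557998, 869257869352),  # reported 0h27m14s
]

for completed, total in readings:
    print(remaining_hms(completed, total, 500))
```

If that reading is right, the estimate describes a throttle ceiling rather
than the rate the compaction is actually achieving, which would be
consistent with a job that shows minutes remaining yet runs for days.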
>>>>>>> >>>>>>
>>>>>>> >>>>>> We have a 3-node cluster on version 2.1.7, and we ran a 
>>>>>>> >>>>>> write-intensive job last week on a particular table.
>>>>>>> >>>>>> Compaction on this table finished, but the hints table size is 
>>>>>>> >>>>>> growing continuously.
>>>>>>> >>>>>>
>>>>>>> >>>>>> Can someone please help me?
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> Thanks & Regards,
>>>>>>> >>>>>> Varun Saluja
>>>>>>> >>>>>>
>>>>>>> >>>>
>>>>>>> >>
>>>>>>> >
>>>>>>> >
>>>>>> 
>>>>> 
>>> 
> 
> -- 
> Ben Slater
> Chief Product Officer
