[ 
https://issues.apache.org/jira/browse/CASSANDRA-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571758#comment-13571758
 ] 

Brandon Williams commented on CASSANDRA-5068:
---------------------------------------------

I ran into this in 1.2, and the problem is definitely old sstables full of 
tombstones being left behind:

{noformat}
 activity                                                                       
                 | timestamp    | source        | source_elapsed
-------------------------------------------------------------------------------------------------+--------------+---------------+----------------
                                                                              
execute_cql3_query | 21:10:25,847 | 10.179.64.227 |              0
                                                                               
Parsing statement | 21:10:25,847 | 10.179.64.227 |             39
                                                                              
Peparing statement | 21:10:25,847 | 10.179.64.227 |            208
                                                                   Determining 
replicas to query | 21:10:25,847 | 10.179.64.227 |            316
 Executing seq scan across 2 sstables for [min(-9223372036854775808), 
min(-9223372036854775808)] | 21:10:25,870 | 10.179.64.227 |          23223
                                                          Read 0 live cells and 
13800 tombstoned | 21:10:25,928 | 10.179.64.227 |          81432
                                                          Read 0 live cells and 
13068 tombstoned | 21:10:26,015 | 10.179.64.227 |         168213
                                                                    Scanned 2 
rows and matched 2 | 21:10:26,016 | 10.179.64.227 |         169206
                                                                                
Request complete | 21:10:26,016 | 10.179.64.227 |         169585
{noformat}


The problem appears to occur here:

{noformat}
 INFO [GossipStage:1] 2013-02-04 22:54:34,828 Gossiper.java (line 754) 
InetAddress /10.179.111.137 is now UP
 INFO [GossipStage:1] 2013-02-04 22:54:34,830 Gossiper.java (line 754) 
InetAddress /10.179.65.102 is now UP
 INFO [HintedHandoff:1] 2013-02-04 22:54:34,830 HintedHandOffManager.java (line 
297) Started hinted handoff for host: 5b49c861-0cf6-48dc-872a-7fcb89429dae with 
IP: /10.179.111.137
 INFO [HintedHandoff:2] 2013-02-04 22:54:34,830 HintedHandOffManager.java (line 
297) Started hinted handoff for host: 0fd1d3b1-0f73-40fd-ab36-9f4b9636a205 with 
IP: /10.179.65.102
 INFO [HintedHandoff:2] 2013-02-04 22:54:41,039 ColumnFamilyStore.java (line 
678) Enqueuing flush of Memtable-hints@164677298(1020984/1020984 
serialized/live bytes, 39151 ops)
 INFO [FlushWriter:2] 2013-02-04 22:54:41,041 Memtable.java (line 453) Writing 
Memtable-hints@164677298(1020984/1020984 serialized/live bytes, 39151 ops)
 INFO [CompactionExecutor:8] 2013-02-04 22:54:41,043 CompactionTask.java (line 
112) Compacting 
[SSTableReader(path='/var/lib/cassandra/data/system/hints/system-hints-ib-1-Data.db')]
 INFO [FlushWriter:2] 2013-02-04 22:54:41,200 Memtable.java (line 487) 
Completed flushing 
/var/lib/cassandra/data/system/hints/system-hints-ib-2-Data.db (166760 bytes) 
for commitlog position ReplayPosition(segmentId=1360018215672, 
position=11404848)
 INFO [CompactionExecutor:9] 2013-02-04 22:54:41,215 CompactionManager.java 
(line 452) SSTables for user defined compaction are already being compacted.
 INFO [HintedHandoff:2] 2013-02-04 22:54:41,216 HintedHandOffManager.java (line 
412) Finished hinted handoff of 13909 rows to endpoint /10.179.65.102
 INFO [CompactionExecutor:8] 2013-02-04 22:54:41,444 CompactionTask.java (line 
272) Compacted 1 sstables to 
[/var/lib/cassandra/data/system/hints/system-hints-ib-3,].  625,567 bytes to 
625,567 (~100% of original) in 401ms = 1.487749MB/s.  2 total rows, 2 unique.  
Row merge counts were {1:2, }
 INFO [HintedHandoff:1] 2013-02-04 22:54:41,445 HintedHandOffManager.java (line 
412) Finished hinted handoff of 13230 rows to endpoint /10.179.111.137
{noformat}

The problem is that with concurrent delivery we end up in a state where we have 
two sstables that contain tombstones, each containing some portion of them for 
the endpoints that were delivered to.  Since we never replay any rows again, we 
never compact again, and thus never evict the tombstones.

I think I'm back to what I proposed before of just removing the replayed > 0 
check and letting isEmpty check handle the common case.
 
                
> CLONE - Once a host has been hinted to, log messages for it repeat every 10 
> mins even if no hints are delivered
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5068
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5068
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.6, 1.2.0
>         Environment: cassandra 1.1.6
> java 1.6.0_30
>            Reporter: Peter Haggerty
>            Assignee: Brandon Williams
>            Priority: Minor
>              Labels: hinted, hintedhandoff, phantom
>
> We have "0 row" hinted handoffs every 10 minutes like clockwork. This impacts 
> our ability to monitor the cluster by adding persistent noise in the handoff 
> metric.
> Previous mentions of this issue are here:
> http://www.mail-archive.com/[email protected]/msg25982.html
> The hinted handoffs can be scrubbed away with
> nodetool -h 127.0.0.1 scrub system HintsColumnFamily
> but they return after anywhere from a few minutes to multiple hours later.
> These started to appear after an upgrade to 1.1.6 and haven't gone away 
> despite rolling cleanups, rolling restarts, multiple rounds of scrubbing, etc.
> A few things we've noticed about the handoffs:
> 1. The phantom handoff endpoint changes after a non-zero handoff comes through
> 2. Sometimes a non-zero handoff will be immediately followed by an "off 
> schedule" phantom handoff to the endpoint the phantom had been using before
> 3. The sstable2json output seems to include multiple sub-sections for each 
> handoff with the same "deletedAt" information.
> The phantom handoff endpoint changes after a non-zero handoff comes through:
>  INFO [HintedHandoff:1] 2012-12-11 06:57:35,093 HintedHandOffManager.java 
> (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.1
>  INFO [HintedHandoff:1] 2012-12-11 07:07:35,092 HintedHandOffManager.java 
> (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.1
>  INFO [HintedHandoff:1] 2012-12-11 07:07:37,915 HintedHandOffManager.java 
> (line 392) Finished hinted handoff of 1058 rows to endpoint /10.10.10.2
>  INFO [HintedHandoff:1] 2012-12-11 07:17:35,093 HintedHandOffManager.java 
> (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.2
>  INFO [HintedHandoff:1] 2012-12-11 07:27:35,093 HintedHandOffManager.java 
> (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.2
> Sometimes a non-zero handoff will be immediately followed by an "off 
> schedule" phantom handoff to the endpoint the phantom had been using before:
>  INFO [HintedHandoff:1] 2012-12-12 21:47:39,335 HintedHandOffManager.java 
> (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
>  INFO [HintedHandoff:1] 2012-12-12 21:57:39,335 HintedHandOffManager.java 
> (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
>  INFO [HintedHandoff:1] 2012-12-12 22:07:43,319 HintedHandOffManager.java 
> (line 392) Finished hinted handoff of 1416 rows to endpoint /10.10.10.4
>  INFO [HintedHandoff:1] 2012-12-12 22:07:43,320 HintedHandOffManager.java 
> (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.3
>  INFO [HintedHandoff:1] 2012-12-12 22:17:39,357 HintedHandOffManager.java 
> (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.4
>  INFO [HintedHandoff:1] 2012-12-12 22:27:39,337 HintedHandOffManager.java 
> (line 392) Finished hinted handoff of 0 rows to endpoint /10.10.10.4
> The first few entries from one of the json files:
> {
>     "0aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa": {
>         "ccf5dc203a2211e20000e154da71a9bb": {
>             "deletedAt": -9223372036854775808, 
>             "subColumns": []
>         }, 
>         "ccf603303a2211e20000e154da71a9bb": {
>             "deletedAt": -9223372036854775808, 
>             "subColumns": []
>         }, 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to