[jira] [Created] (CASSANDRA-9129) HintedHandoff in pending state forever after upgrading to 2.0.14 from 2.0.11 and 2.0.12

Russ Lavoie (JIRA) Tue, 07 Apr 2015 11:17:35 -0700

Russ Lavoie created CASSANDRA-9129:
--------------------------------------

             Summary: HintedHandoff in pending state forever after upgrading to 2.0.14 from 2.0.11 and 2.0.12
                 Key: CASSANDRA-9129
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9129
             Project: Cassandra
          Issue Type: Bug
         Environment: Ubuntu 12.04.5 LTS
AWS (m3.xlarge)
15G RAM
4 core Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Cassandra 2.0.14
            Reporter: Russ Lavoie
             Fix For: 2.0.14



After upgrading from Cassandra 2.0.11 or 2.0.12 to 2.0.14, I am seeing one 
pending hinted handoff that never clears.  New hinted handoffs that go into 
pending while waiting for a node to come up clear as expected, but one always 
remains.

I went through the following steps:

1) stop cassandra
2) Upgrade cassandra to 2.0.14
3) Start cassandra
4) nodetool tpstats

There are no errors in the logs to help with this issue.  I ran a few nodetool 
commands to gather some data and pasted the output below:

Below is what is shown after running nodetool status on each node in the ring
{code}Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns   Host ID   Rack
UN  <NODE1>  279.8 MB   256     34.9%  <HOSTID>       rack1
UN  <NODE2>  279.79 MB  256     33.0%  <HOSTID>       rack1
UN  <NODE3>  279.87 MB  256     32.1%  <HOSTID>       rack1
{code}

Below is what is shown after running nodetool tpstats on each node in the ring 
showing a single HintedHandoff in pending status that never clears
{code}
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0          14550         0                 0
RequestResponseStage              0         0         113040         0                 0
MutationStage                     0         0         168873         0                 0
ReadRepairStage                   0         0           1147         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0         232112         0                 0
CacheCleanupExecutor              0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
MemoryMeter                       0         0              6         0                 0
FlushWriter                       0         0             38         0                 0
ValidationExecutor                0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MemtablePostFlusher               0         0           1333         0                 0
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              6         0                 0
CompactionExecutor                0         0            178         0                 0
commitlog_archiver                0         0              0         0                 0
HintedHandoff                     0         1            133         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                         0
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0
{code}
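
To watch for the stuck task across nodes without reading the table by eye, the tpstats output can be filtered for pools with a non-zero Pending count. The following is only a sketch: the function name `pending_by_pool` and the `SAMPLE` text are made up for illustration, and it assumes each pool row sits on a single line (any mail-client wrapping undone) with the six columns shown above.

```python
def pending_by_pool(tpstats_text):
    """Return {pool_name: pending} for every pool row whose Pending count is > 0."""
    pending = {}
    for line in tpstats_text.splitlines():
        fields = line.split()
        # A pool row has six fields: name, Active, Pending, Completed,
        # Blocked, All time blocked. The header and the "Dropped" section
        # have a different field count or non-numeric values, so they are skipped.
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            count = int(fields[2])
            if count > 0:
                pending[fields[0]] = count
    return pending


# Abbreviated copy of the output pasted in this report.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0          14550         0                 0
MutationStage                     0         0         168873         0                 0
HintedHandoff                     0         1            133         0                 0

Message type           Dropped
READ                         0
MUTATION                     0
"""

if __name__ == "__main__":
    print(pending_by_pool(SAMPLE))
```

Feeding it the captured output of `nodetool tpstats` from each node at intervals would show whether the HintedHandoff entry ever drops back to zero.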

Below is what is shown after running nodetool cfstats system.hints on all 3 
nodes.
{code}
Keyspace: system
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
                Table: hints
                SSTable count: 0
                Space used (live), bytes: 0
                Space used (total), bytes: 0
                Off heap memory used (total), bytes: 0
                SSTable Compression Ratio: 0.0
                Number of keys (estimate): 0
                Memtable cell count: 0
                Memtable data size, bytes: 0
                Memtable switch count: 0
                Local read count: 0
                Local read latency: 0.000 ms
                Local write count: 0
                Local write latency: 0.000 ms
                Pending tasks: 0
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.00000
                Bloom filter space used, bytes: 0
                Bloom filter off heap memory used, bytes: 0
                Index summary off heap memory used, bytes: 0
                Compression metadata off heap memory used, bytes: 0
                Compacted partition minimum bytes: 0
                Compacted partition maximum bytes: 0
                Compacted partition mean bytes: 0
                Average live cells per slice (last five minutes): 0.0
                Average tombstones per slice (last five minutes): 0.0

----------------
{code}

Below is what is shown after running nodetool gossipinfo
{code}
/<NODE1>
  generation:1428349617
  heartbeat:238170
  HOST_ID:<NODE1ID>
  RELEASE_VERSION:2.0.14
  DC:<DCNAME>
  RPC_ADDRESS:<NODE1IP>
  SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
  STATUS:NORMAL,-1399780091502863826
  RACK:rack1
  SEVERITY:0.0
  LOAD:2.93383711E8
  NET_VERSION:7
/<NODE2>
  generation:1428349784
  heartbeat:237665
  HOST_ID:<NODE2ID>
  RELEASE_VERSION:2.0.14
  DC:app3-profiledata
  RPC_ADDRESS:<NODE2>
  SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
  STATUS:NORMAL,-1019261967377984057
  RACK:rack1
  SEVERITY:0.0
  LOAD:2.93393487E8
  NET_VERSION:7
/<NODE3>
  generation:1428348889
  heartbeat:240384
  HOST_ID:<NODE3ID>
  RELEASE_VERSION:2.0.14
  DC:app3-profiledata
  RPC_ADDRESS:<NODE3IP>
  SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
  STATUS:NORMAL,-1060333141359417961
  RACK:rack1
  SEVERITY:0.0
  LOAD:2.9345286E8
  NET_VERSION:7
{code}
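
As a quick cross-check that every endpoint reports the same release and schema version after the upgrade, the gossipinfo output can be parsed into one dictionary per node. This is a rough sketch only: the helper names `parse_gossipinfo` and `distinct_values` and the placeholder endpoints in `SAMPLE` are illustrative, and it assumes the `/<endpoint>` plus indented `KEY:value` layout shown above.

```python
def parse_gossipinfo(text):
    """Parse gossipinfo-style text into {endpoint: {KEY: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if line.startswith("/"):
            # A line starting with "/" opens a new endpoint section.
            current = line[1:].strip()
            nodes[current] = {}
        elif current is not None and ":" in line:
            # Split on the first colon only; values such as STATUS contain commas.
            key, _, value = line.strip().partition(":")
            nodes[current][key] = value
    return nodes


def distinct_values(nodes, key):
    """Return the set of values that the endpoints report for one key."""
    return {state.get(key) for state in nodes.values()}


# Shortened example with placeholder endpoint names.
SAMPLE = """\
/node1
  generation:1428349617
  RELEASE_VERSION:2.0.14
  SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
  STATUS:NORMAL,-1399780091502863826
/node2
  generation:1428349784
  RELEASE_VERSION:2.0.14
  SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
  STATUS:NORMAL,-1019261967377984057
"""

if __name__ == "__main__":
    nodes = parse_gossipinfo(SAMPLE)
    print(distinct_values(nodes, "RELEASE_VERSION"))
    print(distinct_values(nodes, "SCHEMA"))
```

A single-element set for both keys means the nodes agree, which matches what the output above shows for this ring.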
  
  
Below is cassandra.yaml
{code}
cluster_name: '<Cluster Name>'

num_tokens: 256

auto_bootstrap: true

hinted_handoff_enabled: true

max_hint_window_in_ms: 345600000

hinted_handoff_throttle_in_kb: 1024

max_hints_delivery_threads: 2

authenticator: AllowAllAuthenticator

authorizer: AllowAllAuthorizer

permissions_validity_in_ms: 2000

partitioner: org.apache.cassandra.dht.Murmur3Partitioner

data_file_directories:
    - /mnt/cassandra/data

commitlog_directory: /mnt/cassandra/commitlog

disk_failure_policy: stop

key_cache_size_in_mb:

key_cache_save_period: 14400

row_cache_size_in_mb: 0

row_cache_save_period: 0

saved_caches_directory: /mnt/cassandra/saved_caches

commitlog_sync: batch

commitlog_sync_batch_window_in_ms: 50

commitlog_segment_size_in_mb: 32

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "<NODE1>,<NODE2>,<NODE3>"

concurrent_reads: 32

concurrent_writes: 32

memtable_total_space_in_mb: 512

memtable_flush_queue_size: 4

trickle_fsync: false

trickle_fsync_interval_in_kb: 10240

storage_port: 7000

ssl_storage_port: 7001

listen_address: <LOCALIP>

start_native_transport: true

native_transport_port: 9042

start_rpc: true

rpc_address: <LOCALIP>

rpc_port: 9160

rpc_keepalive: true

rpc_server_type: hsha

rpc_min_threads: 16

rpc_max_threads: 256

thrift_framed_transport_size_in_mb: 15

incremental_backups: false

snapshot_before_compaction: false

auto_snapshot: true

column_index_size_in_kb: 64

in_memory_compaction_limit_in_mb: 64

multithreaded_compaction: false

compaction_throughput_mb_per_sec: 128

compaction_preheat_key_cache: true

read_request_timeout_in_ms: 10000

range_request_timeout_in_ms: 10000

write_request_timeout_in_ms: 10000

truncate_request_timeout_in_ms: 60000

request_timeout_in_ms: 10000

cross_node_timeout: false

phi_convict_threshold: 12

endpoint_snitch: PropertyFileSnitch

dynamic_snitch_update_interval_in_ms: 100

dynamic_snitch_reset_interval_in_ms: 600000

dynamic_snitch_badness_threshold: 0.2

request_scheduler: org.apache.cassandra.scheduler.NoScheduler

index_interval: 512

server_encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra

client_encryption_options:
    enabled: false
    keystore: conf/.keystore
    keystore_password: cassandra

internode_compression: all

inter_dc_tcp_nodelay: true
{code}
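
For reference when reading the hint settings above, the `max_hint_window_in_ms` value converts to a duration as sketched below. The function name `hint_window` is just an illustrative helper, not anything from Cassandra.

```python
from datetime import timedelta


def hint_window(ms):
    """Convert a max_hint_window_in_ms value to a timedelta."""
    return timedelta(milliseconds=ms)


if __name__ == "__main__":
    window = hint_window(345600000)  # value from the cassandra.yaml above
    print(window.days, "days")
```

So hints are collected for a node that has been down for up to 4 days in this configuration.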

I have stopped upgrading my other Cassandra clusters until the cause of this 
behavior is found.

Please let me know if more information is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
