Russ Lavoie created CASSANDRA-9129:
--------------------------------------
Summary: HintedHandoff in pending state forever after upgrading to
2.0.14 from 2.0.11 and 2.0.12
Key: CASSANDRA-9129
URL: https://issues.apache.org/jira/browse/CASSANDRA-9129
Project: Cassandra
Issue Type: Bug
Environment: Ubuntu 12.04.5 LTS
AWS (m3.xlarge)
15G RAM
4 core Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
Cassandra 2.0.14
Reporter: Russ Lavoie
Fix For: 2.0.14
After upgrading from Cassandra 2.0.11 or 2.0.12 to 2.0.14, I am seeing one pending
hinted handoff that never clears. New hinted handoffs that go into pending while
waiting for a node to come up clear as expected, but one always remains.
I went through the following steps:
1) stop cassandra
2) Upgrade cassandra to 2.0.14
3) Start cassandra
4) nodetool tpstats
There are no errors in the logs that help with this issue. I ran a few nodetool
commands to gather some data and pasted the output below.
Below is what is shown after running nodetool status on each node in the ring:
{code}
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns   Host ID   Rack
UN  <NODE1>  279.8 MB   256     34.9%  <HOSTID>  rack1
UN  <NODE2>  279.79 MB  256     33.0%  <HOSTID>  rack1
UN  <NODE3>  279.87 MB  256     32.1%  <HOSTID>  rack1
{code}
Below is what is shown after running nodetool tpstats on each node in the ring.
Every node shows a single HintedHandoff task in pending status that never clears:
{code}
Pool Name                   Active   Pending   Completed   Blocked  All time blocked
ReadStage                        0         0       14550         0                 0
RequestResponseStage             0         0      113040         0                 0
MutationStage                    0         0      168873         0                 0
ReadRepairStage                  0         0        1147         0                 0
ReplicateOnWriteStage            0         0           0         0                 0
GossipStage                      0         0      232112         0                 0
CacheCleanupExecutor             0         0           0         0                 0
MigrationStage                   0         0           0         0                 0
MemoryMeter                      0         0           6         0                 0
FlushWriter                      0         0          38         0                 0
ValidationExecutor               0         0           0         0                 0
InternalResponseStage            0         0           0         0                 0
AntiEntropyStage                 0         0           0         0                 0
MemtablePostFlusher              0         0        1333         0                 0
MiscStage                        0         0           0         0                 0
PendingRangeCalculator           0         0           6         0                 0
CompactionExecutor               0         0         178         0                 0
commitlog_archiver               0         0           0         0                 0
HintedHandoff                    0         1         133         0                 0
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 0
PAGED_RANGE 0
BINARY 0
READ 0
MUTATION 0
_TRACE 0
REQUEST_RESPONSE 0
COUNTER_MUTATION 0
{code}
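To check for the stuck task without reading the table by eye, the tpstats output can be scanned for pools whose Pending column is non-zero. The following is a minimal sketch, not part of the original report: the function name `pending_pools` and the `SAMPLE` text are illustrative, and it assumes each pool row has the layout shown above (name, then Active, Pending, Completed, Blocked as integers).

```python
def pending_pools(tpstats_text):
    """Return {pool_name: pending} for every thread pool with Pending > 0."""
    pending = {}
    for line in tpstats_text.splitlines():
        fields = line.split()
        # A pool row is: name, Active, Pending, Completed, Blocked[, All time blocked].
        # Header rows and the "Message type / Dropped" section fail this check.
        if len(fields) >= 5 and all(f.isdigit() for f in fields[1:5]):
            count = int(fields[2])
            if count > 0:
                pending[fields[0]] = count
    return pending


# Abbreviated copy of the output pasted above (illustrative sample only).
SAMPLE = """\
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 14550 0 0
MutationStage 0 0 168873 0 0
HintedHandoff 0 1 133 0 0
Message type Dropped
READ 0
MUTATION 0
"""

if __name__ == "__main__":
    print(pending_pools(SAMPLE))
```

Running the same check a few minutes apart on each node would confirm whether the HintedHandoff entry is the same single task that never drains; feeding it live output (for example by capturing `nodetool tpstats` with `subprocess.run`) is left as an assumption about the local setup.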
Below is what is shown after running nodetool cfstats system.hints on all 3
nodes.
{code}
Keyspace: system
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Table: hints
SSTable count: 0
Space used (live), bytes: 0
Space used (total), bytes: 0
Off heap memory used (total), bytes: 0
SSTable Compression Ratio: 0.0
Number of keys (estimate): 0
Memtable cell count: 0
Memtable data size, bytes: 0
Memtable switch count: 0
Local read count: 0
Local read latency: 0.000 ms
Local write count: 0
Local write latency: 0.000 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used, bytes: 0
Bloom filter off heap memory used, bytes: 0
Index summary off heap memory used, bytes: 0
Compression metadata off heap memory used, bytes: 0
Compacted partition minimum bytes: 0
Compacted partition maximum bytes: 0
Compacted partition mean bytes: 0
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0
----------------
{code}
Below is what is shown after running nodetool gossipinfo
{code}
/<NODE1>
generation:1428349617
heartbeat:238170
HOST_ID:<NODE1ID>
RELEASE_VERSION:2.0.14
DC:<DCNAME>
RPC_ADDRESS:<NODE1IP>
SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
STATUS:NORMAL,-1399780091502863826
RACK:rack1
SEVERITY:0.0
LOAD:2.93383711E8
NET_VERSION:7
/<NODE2>
generation:1428349784
heartbeat:237665
HOST_ID:<NODE2ID>
RELEASE_VERSION:2.0.14
DC:app3-profiledata
RPC_ADDRESS:<NODE2>
SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
STATUS:NORMAL,-1019261967377984057
RACK:rack1
SEVERITY:0.0
LOAD:2.93393487E8
NET_VERSION:7
/<NODE3>
generation:1428348889
heartbeat:240384
HOST_ID:<NODE3ID>
RELEASE_VERSION:2.0.14
DC:app3-profiledata
RPC_ADDRESS:<NODE3IP>
SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
STATUS:NORMAL,-1060333141359417961
RACK:rack1
SEVERITY:0.0
LOAD:2.9345286E8
NET_VERSION:7
{code}
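The gossipinfo output above can also be checked mechanically to confirm that every node reports the same RELEASE_VERSION and SCHEMA after the upgrade. This is a hedged sketch rather than an established tool: `parse_gossipinfo`, `distinct_values`, and the `SAMPLE` addresses are hypothetical, and it assumes each node's section starts with a line beginning with `/`, as in the paste above.

```python
def parse_gossipinfo(text):
    """Parse gossipinfo-style text into {endpoint: {KEY: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # Assumed format: a new node section begins with "/<address>".
            current = line[1:]
            nodes[current] = {}
        elif current is not None and ":" in line:
            # Split on the first colon only, since values may contain colons.
            key, value = line.split(":", 1)
            nodes[current][key] = value
    return nodes


def distinct_values(nodes, key):
    """Return the set of values that the nodes report for one gossip key."""
    return {attrs.get(key) for attrs in nodes.values()}


# Illustrative sample; the addresses are placeholders, not the real cluster.
SAMPLE = """\
/10.0.0.1
generation:1428349617
RELEASE_VERSION:2.0.14
SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
STATUS:NORMAL,-1399780091502863826
/10.0.0.2
generation:1428349784
RELEASE_VERSION:2.0.14
SCHEMA:132878b7-a33b-3ca3-b83d-3cacf7fc2138
STATUS:NORMAL,-1019261967377984057
"""

if __name__ == "__main__":
    nodes = parse_gossipinfo(SAMPLE)
    for key in ("RELEASE_VERSION", "SCHEMA"):
        print(key, distinct_values(nodes, key))
```

A single value per key, as in the output pasted above, indicates that the nodes agree on version and schema, so a mixed-version ring or schema disagreement is unlikely to explain the pending hint.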
Below is cassandra.yaml
{code}
cluster_name: '<Cluster Name>'
num_tokens: 256
auto_bootstrap: true
hinted_handoff_enabled: true
max_hint_window_in_ms: 345600000
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
permissions_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
- /mnt/cassandra/data
commitlog_directory: /mnt/cassandra/commitlog
disk_failure_policy: stop
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
saved_caches_directory: /mnt/cassandra/saved_caches
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 50
commitlog_segment_size_in_mb: 32
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
- seeds: "<NODE1>,<NODE2>,<NODE3>"
concurrent_reads: 32
concurrent_writes: 32
memtable_total_space_in_mb: 512
memtable_flush_queue_size: 4
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: <LOCALIP>
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: <LOCALIP>
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: hsha
rpc_min_threads: 16
rpc_max_threads: 256
thrift_framed_transport_size_in_mb: 15
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 128
compaction_preheat_key_cache: true
read_request_timeout_in_ms: 10000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 10000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false
phi_convict_threshold: 12
endpoint_snitch: PropertyFileSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.2
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
index_interval: 512
server_encryption_options:
internode_encryption: none
keystore: conf/.keystore
keystore_password: cassandra
truststore: conf/.truststore
truststore_password: cassandra
client_encryption_options:
enabled: false
keystore: conf/.keystore
keystore_password: cassandra
internode_compression: all
inter_dc_tcp_nodelay: true
{code}
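For reference, the max_hint_window_in_ms value in the configuration above is much larger than the 10800000 ms (3 hours) shipped in the stock cassandra.yaml. The short sketch below only converts the two values into readable durations; the helper name `hint_window` is illustrative, and whether the longer window has any bearing on the stuck task is not established here.

```python
from datetime import timedelta

# Value from the cassandra.yaml above.
MAX_HINT_WINDOW_IN_MS = 345_600_000
# Value shipped in the stock cassandra.yaml (3 hours).
DEFAULT_HINT_WINDOW_IN_MS = 10_800_000


def hint_window(ms):
    """Convert a hint window given in milliseconds to a timedelta."""
    return timedelta(milliseconds=ms)


if __name__ == "__main__":
    configured = hint_window(MAX_HINT_WINDOW_IN_MS)
    default = hint_window(DEFAULT_HINT_WINDOW_IN_MS)
    print("configured:", configured)  # 4 days
    print("default:   ", default)     # 3 hours
    print("ratio:     ", configured / default)
```

In other words, this cluster keeps collecting hints for a node that has been down for up to four days.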
I have stopped upgrading my other Cassandra clusters until the cause of this
behavior is found.
Please let me know if more information is needed.