[
https://issues.apache.org/jira/browse/CASSANDRA-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331346#comment-16331346
]
Pedro Gordo edited comment on CASSANDRA-12750 at 1/18/18 10:24 PM:
-------------------------------------------------------------------
If you run the same query from within cqlsh (with tracing on), do you get a
message of the sort:
{{Trace session id: blablabla [more]
Unable to retrieve Query Trace information [more]}} ?
was (Author: pedro_gordo):
If you run the same query from within cqlsh (with tracing on), do you get a
message of the sort:
{{Trace session id: blablabla [more] Unable to retrieve Query Trace information
[more]}} ?
> TRACE messages get dropped
> --------------------------
>
> Key: CASSANDRA-12750
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12750
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths, Observability
> Environment: Ubuntu 14.04 LTS
> Reporter: Kuku1
> Priority: Major
>
> Two of my nodes drop TRACE messages for some reason.
> I'm using Cassandra 3.0.8.
> I'm running a "select * from ... " query that will return 300.000 rows. I
> want to measure the total query runtime.
> My application runs the following code.
> {code}
> PreparedStatement preparedStatement =
> connection.prepareStatement(query).enableTracing();
> if (preparedStatement == null) {
> return;
> }
> BoundStatement bs = preparedStatement.bind();
> ResultSet rs = connection.execute(bs);
> while (!rs.isFullyFetched()) {
> rs.fetchMoreResults();
> }
> //all results have been fetched at this point
> //sleep 15s to wait for C* to write the query traces
> Thread.sleep(15000);
> long totalQueryRuntime = 0;
> List<ExecutionInfo> executionInfos = rs.getAllExecutionInfo();
> for (ExecutionInfo ex : executionInfos) {
> QueryTrace queryTrace = ex.getQueryTrace();
> totalQueryRuntime += queryTrace.getDurationMicros();
> if (verbose) {
> for (Event event : queryTrace.getEvents()) {
> //some logging
> }
> }
> }
> {code}
> The application fails with this exception:
> {code}
> com.datastax.driver.core.exceptions.TraceRetrievalException: Unable to
> retrieve complete query trace for id 88dae7d0-8ada-11e6-9f80-d9082a06988f
> after 5 tries
> at
> com.datastax.driver.core.QueryTrace.doFetchTrace(QueryTrace.java:228)
> at
> com.datastax.driver.core.QueryTrace.maybeFetchTrace(QueryTrace.java:177)
> at
> com.datastax.driver.core.QueryTrace.getDurationMicros(QueryTrace.java:103)
> {code}
> In my system.logs, I can find these output (similar) on two of my six nodes:
> {code}
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,650 MessagingService.java:983 -
> _TRACE messages were dropped in last 5000 ms: 511 for internal timeout and 0
> for cross node timeout
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,651 StatusLogger.java:52 - Pool
> Name Active Pending Completed Blocked All Time
> Blocked
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,657 StatusLogger.java:56 -
> MutationStage 0 0 35036 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,658 StatusLogger.java:56 -
> ViewMutationStage 0 0 0 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,658 StatusLogger.java:56 -
> ReadStage 0 0 399 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,659 StatusLogger.java:56 -
> RequestResponseStage 0 0 7158 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,659 StatusLogger.java:56 -
> ReadRepairStage 0 0 2 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,660 StatusLogger.java:56 -
> CounterMutationStage 0 0 0 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,661 StatusLogger.java:56 -
> MiscStage 0 0 0 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,661 StatusLogger.java:56 -
> CompactionExecutor 0 0 52 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,662 StatusLogger.java:56 -
> MemtableReclaimMemory 0 0 21 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,662 StatusLogger.java:56 -
> PendingRangeCalculator 0 0 8 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,663 StatusLogger.java:56 -
> GossipStage 0 0 476 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,663 StatusLogger.java:56 -
> SecondaryIndexManagement 0 0 0 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,664 StatusLogger.java:56 -
> HintsDispatcher 0 0 0 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,664 StatusLogger.java:56 -
> MigrationStage 0 0 0 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,665 StatusLogger.java:56 -
> MemtablePostFlush 0 0 40 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,666 StatusLogger.java:56 -
> ValidationExecutor 0 0 0 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,666 StatusLogger.java:56 -
> Sampler 0 0 0 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,667 StatusLogger.java:56 -
> MemtableFlushWriter 0 0 21 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,668 StatusLogger.java:56 -
> InternalResponseStage 0 0 0 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,669 StatusLogger.java:56 -
> AntiEntropyStage 0 0 0 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,670 StatusLogger.java:56 -
> CacheCleanupExecutor 0 0 0 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:56 -
> Native-Transport-Requests 0 0 116 0
> 0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:66 -
> CompactionManager 0 1
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:78 -
> MessagingService n/a 0/0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:88 - Cache
> Type Size Capacity
> KeysToSave
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:90 -
> KeyCache 3160 104857600
> all
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,671 StatusLogger.java:96 -
> RowCache 0 0
> all
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:103 -
> Table Memtable ops,data
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 -
> system_distributed.parent_repair_history 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 -
> system_distributed.repair_history 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 -
> system.compaction_history 2,384
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 -
> system.hints 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,672 StatusLogger.java:106 -
> system.schema_aggregates 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 -
> system.IndexInfo 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 -
> system.schema_columnfamilies 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 -
> system.schema_triggers 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 -
> system.size_estimates 2120,252650
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 -
> system.schema_functions 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 -
> system.paxos 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 -
> system.views_builds_in_progress 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,673 StatusLogger.java:106 -
> system.built_views 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 -
> system.peer_events 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 -
> system.range_xfers 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 -
> system.peers 97,46227
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 -
> system.batches 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 -
> system.schema_keyspaces 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 -
> system.schema_usertypes 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,674 StatusLogger.java:106 -
> system.local 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 -
> system.sstable_activity 28,1100
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 -
> system.available_ranges 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 -
> system.batchlog 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 -
> system.schema_columns 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 -
> fleetstream.data 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 -
> system_schema.columns 168,25858
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 -
> system_schema.types 2,16
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,675 StatusLogger.java:106 -
> system_schema.indexes 2,16
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 -
> system_schema.keyspaces 4,242
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 -
> system_schema.dropped_columns 2,16
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 -
> system_schema.aggregates 2,16
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 -
> system_schema.triggers 2,16
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 -
> system_schema.tables 27,17400
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 -
> system_schema.views 2,16
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,676 StatusLogger.java:106 -
> system_schema.functions 2,16
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 -
> test.data 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 -
> system_auth.roles 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 -
> system_auth.role_members 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 -
> system_auth.resource_role_permissons_index 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 -
> system_auth.role_permissions 0,0
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 -
> system_traces.sessions 58,9106
> INFO [ScheduledTasks:1] 2016-10-05 12:51:47,677 StatusLogger.java:106 -
> system_traces.events 320,62299
> {code}
> nodetool tpstats on one of the two nodes that drop the traces shows:
> {code}
> nodetool tpstats
> Pool Name Active Pending Completed Blocked All
> time blocked
> MutationStage 0 0 32911 0
> 0
> ViewMutationStage 0 0 0 0
> 0
> ReadStage 0 0 1065 0
> 0
> RequestResponseStage 0 0 10969 0
> 0
> ReadRepairStage 0 0 6 0
> 0
> CounterMutationStage 0 0 0 0
> 0
> MiscStage 0 0 0 0
> 0
> CompactionExecutor 0 0 972 0
> 0
> MemtableReclaimMemory 0 0 21 0
> 0
> PendingRangeCalculator 0 0 7 0
> 0
> GossipStage 0 0 6278 0
> 0
> SecondaryIndexManagement 0 0 0 0
> 0
> HintsDispatcher 0 0 0 0
> 0
> MigrationStage 0 0 0 0
> 0
> MemtablePostFlush 0 0 40 0
> 0
> ValidationExecutor 0 0 0 0
> 0
> Sampler 0 0 0 0
> 0
> MemtableFlushWriter 0 0 21 0
> 0
> InternalResponseStage 0 0 0 0
> 0
> AntiEntropyStage 0 0 0 0
> 0
> CacheCleanupExecutor 0 0 0 0
> 0
> Native-Transport-Requests 0 0 209 0
> 0
> Message type Dropped
> READ 0
> RANGE_SLICE 0
> _TRACE 2789
> HINT 0
> MUTATION 0
> COUNTER_MUTATION 0
> BATCH_STORE 0
> BATCH_REMOVE 0
> REQUEST_RESPONSE 0
> PAGED_RANGE 0
> READ_REPAIR 0
> {code}
> It should not be a memory problem because my nodes have ~20GB of free RAM:
> {code}
> total used free shared buffers cached
> Mem: 49450072 27703928 21746144 944 181696 17040856
> -/+ buffers/cache: 10481376 38968696
> Swap: 50319356 148 50319208
> total used free shared buffers cached
> Mem: 49450072 30405324 19044748 1124 225256 17627620
> -/+ buffers/cache: 12552448 36897624
> Swap: 50319356 0 50319356
> {code}
> I tried with default fetchSize (5000) as well as fetchSize 50000 - both did
> not help. Restarting all C* services did not help either.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]