[ 
https://issues.apache.org/jira/browse/CASSANDRA-17594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533021#comment-17533021
 ] 

David Capwell commented on CASSANDRA-17594:
-------------------------------------------

A few issues are showing up, and just hiding the logs isn't a solution, as they 
show real bugs:

{code}
ERROR [RepairJobTask:2] 2022-05-06 17:20:15,757 SystemDistributedKeyspace.java:222 - Error executing query UPDATE system_distributed.repair_history SET status = 'FAILED', finished_at = toTimestamp(now()), exception_message=?, exception_stacktrace=? WHERE keyspace_name = 'keyspace1' AND columnfamily_name = 'standard1' AND id = c3ff84a0-cd60-11ec-8ac5-4b7fd7730840
java.lang.AssertionError: java.lang.InterruptedException
  at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:187)
  at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:794)
  at org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:756)
  at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:1314)
  at org.apache.cassandra.service.StorageProxy$2.apply(StorageProxy.java:137)
  at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:1151)
  at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:710)
  at org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:931)
  at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:434)
  at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:420)
  at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:219)
  at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:250)
  at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:265)
  at org.apache.cassandra.repair.SystemDistributedKeyspace.processSilent(SystemDistributedKeyspace.java:218)
  at org.apache.cassandra.repair.SystemDistributedKeyspace.failedRepairJob(SystemDistributedKeyspace.java:206)
  at org.apache.cassandra.repair.RepairJob$3.onFailure(RepairJob.java:132)
  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1313)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException: null
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
  at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
  at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
  at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:183)
  ... 20 common frames omitted
{code}

I know [~maedhroz] was looking at this a while back: repair interrupts its 
threads, but the read/write path assumes threads are never interrupted, which 
can cause failures like the one above, so we should avoid interrupting. 
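
To illustrate the failure mode in the trace, here is a minimal, standalone sketch (not Cassandra code; the class and method names are hypothetical). It assumes only what the trace shows: the enqueue goes through LinkedBlockingQueue.put(), which takes its lock with lockInterruptibly(). If the calling thread already has its interrupt flag set, put() throws InterruptedException even though the queue has room, so the message is never enqueued.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class for illustration; not part of Cassandra.
final class InterruptedEnqueueDemo {

    // Returns true if put() threw InterruptedException even though the
    // queue had spare capacity and nothing was enqueued.
    static boolean putFailsWhenInterruptPending() {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // Simulate an interrupt delivered to this thread earlier,
        // e.g. by a task being cancelled.
        Thread.currentThread().interrupt();
        try {
            // put() acquires its lock via lockInterruptibly(), which
            // checks the interrupt flag before doing anything else.
            queue.put("message");
            return false;
        } catch (InterruptedException e) {
            // Throwing cleared the flag; the element was not added.
            return queue.isEmpty();
        } finally {
            // Make sure the flag is clear so the caller is unaffected.
            Thread.interrupted();
        }
    }

    public static void main(String[] args) {
        System.out.println("put() failed on pending interrupt: "
                + putFailsWhenInterruptPending());
    }
}
```

Code that wraps the InterruptedException in an AssertionError, as the enqueue frame in the trace appears to, turns any such interrupt into a hard error on the write path.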

{code}
WARN  [epollEventLoopGroup-2-14] 2022-05-06 17:15:19,006 NoSpamLogger.java:94 - Protocol exception with client networking: org.apache.cassandra.transport.ProtocolException: Invalid or unsupported protocol version (5); the lowest supported version is 3 and the greatest is 4
{code}

The code requests v4 and attempts to connect to node1, yet node2 sees a v5 
connection. I can't explain this one yet, so I'm not sure whether it is 
python-dtest or python-driver related. 

> Fix flaky python-tests due to connection getting closed
> -------------------------------------------------------
>
>                 Key: CASSANDRA-17594
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17594
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/python
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: NA
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> We log unknown exceptions at the networking level, which includes the case 
> where the remote side closes the connection (for example, while shutting 
> down). Depending on how quickly the instances shut down, this can cause 
> python-dtest to fail random tests with a message such as
> {code}
> Unexpected error found in node logs (see stdout for full details). Errors: [WARN  [epollEventLoopGroup-5-9] 2022-05-03T16:47:03,800 ExceptionHandlers.java:134 - Unknown exception in client networking
> io.netty.channel.unix.Errors$NativeIoException: writeAddress(..) failed: Connection reset by peer, WARN  [epollEventLoopGroup-5-9] 2022-05-03T16:47:03,800 ExceptionHandlers.java:134 - Unknown exception in client networking
> io.netty.channel.unix.Errors$NativeIoException: writeAddress(..) failed: Connection reset by peer]
> {code}



