[
https://issues.apache.org/jira/browse/CASSANDRA-10687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024737#comment-15024737
]
Eyal Sorek commented on CASSANDRA-10687:
----------------------------------------
Not yet. We are still on 2.0.9.
After the sync is done and the node fully joins the cluster,
nodetool info works OK.
Regarding LOCAL_ONE: we replaced it with ONE, which works smoothly.
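The "(2 replica were required but only 1 acknowledged the write)" message at LOCAL_ONE has a known cause: while a node is bootstrapping, Cassandra also writes to the joining node's pending ranges and waits for those extra acknowledgements. The sketch below is a simplified, illustrative model of that "block for" arithmetic (not the actual org.apache.cassandra.db.ConsistencyLevel code; the function name and the per-DC-only pending handling are my assumptions):

```python
# Simplified model of how many write acks a Cassandra coordinator
# blocks for, given per-DC replication factors and pending (joining)
# replicas in the local DC. Illustrative only -- not Cassandra's code.

def block_for(cl, rf_by_dc, pending_local=0, local_dc="DC1"):
    """Required write acks for consistency level `cl`."""
    total_rf = sum(rf_by_dc.values())
    local_rf = rf_by_dc[local_dc]
    base = {
        "ONE": 1,
        "LOCAL_ONE": 1,
        "LOCAL_QUORUM": local_rf // 2 + 1,
        "QUORUM": total_rf // 2 + 1,
        "ALL": total_rf,
    }[cl]
    # While a node bootstraps, writes also go to its pending ranges,
    # and the coordinator waits for those acks too.
    return base + pending_local

# With RF = {DC1: 2, DC2: 2} and one node joining the local DC,
# LOCAL_ONE blocks for 1 + 1 = 2 acks -- matching the error text.
```

So LOCAL_ONE never "requires 2 replicas" in steady state; the second ack is the pending bootstrap replica.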
Is adding a new node to the cluster (going from 8 to 9 nodes with RF=4,
NetworkTopologyStrategy: DC1=2, DC2=2) supposed to affect cluster
performance this much? We saw response times roughly 200-300% slower.
Even when adding the 12th node, after adding three nodes one by one, we
still experienced that slowness.
We have a dedicated 1 Gb/s tunnel between the data centers, which was
not fully utilized.
We have around ~80 GB per node in that cluster.
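For scale, a back-of-envelope estimate of what one joining vnode node has to stream, using only the numbers above (the even-ownership split and full link utilization are assumptions):

```python
# Rough estimate of data streamed by a single joining vnode node and
# the transfer time on a dedicated 1 Gb/s inter-DC link at line rate.
# Assumes the new node ends up owning an even share of replicated data.

nodes_before = 8          # cluster size before the join
data_per_node_gb = 80     # ~80 GB per node, as reported above
link_gbps = 1.0           # dedicated 1 Gb/s tunnel

total_gb = nodes_before * data_per_node_gb        # 640 GB on disk
streamed_gb = total_gb / (nodes_before + 1)       # ~71 GB to the newcomer
hours = streamed_gb * 8 / link_gbps / 3600        # 8 bits per byte

print(f"~{streamed_gb:.0f} GB streamed, ~{hours * 60:.0f} min at line rate")
# -> ~71 GB streamed, ~9 min at line rate
```

A join that takes far longer than this, with the link idle, points at throttling (stream_throughput_outbound_megabits_per_sec, compaction) or coordinator overhead rather than raw bandwidth.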
Thanks,
Eyal
> When adding new node to cluster getting Cassandra timeout during write query
> ----------------------------------------------------------------------------
>
> Key: CASSANDRA-10687
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10687
> Project: Cassandra
> Issue Type: Bug
> Components: Configuration, Coordination, Streaming and Messaging
> Environment: Cassandra 2.0.9 using vnodes, on Debian 7.9, on two
> data centers (AUS & TAM)
> Reporter: Eyal Sorek
>
> When adding one new node to an 8-node cluster (and again after the 9th
> node in the AUS data center finished joining, and again when adding the
> 10th node in the TAM data center, with the same behaviour), we get many
> of the errors below.
> First: why this, while the node is joining:
> LOCAL_ONE (2 replica were required but only 1 acknowledged the write
> Since when does LOCAL_ONE require 2 replicas?
> Second, why do we see so much overhead across the whole cluster while a
> node is joining?
> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout
> during write query at consistency LOCAL_ONE (2 replica were required but only
> 1 acknowledged the write)
> Sample stack trace
> …stax.driver.core.exceptions.WriteTimeoutException.copy
> (WriteTimeoutException.java:73)
> …m.datastax.driver.core.DriverThrowables.propagateCause
> (DriverThrowables.java:37)
> ….driver.core.DefaultResultSetFuture.getUninterruptibly
> (DefaultResultSetFuture.java:214)
> com.datastax.driver.core.AbstractSession.execute
> (AbstractSession.java:52)
> com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadWriteDao$$anonfun$insertCompressed$1.apply(CassandraPagesReadWriteDao.scala:29)
> com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadWriteDao$$anonfun$insertCompressed$1.apply(CassandraPagesReadWriteDao.scala:25)
> com.wixpress.framework.monitoring.metering.SyncMetering$class.tracking(Metering.scala:58)
> com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadOnlyDao.tracking(CassandraPagesReadOnlyDao.scala:19)
> com.wixpress.publichtml.renderer.data.access.dao.page.CassandraPagesReadWriteDao.insertCompressed(CassandraPagesReadWriteDao.scala:25)
> com.wixpress.html.data.distributor.core.DaoPageDistributor.com$wixpress$html$data$distributor$core$DaoPageDistributor$$distributePage(DaoPageDistributor.scala:36)
> com.wixpress.html.data.distributor.core.DaoPageDistributor$$anonfun$process$1.apply$mcV$sp(DaoPageDistributor.scala:26)
> com.wixpress.html.data.distributor.core.DaoPageDistributor$$anonfun$process$1.apply(DaoPageDistributor.scala:26)
> com.wixpress.html.data.distributor.core.DaoPageDistributor$$anonfun$process$1.apply(DaoPageDistributor.scala:26)
> com.wixpress.framework.monitoring.metering.SyncMetering$class.tracking(Metering.scala:58)
> com.wixpress.html.data.distributor.core.DaoPageDistributor.tracking(DaoPageDistributor.scala:17)
> com.wixpress.html.data.distributor.core.DaoPageDistributor.process(DaoPageDistributor.scala:25)
> com.wixpress.html.data.distributor.core.greyhound.DistributionRequestHandler.handleMessage(DistributionRequestHandler.scala:19)
> com.wixpress.greyhound.KafkaUserHandlers.handleMessage(UserHandlers.scala:11)
> com.wixpress.greyhound.EventsConsumer.com$wixpress$greyhound$EventsConsumer$$handleMessage(EventsConsumer.scala:51)
> com.wixpress.greyhound.EventsConsumer$$anonfun$com$wixpress$greyhound$EventsConsumer$$dispatch$1.apply$mcV$sp(EventsConsumer.scala:43)
> com.wixpress.greyhound.EventsConsumer$$anonfun$com$wixpress$greyhound$EventsConsumer$$dispatch$1.apply(EventsConsumer.scala:40)
> com.wixpress.greyhound.EventsConsumer$$anonfun$com$wixpress$greyhound$EventsConsumer$$dispatch$1.apply(EventsConsumer.scala:40)
> scala.util.Try$.apply(Try.scala:192)
> com.wixpress.greyhound.EventsConsumer.com$wixpress$greyhound$EventsConsumer$$dispatch(EventsConsumer.scala:40)
> com.wixpress.greyhound.EventsConsumer$$anonfun$consumeEvents$1.apply(EventsConsumer.scala:26)
> com.wixpress.greyhound.EventsConsumer$$anonfun$consumeEvents$1.apply(EventsConsumer.scala:25)
> scala.collection.Iterator$class.foreach(Iterator.scala:742)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
> com.wixpress.greyhound.EventsConsumer.consumeEvents(EventsConsumer.scala:25)
> com.wixpress.greyhound.EventsConsumer.run(EventsConsumer.scala:20)
> java.util.concurrent.ThreadPoolExecutor.runWorker
> (ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run
> (ThreadPoolExecutor.java:617)
> java.lang.Thread.run (Thread.java:745)
> caused by com.datastax.driver.core.exceptions.WriteTimeoutException:
> Cassandra timeout during write query at consistency LOCAL_ONE (2 replica were
> required but only 1 acknowledged the write)
> …stax.driver.core.exceptions.WriteTimeoutException.copy
> (WriteTimeoutException.java:100)
> com.datastax.driver.core.Responses$Error.asException (Responses.java:98)
> com.datastax.driver.core.DefaultResultSetFuture.onSet
> (DefaultResultSetFuture.java:149)
> com.datastax.driver.core.RequestHandler.setFinalResult
> (RequestHandler.java:183)
> com.datastax.driver.core.RequestHandler.access$2300
> (RequestHandler.java:44)
> …ore.RequestHandler$SpeculativeExecution.setFinalResult
> (RequestHandler.java:748)
> ….driver.core.RequestHandler$SpeculativeExecution.onSet
> (RequestHandler.java:587)
> …atastax.driver.core.Connection$Dispatcher.channelRead0 (Connection.java:1013)
> …atastax.driver.core.Connection$Dispatcher.channelRead0 (Connection.java:936)
> ….netty.channel.SimpleChannelInboundHandler.channelRead
> (SimpleChannelInboundHandler.java:105)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead
> (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead
> (AbstractChannelHandlerContext.java:324)
> io.netty.handler.timeout.IdleStateHandler.channelRead
> (IdleStateHandler.java:254)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead
> (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead
> (AbstractChannelHandlerContext.java:324)
> …etty.handler.codec.MessageToMessageDecoder.channelRead
> (MessageToMessageDecoder.java:103)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead
> (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead
> (AbstractChannelHandlerContext.java:324)
> …etty.handler.codec.MessageToMessageDecoder.channelRead
> (MessageToMessageDecoder.java:103)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead
> (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead
> (AbstractChannelHandlerContext.java:324)
> io.netty.handler.codec.ByteToMessageDecoder.channelRead
> (ByteToMessageDecoder.java:242)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead
> (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead
> (AbstractChannelHandlerContext.java:324)
> io.netty.channel.DefaultChannelPipeline.fireChannelRead
> (DefaultChannelPipeline.java:847)
> ….channel.nio.AbstractNioByteChannel$NioByteUnsafe.read
> (AbstractNioByteChannel.java:131)
> io.netty.channel.nio.NioEventLoop.processSelectedKey
> (NioEventLoop.java:511)
> ….channel.nio.NioEventLoop.processSelectedKeysOptimized
> (NioEventLoop.java:468)
> io.netty.channel.nio.NioEventLoop.processSelectedKeys
> (NioEventLoop.java:382)
> io.netty.channel.nio.NioEventLoop.run
> (NioEventLoop.java:354)
> ….netty.util.concurrent.SingleThreadEventExecutor$2.run
> (SingleThreadEventExecutor.java:111)
> java.lang.Thread.run (Thread.java:745)
> caused by com.datastax.driver.core.exceptions.WriteTimeoutException:
> Cassandra timeout during write query at consistency LOCAL_ONE (2 replica were
> required but only 1 acknowledged the write)
> com.datastax.driver.core.Responses$Error$1.decode (Responses.java:57)
> com.datastax.driver.core.Responses$Error$1.decode (Responses.java:37)
> com.datastax.driver.core.Message$ProtocolDecoder.decode (Message.java:213)
> com.datastax.driver.core.Message$ProtocolDecoder.decode (Message.java:204)
> …etty.handler.codec.MessageToMessageDecoder.channelRead
> (MessageToMessageDecoder.java:89)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead
> (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead
> (AbstractChannelHandlerContext.java:324)
> …etty.handler.codec.MessageToMessageDecoder.channelRead
> (MessageToMessageDecoder.java:103)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead
> (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead
> (AbstractChannelHandlerContext.java:324)
> io.netty.handler.codec.ByteToMessageDecoder.channelRead
> (ByteToMessageDecoder.java:242)
> …hannel.AbstractChannelHandlerContext.invokeChannelRead
> (AbstractChannelHandlerContext.java:339)
> ….channel.AbstractChannelHandlerContext.fireChannelRead
> (AbstractChannelHandlerContext.java:324)
> io.netty.channel.DefaultChannelPipeline.fireChannelRead
> (DefaultChannelPipeline.java:847)
> ….channel.nio.AbstractNioByteChannel$NioByteUnsafe.read
> (AbstractNioByteChannel.java:131)
> io.netty.channel.nio.NioEventLoop.processSelectedKey
> (NioEventLoop.java:511)
> ….channel.nio.NioEventLoop.processSelectedKeysOptimized
> (NioEventLoop.java:468)
> io.netty.channel.nio.NioEventLoop.processSelectedKeys
> (NioEventLoop.java:382)
> io.netty.channel.nio.NioEventLoop.run
> (NioEventLoop.java:354)
> ….netty.util.concurrent.SingleThreadEventExecutor$2.run
> (SingleThreadEventExecutor.java:111)
> java.lang.Thread.run (Thread.java:745)
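If the timeouts are transient and confined to the bootstrap window, one application-side mitigation is a bounded retry with backoff around the write. This is a driver-agnostic plain-Python sketch (the Java driver also has pluggable retry policies); WriteTimeout and the execute callable here are stand-ins for your driver's equivalents:

```python
# Generic bounded-retry wrapper for transient write timeouts.
# WriteTimeout and execute() are placeholders for your driver's types.
import time


class WriteTimeout(Exception):
    """Stand-in for the driver's write-timeout exception."""


def execute_with_retry(statement, execute, retries=3, base_delay=0.05):
    """Run execute(statement); on WriteTimeout, back off exponentially
    and retry up to `retries` times, then re-raise."""
    for attempt in range(retries + 1):
        try:
            return execute(statement)
        except WriteTimeout:
            if attempt == retries:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Retrying a timed-out write is safe only if the statement is idempotent, which is worth checking before adopting anything like this.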
> # nodetool status
> xss = -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
> -XX:+CMSClassUnloadingEnabled -Xms8192M -Xmx8192M -Xmn2048M -Xss256k
> Note: Ownership information does not include topology; for complete
> information, specify a keyspace
> Datacenter: AUS
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address Load Tokens Owns Host ID Rack
> UN 172.16.213.62 85.52 GB 256 11.7% 27f2fd1d-5f3c-4691-a1f6-e28c1343e212 R1
> UN 172.16.213.63 83.11 GB 256 12.2% 4869f14b-e858-46c7-967c-60bd8260a149 R1
> UN 172.16.213.64 80.91 GB 256 11.7% d4ad2495-cb24-4964-94d2-9e3f557054a4 R1
> UN 172.16.213.66 84.11 GB 256 10.3% 2a16c0dc-c36a-4196-89df-2de4f6b6cae5 R1
> UN 172.16.144.75 95.2 GB 256 11.4% f87d6518-6c8e-49d9-a013-018bbedb8414 R1
> Datacenter: TAM
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address Load Tokens Owns Host ID Rack
> UJ 10.14.0.155 4.38 GB 256 ? c88bebae-737b-4ade-8f79-64f655036eee R1
> UN 10.14.0.106 81.57 GB 256 10.0% 3b539927-b53a-4f50-9acd-d92fefbd84b9 R1
> UN 10.14.0.107 80.23 GB 256 10.4% b70f674d-892f-42ff-a261-5356bee79e99 R1
> UN 10.14.0.108 83.64 GB 256 11.2% 6e24b17a-0b48-46b4-8edb-b0a9206314a3 R1
> UN 10.14.0.109 91.02 GB 256 11.2% 11f02dbd-257f-4623-81f4-b94db7365775 R1
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)