[jira] [Comment Edited] (CASSANDRA-12484) Unknown exception caught while attempting to update MaterializedView! findkita.kitas java.lang.AssertionErro
[ https://issues.apache.org/jira/browse/CASSANDRA-12484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064708#comment-16064708 ]

ZhaoYang edited comment on CASSANDRA-12484 at 7/1/17 5:15 AM:
--------------------------------------------------------------

[~cordlesswool] could you share your table schemas and typical queries?

was (Author: jasonstack):
[~cordlesswool] could you share your table schemas and typical queries? Which version is fixed?

> Unknown exception caught while attempting to update MaterializedView! findkita.kitas java.lang.AssertionErro
> -------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-12484
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12484
> Project: Cassandra
> Issue Type: Bug
> Components: Materialized Views
> Environment: Docker Container with Cassandra version 3.7 running on local pc
> Reporter: cordlessWool
> Priority: Critical
>
> After a restart, my Cassandra node does not start anymore. It ends with the following error message:
> ERROR 18:39:37 Unknown exception caught while attempting to update MaterializedView! findkita.kitas
> java.lang.AssertionError: We shouldn't have got there is the base row had no associated entry
> Cassandra has heavy CPU usage and uses 2.1 GB of memory, although 1 GB more is available. I ran nodetool cleanup and repair, but that did not help.
> I have 5 materialized views on this table, but the table has fewer than 2000 rows, which is not much.
> Cassandra runs in a Docker container. The container is accessible, but I cannot call cqlsh, and my website could not connect either.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)
[ https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070997#comment-16070997 ]

Corentin Chary commented on CASSANDRA-13651:
--------------------------------------------

Also check:
* https://github.com/netty/netty/issues/1759
* https://gist.github.com/jadbaz/47d98da0ead2e71659f343b14ef05de6
* Benchmark batching vs. stupid writeAndFlush()
* It's unclear why sending the response is done in the flusher right now
* https://github.com/spotify/netty-batch-flusher

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -----------------------------------------------------
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
> Issue Type: Bug
> Reporter: Corentin Chary
> Fix For: 4.x
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code properly, but I went further and realized that most of the CPU was used by {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
> Overhead  Trace output
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x
>    5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x
>    1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
>    0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x
>    0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x
>    0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x
>    0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self  sys  usr  Trace output
> #
> #
>   8.61%  8.61%  0.00%  8.61%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
>          |
>          ---0x1000200af313
>             |
>             --8.61%--0x7fca6117bdac
>                      0x7fca60459804
>                      epoll_wait
>   2.98%  2.98%  0.00%  2.98%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
>          |
>          ---0x1000200af313
>             0x7fca6117b830
>             0x7fca60459804
>             epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf reports a per-CPU percentage or a per-system percentage, but that would still be 10% of the total CPU usage of Cassandra at the minimum.
> I went further and found the code of all that: We schedule a lot of
[jira] [Commented] (CASSANDRA-13645) Optimize the number of replicas required in Quorum read/write
[ https://issues.apache.org/jira/browse/CASSANDRA-13645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070696#comment-16070696 ]

Jay Zhuang commented on CASSANDRA-13645:
----------------------------------------

Link to CASSANDRA-8119: More Expressive Consistency Levels

> Optimize the number of replicas required in Quorum read/write
> -------------------------------------------------------------
>
> Key: CASSANDRA-13645
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13645
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Reporter: Dikang Gu
> Assignee: Pengchao Wang
> Fix For: 4.x
>
>
> Currently, for C* read/write requests at quorum consistency level, the number of replicas required for a quorum write is W=N/2+1, and the number of replicas required for a quorum read is R=N/2+1 as well.
> This works fine with an odd number of replicas, where R + W = N + 1, but with an even number of replicas, like RF=4, 6, 8, R + W = N + 2, which means we have two overlapping nodes in read/write requests, which is not necessary. The extra overlap does not provide stronger consistency, but it hurts P99 read latency a lot (2X in our production cluster).
> Many other databases, like Amazon Aurora, use W = N/2 + 1 and R = N/2 for quorum requests, which still provides strong consistency but talks to one less replica in the read path. "We use a quorum model with 6 votes (V = 6), a write quorum of 4/6 (Vw = 4), and a read quorum of 3/6 (Vr = 3)."
> I propose we do the same optimization and change the read quorum to talk to N/2 replicas, which should reduce the latency of quorum reads in general.
[jira] [Commented] (CASSANDRA-13645) Optimize the number of replicas required in Quorum read/write
[ https://issues.apache.org/jira/browse/CASSANDRA-13645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070667#comment-16070667 ]

Jay Zhuang commented on CASSANDRA-13645:
----------------------------------------

and {{CL.EACH_HALF}}?

> Optimize the number of replicas required in Quorum read/write
> -------------------------------------------------------------
>
> Key: CASSANDRA-13645
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13645
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Reporter: Dikang Gu
> Assignee: Pengchao Wang
> Fix For: 4.x
>
>
> Currently, for C* read/write requests at quorum consistency level, the number of replicas required for a quorum write is W=N/2+1, and the number of replicas required for a quorum read is R=N/2+1 as well.
> This works fine with an odd number of replicas, where R + W = N + 1, but with an even number of replicas, like RF=4, 6, 8, R + W = N + 2, which means we have two overlapping nodes in read/write requests, which is not necessary. The extra overlap does not provide stronger consistency, but it hurts P99 read latency a lot (2X in our production cluster).
> Many other databases, like Amazon Aurora, use W = N/2 + 1 and R = N/2 for quorum requests, which still provides strong consistency but talks to one less replica in the read path. "We use a quorum model with 6 votes (V = 6), a write quorum of 4/6 (Vw = 4), and a read quorum of 3/6 (Vr = 3)."
> I propose we do the same optimization and change the read quorum to talk to N/2 replicas, which should reduce the latency of quorum reads in general.
[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)
[ https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070634#comment-16070634 ]

Jason Brown commented on CASSANDRA-13651:
-----------------------------------------

/cc [~norman]

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -----------------------------------------------------
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
> Issue Type: Bug
> Reporter: Corentin Chary
> Fix For: 4.x
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code properly, but I went further and realized that most of the CPU was used by {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
> Overhead  Trace output
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x
>    5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x
>    1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
>    0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x
>    0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x
>    0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x
>    0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self  sys  usr  Trace output
> #
> #
>   8.61%  8.61%  0.00%  8.61%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
>          |
>          ---0x1000200af313
>             |
>             --8.61%--0x7fca6117bdac
>                      0x7fca60459804
>                      epoll_wait
>   2.98%  2.98%  0.00%  2.98%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
>          |
>          ---0x1000200af313
>             0x7fca6117b830
>             0x7fca60459804
>             epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf reports a per-CPU percentage or a per-system percentage, but that would still be 10% of the total CPU usage of Cassandra at the minimum.
> I went further and found the code of all that: We schedule a lot of {{Message::Flusher}} with a deadline of 10 usec (5 per message, I think), but netty+epoll only supports timeouts of a millisecond or more and will convert everything below that to 0.
> I added some traces to netty (4.1):
> {code}
> diff --git
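The conversion described in the report above (a 10 usec deadline ending up as an epoll timeout of 0) can be illustrated with a small sketch. This is not netty code: the function name is hypothetical, and the rounding rule (nearest millisecond, integer arithmetic) is an assumption made for illustration; the exact expression netty uses may differ. The only point is that any sub-millisecond delay collapses to 0, and epoll_wait() with a timeout of 0 returns immediately instead of blocking.

```python
NANOS_PER_MILLI = 1_000_000


def epoll_timeout_millis(delay_nanos: int) -> int:
    """Convert a scheduled-task delay (ns) to a millisecond epoll timeout.

    Assumption: round to the nearest millisecond with integer arithmetic
    and never return a negative value. Hypothetical helper, not netty API.
    """
    return max(0, (delay_nanos + NANOS_PER_MILLI // 2) // NANOS_PER_MILLI)


if __name__ == "__main__":
    # 10 usec is the flusher deadline mentioned in the report.
    for delay_usec in (10, 100, 499, 500, 1_000, 10_000):
        timeout_ms = epoll_timeout_millis(delay_usec * 1_000)
        note = "returns immediately" if timeout_ms == 0 else "blocks"
        print(f"deadline {delay_usec:>6} usec -> timeout {timeout_ms} ms ({note})")
```

Under this assumption, every deadline shorter than half a millisecond yields a zero timeout, which matches the pattern in the perf output where most epoll_wait calls carry a zero timeout.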
[jira] [Issue Comment Deleted] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline
[ https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Brown updated CASSANDRA-13649:
------------------------------------
    Comment: was deleted

(was: It's common practice in netty to include an exception handler at the end of a pipeline to handle cases like this. However, I'm reluctant to add yet another handler to the pipeline, as some of my testing for CASSANDRA-8457 (admittedly, very early-stage testing) showed that we spend extra time in the pipeline just on the mechanics of invoking another handler (checking the promise, the state of the channel, and so on).

That being said, I can probably find some time to reinvestigate as part of finalizing all the netty-related things for 4.0. [~spo...@gmail.com] feel free to assign to me if you like, but I probably can't get to it for about a month.)

> Uncaught exceptions in Netty pipeline
> -------------------------------------
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stefan Podkowinski
> Attachments: test_stdout.txt
>
>
> I've noticed some netty-related errors in trunk in [some of the dtest results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
> Just want to make sure that we don't have to change anything related to the exception handling in our pipeline and that this isn't a netty issue. Actually, if this causes flakiness but is otherwise harmless, we should do something about it, even if it's just on the dtest side.
> {noformat}
> WARN [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
>     at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
>     at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> This one also looks odd and makes upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test fail:
> {noformat}
> WARN [epollEventLoopGroup-2-9] 2017-06-29 02:41:37,125 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
> io.netty.handler.codec.DecoderException: org.apache.cassandra.transport.ProtocolException: Invalid or unsupported protocol version: 4
>     at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:375) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:342) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:325) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at
[jira] [Commented] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline
[ https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070620#comment-16070620 ]

Jason Brown commented on CASSANDRA-13649:
-----------------------------------------

It's common practice in netty to include an exception handler at the end of a pipeline to handle cases like this. However, I'm reluctant to add yet another handler to the pipeline, as some of my testing for CASSANDRA-8457 (admittedly, very early-stage testing) showed that we spend extra time in the pipeline just on the mechanics of invoking another handler (checking the promise, the state of the channel, and so on).

That being said, I can probably find some time to reinvestigate as part of finalizing all the netty-related things for 4.0. [~spo...@gmail.com] feel free to assign to me if you like, but I probably can't get to it for about a month.

> Uncaught exceptions in Netty pipeline
> -------------------------------------
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stefan Podkowinski
> Attachments: test_stdout.txt
>
>
> I've noticed some netty-related errors in trunk in [some of the dtest results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
> Just want to make sure that we don't have to change anything related to the exception handling in our pipeline and that this isn't a netty issue. Actually, if this causes flakiness but is otherwise harmless, we should do something about it, even if it's just on the dtest side.
> {noformat}
> WARN [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
>     at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer
>     at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> This one also looks odd and makes upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test fail:
> {noformat}
> WARN [epollEventLoopGroup-2-9] 2017-06-29 02:41:37,125 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
> io.netty.handler.codec.DecoderException: org.apache.cassandra.transport.ProtocolException: Invalid or unsupported protocol version: 4
>     at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:375) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:342) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:325) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at
[jira] [Commented] (CASSANDRA-13645) Optimize the number of replicas required in Quorum read/write
[ https://issues.apache.org/jira/browse/CASSANDRA-13645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070608#comment-16070608 ]

Jason Brown commented on CASSANDRA-13645:
-----------------------------------------

To be clear, though, a user will have to know that they must use different CLs in order to gain the optimization proposed by this ticket. Meaning, you write at {{CL.QUORUM}} and read at {{CL.HALF}}; you can't write and read at {{CL.HALF}} and get strong consistency properties.

As much as I don't want to open another can of worms, do we need a corresponding {{CL.LOCAL_HALF}} as well?

> Optimize the number of replicas required in Quorum read/write
> -------------------------------------------------------------
>
> Key: CASSANDRA-13645
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13645
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Reporter: Dikang Gu
> Assignee: Pengchao Wang
> Fix For: 4.x
>
>
> Currently, for C* read/write requests at quorum consistency level, the number of replicas required for a quorum write is W=N/2+1, and the number of replicas required for a quorum read is R=N/2+1 as well.
> This works fine with an odd number of replicas, where R + W = N + 1, but with an even number of replicas, like RF=4, 6, 8, R + W = N + 2, which means we have two overlapping nodes in read/write requests, which is not necessary. The extra overlap does not provide stronger consistency, but it hurts P99 read latency a lot (2X in our production cluster).
> Many other databases, like Amazon Aurora, use W = N/2 + 1 and R = N/2 for quorum requests, which still provides strong consistency but talks to one less replica in the read path. "We use a quorum model with 6 votes (V = 6), a write quorum of 4/6 (Vw = 4), and a read quorum of 3/6 (Vr = 3)."
> I propose we do the same optimization and change the read quorum to talk to N/2 replicas, which should reduce the latency of quorum reads in general.
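The overlap arithmetic in this thread can be checked with a short sketch. It is only an illustration: the function names are hypothetical and not part of Cassandra, and `half` models the proposed read level of N/2 replicas (the {{CL.HALF}} mentioned in the comment), which is an assumption about how such a level would be sized. A read is guaranteed to see the latest acknowledged write only when the write set and the read set share at least one replica, that is, when W + R > N.

```python
def quorum(n: int) -> int:
    """Replicas required by a quorum request: N/2 + 1 (integer division)."""
    return n // 2 + 1


def half(n: int) -> int:
    """Hypothetical 'half' read level from the proposal: N/2 replicas."""
    return n // 2


def overlap(n: int, w: int, r: int) -> int:
    """Minimum number of replicas shared by any write set of size w
    and any read set of size r out of n replicas."""
    return max(0, w + r - n)


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 8):
        w = quorum(n)
        print(
            f"RF={n}: "
            f"write quorum / read quorum overlap={overlap(n, w, quorum(n))}, "
            f"write quorum / read half overlap={overlap(n, w, half(n))}, "
            f"write half / read half overlap={overlap(n, half(n), half(n))}"
        )
```

For even replication factors the quorum-write plus half-read combination still overlaps in one replica, while half/half overlaps in none, which is the point made in the comment. For odd replication factors the same combination has no guaranteed overlap (for RF=3, W=2 and R=1), so under these definitions the optimization only applies to even RF.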
[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)
[ https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070535#comment-16070535 ]

Corentin Chary commented on CASSANDRA-13651:
--------------------------------------------

Things to check or try (for me):
* io.netty.eventLoopThreads
* Check if we could use the same eventloop instead of starting two
* Create a custom SelectStrategy that skips looking at fds if there is a scheduled task happening in a few microseconds
* Try to understand why Message::Flusher currently works this way

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -----------------------------------------------------
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
> Issue Type: Bug
> Reporter: Corentin Chary
> Fix For: 4.x
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code properly, but I went further and realized that most of the CPU was used by {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
> Overhead  Trace output
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x
>    5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x
>    1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
>    0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x
>    0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x
>    0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x
>    0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self  sys  usr  Trace output
> #
> #
>   8.61%  8.61%  0.00%  8.61%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
>          |
>          ---0x1000200af313
>             |
>             --8.61%--0x7fca6117bdac
>                      0x7fca60459804
>                      epoll_wait
>   2.98%  2.98%  0.00%  2.98%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
>          |
>          ---0x1000200af313
>             0x7fca6117b830
>             0x7fca60459804
>             epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf reports a per-CPU percentage or a per-system percentage, but that would still be 10% of the total CPU usage of Cassandra at the minimum.
> I went further and found the code of all that: We schedule a lot of
[jira] [Updated] (CASSANDRA-10446) Run repair with down replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston updated CASSANDRA-10446: Resolution: Fixed Status: Resolved (was: Ready to Commit) Committed as {{45c0f860f3c7f8e0a7c80809c4ff47f4acf65557}} > Run repair with down replicas > - > > Key: CASSANDRA-10446 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10446 > Project: Cassandra > Issue Type: Improvement >Reporter: sankalp kohli >Assignee: Blake Eggleston >Priority: Minor > Fix For: 4.0 > > > We should have an option of running repair when replicas are down. We can > call it -force. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
cassandra git commit: Run repair with down replicas
Repository: cassandra
Updated Branches:
  refs/heads/trunk 176f2a444 -> 45c0f860f

Run repair with down replicas

Patch by Sankalp Kohli & Blake Eggleston; Reviewed by Marcus Eriksson for CASSANDRA-10446

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/45c0f860
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/45c0f860
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/45c0f860

Branch: refs/heads/trunk
Commit: 45c0f860f3c7f8e0a7c80809c4ff47f4acf65557
Parents: 176f2a4
Author: Blake Eggleston
Authored: Wed Oct 12 10:14:16 2016 -0700
Committer: Blake Eggleston
Committed: Fri Jun 30 11:31:15 2017 -0700

----------------------------------------------------------------------
 CHANGES.txt                                      |  2 +
 .../apache/cassandra/repair/RepairRunnable.java  | 12 +-
 .../apache/cassandra/repair/RepairSession.java   | 39 ++--
 .../cassandra/repair/RepairSessionResult.java    | 15 +++-
 .../cassandra/repair/messages/RepairOption.java  | 25 -
 .../cassandra/service/ActiveRepairService.java   | 15 ++--
 .../apache/cassandra/tools/nodetool/Repair.java  |  4 ++
 .../cassandra/repair/RepairSessionTest.java      |  2 +-
 .../consistent/CoordinatorSessionTest.java       |  2 +-
 .../repair/messages/RepairOptionTest.java        | 22 +++
 10 files changed, 125 insertions(+), 13 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/45c0f860/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 866c6fd..6444994 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,6 @@
 4.0
+ * Run repair with down replicas (CASSANDRA-10446)
+ * Added started & completed repair metrics (CASSANDRA-13598)
  * Added started & completed repair metrics (CASSANDRA-13598)
  * Improve secondary index (re)build failure and concurrency handling (CASSANDRA-10130)
  * Improve calculation of available disk space for compaction (CASSANDRA-13068)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/45c0f860/src/java/org/apache/cassandra/repair/RepairRunnable.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/repair/RepairRunnable.java b/src/java/org/apache/cassandra/repair/RepairRunnable.java
index eca162e..29347a4 100644
--- a/src/java/org/apache/cassandra/repair/RepairRunnable.java
+++ b/src/java/org/apache/cassandra/repair/RepairRunnable.java
@@ -289,9 +289,18 @@ public class RepairRunnable extends WrappedRunnable implements ProgressEventNoti
                 // filter out null(=failed) results and get successful ranges
                 for (RepairSessionResult sessionResult : results)
                 {
+                    logger.debug("Repair result: {}", results);
                     if (sessionResult != null)
                     {
-                        successfulRanges.addAll(sessionResult.ranges);
+                        // don't promote sstables for sessions we skipped replicas for
+                        if (!sessionResult.skippedReplicas)
+                        {
+                            successfulRanges.addAll(sessionResult.ranges);
+                        }
+                        else
+                        {
+                            logger.debug("Skipping anticompaction for {}", results);
+                        }
                     }
                     else
                     {
@@ -424,6 +433,7 @@ public class RepairRunnable extends WrappedRunnable implements ProgressEventNoti
                                                              p.left,
                                                              isConsistent,
                                                              options.isPullRepair(),
+                                                             options.isForcedRepair(),
                                                              options.getPreviewKind(),
                                                              executor,
                                                              cfnames);

http://git-wip-us.apache.org/repos/asf/cassandra/blob/45c0f860/src/java/org/apache/cassandra/repair/RepairSession.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/repair/RepairSession.java b/src/java/org/apache/cassandra/repair/RepairSession.java
index c1b3f41..98ed1a3 100644
--- a/src/java/org/apache/cassandra/repair/RepairSession.java
+++ b/src/java/org/apache/cassandra/repair/RepairSession.java
@@ -36,6 +36,7 @@
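
The RepairRunnable hunk in this commit only promotes ranges from sessions that did not skip any replica. A minimal, self-contained sketch of that filtering follows. It is not Cassandra code: `SessionResult` and `collectSuccessfulRanges` are hypothetical stand-ins for `RepairSessionResult` and the loop in the diff, and ranges are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class ForcedRepairSketch
{
    /** Hypothetical stand-in for RepairSessionResult: the repaired ranges plus the new flag. */
    static final class SessionResult
    {
        final Collection<String> ranges;
        final boolean skippedReplicas;

        SessionResult(Collection<String> ranges, boolean skippedReplicas)
        {
            this.ranges = ranges;
            this.skippedReplicas = skippedReplicas;
        }
    }

    /**
     * Collects the ranges that are safe to treat as fully repaired:
     * null results are failed sessions, and sessions that skipped a
     * down replica are left out so their sstables are not promoted.
     */
    static List<String> collectSuccessfulRanges(List<SessionResult> results)
    {
        List<String> successfulRanges = new ArrayList<>();
        for (SessionResult sessionResult : results)
        {
            if (sessionResult == null)
                continue; // failed session
            if (!sessionResult.skippedReplicas)
                successfulRanges.addAll(sessionResult.ranges);
        }
        return successfulRanges;
    }

    public static void main(String[] args)
    {
        List<SessionResult> results = Arrays.asList(
            new SessionResult(Arrays.asList("(0,100]"), false),  // all replicas took part
            new SessionResult(Arrays.asList("(100,200]"), true), // forced: a replica was down
            null);                                               // failed session
        System.out.println(collectSuccessfulRanges(results));
    }
}
```

Only the first session's range survives the filter; the forced session still ran, but its data is not marked as repaired.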
[jira] [Updated] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement
[ https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang updated CASSANDRA-13592:
---------------------------------
    Status: Patch Available  (was: In Progress)

> Null Pointer exception at SELECT JSON statement
> -----------------------------------------------
>
>                 Key: CASSANDRA-13592
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13592
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: Debian Linux
>            Reporter: Wyss Philipp
>            Assignee: ZhaoYang
>              Labels: beginner
>         Attachments: system.log
>
>
> A null pointer exception appears when running the command
> {code}
> SELECT JSON * FROM examples.basic;
> ---MORE---
> message="java.lang.NullPointerException">
> Examples.basic has the following description (DESC examples.basic;):
> CREATE TABLE examples.basic (
>     key frozen> PRIMARY KEY,
>     wert text
> ) WITH bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
> {code}
> The error appears after the ---MORE--- line.
> The field "wert" holds a JSON-formatted string.
[jira] [Updated] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)
[ https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Corentin Chary updated CASSANDRA-13651:
---------------------------------------
    Description: 
I was trying to profile Cassandra under my workload and I kept seeing this backtrace:
{code}
epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java (native)
io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) Native.java:111
io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) EpollEventLoop.java:230
io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
io.netty.util.concurrent.SingleThreadEventExecutor$5.run() SingleThreadEventExecutor.java:858
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() DefaultThreadFactory.java:138
java.lang.Thread.run() Thread.java:745
{code}

At first I thought that the profiler might not be able to profile native code properly, but I went further and realized that most of the CPU was used by {{epoll_wait()}} calls with a timeout of zero.

Here is the output of perf on this system, which confirms that most of the overhead was with timeout == 0.

{code}
Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
Overhead  Trace output
  90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x
   5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x
   1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
   0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x
   0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x
   0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x
   0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x
{code}

Running this time with perf record -ag for call traces:
{code}
# Children      Self       sys       usr  Trace output
#
#
     8.61%     8.61%     0.00%     8.61%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
            |
            ---0x1000200af313
               |
                --8.61%--0x7fca6117bdac
                          0x7fca60459804
                          epoll_wait
     2.98%     2.98%     0.00%     2.98%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
            |
            ---0x1000200af313
               0x7fca6117b830
               0x7fca60459804
               epoll_wait
{code}

That looks like a lot of CPU used to wait for nothing. I'm not sure if perf reports a per-CPU percentage or a per-system percentage, but that would still be 10% of the total CPU usage of Cassandra at the minimum.

I went further and found the code behind all that: we schedule a lot of {{Message::Flusher}} tasks with a deadline of 10 usec (5 per message, I think), but netty+epoll only supports timeouts of a millisecond or more and will convert everything below that to 0.
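
To make the rounding concrete, here is a minimal sketch of how a sub-millisecond deadline collapses to a zero timeout. It assumes the event loop rounds the remaining nanoseconds to the nearest millisecond before calling {{epoll_wait()}}; the helper `toEpollTimeoutMillis` is hypothetical and only illustrates the arithmetic, it is not netty's API.

```java
import java.util.concurrent.TimeUnit;

public class EpollTimeoutSketch
{
    /**
     * Hypothetical helper: rounds a delay in nanoseconds to the nearest
     * millisecond, the granularity of epoll_wait()'s timeout argument.
     */
    static long toEpollTimeoutMillis(long delayNanos)
    {
        return (delayNanos + 500_000L) / 1_000_000L;
    }

    public static void main(String[] args)
    {
        // The 10 usec flusher deadline mentioned in the ticket.
        long flusherDelayNanos = TimeUnit.MICROSECONDS.toNanos(10);

        // Anything under half a millisecond rounds down to 0, i.e. a
        // non-blocking epoll_wait() that returns immediately.
        System.out.println("10 us  -> " + toEpollTimeoutMillis(flusherDelayNanos) + " ms");
        System.out.println("600 us -> " + toEpollTimeoutMillis(TimeUnit.MICROSECONDS.toNanos(600)) + " ms");
        System.out.println("1 ms   -> " + toEpollTimeoutMillis(TimeUnit.MILLISECONDS.toNanos(1)) + " ms");
    }
}
```

Under that assumption, a loop that always has a task due within 10 usec never blocks in the kernel, which would match the busy polling visible in the perf output.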
I added some traces to netty (4.1): {code} diff --git a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java index 909088fde..8734bbfd4 100644 --- a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java +++ b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java @@ -208,10 +208,15 @@ final class EpollEventLoop extends SingleThreadEventLoop { long currentTimeNanos = System.nanoTime(); long selectDeadLineNanos = currentTimeNanos + delayNanos(currentTimeNanos); for (;;)
[jira] [Created] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)
Corentin Chary created CASSANDRA-13651:
--------------------------------------

             Summary: Large amount of CPU used by epoll_wait(.., .., .., 0)
                 Key: CASSANDRA-13651
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
             Project: Cassandra
          Issue Type: Bug
            Reporter: Corentin Chary
             Fix For: 4.x

I was trying to profile Cassandra under my workload and I kept seeing this backtrace:
{code}
epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java (native)
io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) Native.java:111
io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) EpollEventLoop.java:230
io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
io.netty.util.concurrent.SingleThreadEventExecutor$5.run() SingleThreadEventExecutor.java:858
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() DefaultThreadFactory.java:138
java.lang.Thread.run() Thread.java:745
{code}

At first I thought that the profiler might not be able to profile native code properly, but I went further and realized that most of the CPU was used by epoll_wait() calls with a timeout of zero.

Here is the output of perf on this system, which confirms that most of the overhead was with timeout == 0.

{code}
Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
Overhead  Trace output
  90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x
   5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x
   1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
   0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x
   0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x
   0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x
   0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x
{code}

Running this time with perf record -ag for call traces:
{code}
# Children      Self       sys       usr  Trace output
#
#
     8.61%     8.61%     0.00%     8.61%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
            |
            ---0x1000200af313
               |
                --8.61%--0x7fca6117bdac
                          0x7fca60459804
                          epoll_wait
     2.98%     2.98%     0.00%     2.98%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
            |
            ---0x1000200af313
               0x7fca6117b830
               0x7fca60459804
               epoll_wait
{code}

That looks like a lot of CPU used to wait for nothing. I'm not sure if perf reports a per-CPU percentage or a per-system percentage, but that would still be 10% of the total CPU usage of Cassandra at the minimum.

I went further and found the code behind all that: we schedule a lot of Message::Flusher tasks with a deadline of 10 usec (5 per message, I think), but netty+epoll only supports timeouts of a millisecond or more and will convert everything below that to 0.
I added some traces to netty (4.1): {code} diff --git a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java index 909088fde..8734bbfd4 100644 --- a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java +++ b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java @@ -208,10 +208,15 @@ final class EpollEventLoop extends SingleThreadEventLoop
[jira] [Updated] (CASSANDRA-13650) cql_tests:SlowQueryTester.local_query_test and cql_tests:SlowQueryTester.remote_query_test failed on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-13650: - Description: cql_tests.py:SlowQueryTester.local_query_test failed on trunk cql_tests.py:SlowQueryTester.remote_query_test failed on trunk SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217 It's due to the dtest unable to find {{'SELECT \* FROM ks.test1'}} pattern from log. but in the log, following info is showed: {{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: , time 102 msec - slow timeout 10 msec}} ColumnFilter.toString() should return {{*}}, but return normal column {{val}} instead was: cql_tests.py:SlowQueryTester.local_query_test failed on trunk cql_tests.py:SlowQueryTester.remote_query_test failed on trunk SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217 The cause seems due to the dtest unable to find {{"SELECT \* FROM ks.test1"}} pattern from log. but in the log, following info is showed: {{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: , time 102 msec - slow timeout 10 msec}} ColumnFilter.toString() should return {{*}}, but return normal column {{val}} instead > cql_tests:SlowQueryTester.local_query_test and > cql_tests:SlowQueryTester.remote_query_test failed on trunk > -- > > Key: CASSANDRA-13650 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13650 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata >Reporter: ZhaoYang >Assignee: ZhaoYang > Fix For: 4.x > > > cql_tests.py:SlowQueryTester.local_query_test failed on trunk > cql_tests.py:SlowQueryTester.remote_query_test failed on trunk > SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217 > It's due to the dtest unable to find {{'SELECT \* FROM ks.test1'}} pattern > from log. 
> but in the log, following info is showed: > {{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: > , time 102 msec - slow timeout 10 msec}} > ColumnFilter.toString() should return {{*}}, but return normal column {{val}} > instead -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13650) cql_tests:SlowQueryTester.local_query_test and cql_tests:SlowQueryTester.remote_query_test failed on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-13650: - Summary: cql_tests:SlowQueryTester.local_query_test and cql_tests:SlowQueryTester.remote_query_test failed on trunk (was: cql_tests.py:SlowQueryTester.local_query_test and cql_tests.py:SlowQueryTester.remote_query_test failed on trunk) > cql_tests:SlowQueryTester.local_query_test and > cql_tests:SlowQueryTester.remote_query_test failed on trunk > -- > > Key: CASSANDRA-13650 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13650 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata >Reporter: ZhaoYang >Assignee: ZhaoYang > Fix For: 4.x > > > cql_tests.py:SlowQueryTester.local_query_test failed on trunk > cql_tests.py:SlowQueryTester.remote_query_test failed on trunk > SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217 > The cause seems due to the dtest unable to find {{"SELECT \* FROM ks.test1"}} > pattern from log. > but in the log, following info is showed: > {{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: > , time 102 msec - slow timeout 10 msec}} > ColumnFilter.toString() should return {{*}}, but return normal column {{val}} > instead -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement
[ https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069843#comment-16069843 ] ZhaoYang commented on CASSANDRA-13592: -- I have created [ticket|https://issues.apache.org/jira/browse/CASSANDRA-13650] for {{cql_tests.py:SlowQueryTester.local_query_test}} & {{cql_tests.py:SlowQueryTester.remote_query_test}} > Null Pointer exception at SELECT JSON statement > --- > > Key: CASSANDRA-13592 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13592 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: Debian Linux >Reporter: Wyss Philipp >Assignee: ZhaoYang > Labels: beginner > Attachments: system.log > > > A Nulll pointer exception appears when the command > {code} > SELECT JSON * FROM examples.basic; > ---MORE--- > message="java.lang.NullPointerException"> > Examples.basic has the following description (DESC examples.basic;): > CREATE TABLE examples.basic ( > key frozen> PRIMARY KEY, > wert text > ) WITH bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32', 'min_threshold': '4'} > AND compression = {'chunk_length_in_kb': '64', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE'; > {code} > The error appears after the ---MORE--- line. > The field "wert" has a JSON formatted string. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13650) cql_tests.py:SlowQueryTester.local_query_test and cql_tests.py:SlowQueryTester.remote_query_test failed on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhaoYang updated CASSANDRA-13650: - Description: cql_tests.py:SlowQueryTester.local_query_test failed on trunk cql_tests.py:SlowQueryTester.remote_query_test failed on trunk SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217 The cause seems due to the dtest unable to find {{"SELECT \* FROM ks.test1"}} pattern from log. but in the log, following info is showed: {{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: , time 102 msec - slow timeout 10 msec}} ColumnFilter.toString() should return {{*}}, but return normal column {{val}} instead was: cql_tests.py:SlowQueryTester.local_query_test failed on trunk cql_tests.py:SlowQueryTester.remote_query_test failed on trunk SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217 The cause seems due to the dtest unable to find "SELECT \* FROM ks.test1" pattern from log. but in the log, following info is showed: {{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: , time 102 msec - slow timeout 10 msec}} ColumnFilter.toString() should return "*", but return normal column "val" instead > cql_tests.py:SlowQueryTester.local_query_test and > cql_tests.py:SlowQueryTester.remote_query_test failed on trunk > > > Key: CASSANDRA-13650 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13650 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata >Reporter: ZhaoYang >Assignee: ZhaoYang > Fix For: 4.x > > > cql_tests.py:SlowQueryTester.local_query_test failed on trunk > cql_tests.py:SlowQueryTester.remote_query_test failed on trunk > SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217 > The cause seems due to the dtest unable to find {{"SELECT \* FROM ks.test1"}} > pattern from log. 
> but in the log, following info is showed: > {{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: > , time 102 msec - slow timeout 10 msec}} > ColumnFilter.toString() should return {{*}}, but return normal column {{val}} > instead -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-13650) cql_tests.py:SlowQueryTester.local_query_test and cql_tests.py:SlowQueryTester.remote_query_test failed on trunk
ZhaoYang created CASSANDRA-13650:
------------------------------------

             Summary: cql_tests.py:SlowQueryTester.local_query_test and cql_tests.py:SlowQueryTester.remote_query_test failed on trunk
                 Key: CASSANDRA-13650
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13650
             Project: Cassandra
          Issue Type: Bug
          Components: Distributed Metadata
            Reporter: ZhaoYang
            Assignee: ZhaoYang
             Fix For: 4.x

cql_tests.py:SlowQueryTester.local_query_test failed on trunk
cql_tests.py:SlowQueryTester.remote_query_test failed on trunk

SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217

The cause seems to be that the dtest is unable to find the "SELECT \* FROM ks.test1" pattern in the log.

Instead, the log shows the following:
{{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: , time 102 msec - slow timeout 10 msec}}

ColumnFilter.toString() should return "*", but it returns the normal column "val" instead.
[jira] [Comment Edited] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement
[ https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069565#comment-16069565 ]

ZhaoYang edited comment on CASSANDRA-13592 at 6/30/17 9:47 AM:
---------------------------------------------------------------

|| source || junit-result || dtest-result||
| [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | [junit|https://circleci.com/gh/jasonstack/cassandra/84] | {{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk {{cql_tests.py:SlowQueryTester.remote_query_test}} failed on trunk {{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13506] |
| [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11] | [junit|https://circleci.com/gh/jasonstack/cassandra/82] | {{topology_test.TestTopology.size_estimates_multidc_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13229] {{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} [known|https://issues.apache.org/jira/browse/CASSANDRA-13250] |
| [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0] | [junit|https://circleci.com/gh/jasonstack/cassandra/83] | {{auth_test.TestAuth.system_auth_ks_is_alterable_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13113] {{offline_tools_test.TestOfflineTools.sstableofflinerelevel_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-12617] {{repair_tests.incremental_repair_test.TestIncRepair.multiple_repair_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13515]|
| [2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2] | [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed |

1. in {{listType, mapType, setType, TupleType}}.toJSONString(), keep buffer position the same.
2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with double-quote) to be consistent with user json input
3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, otherwise parent method throws NPE.

was (Author: jasonstack):
|| source || junit-result || dtest-result||
| [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | [junit|https://circleci.com/gh/jasonstack/cassandra/84] | {{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk {{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13506] |
| [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11] | [junit|https://circleci.com/gh/jasonstack/cassandra/82] | {{topology_test.TestTopology.size_estimates_multidc_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13229] {{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} [known|https://issues.apache.org/jira/browse/CASSANDRA-13250] |
| [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0] | [junit|https://circleci.com/gh/jasonstack/cassandra/83] | {{auth_test.TestAuth.system_auth_ks_is_alterable_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13113] {{offline_tools_test.TestOfflineTools.sstableofflinerelevel_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-12617] {{repair_tests.incremental_repair_test.TestIncRepair.multiple_repair_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13515]|
| [2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2] | [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed |

1. in {{listType, mapType, setType, TupleType}}.toJSONString(), keep buffer position the same.
2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with double-quote) to be consistent with user json input
3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, otherwise parent method throws NPE.
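
Point 1 of the list above ("keep buffer position the same") can be illustrated with plain {{java.nio}}. This is a minimal sketch, not the actual patch: `readWithoutMoving` is a hypothetical helper, and it assumes the fix amounts to reading through a duplicate of the buffer so that the caller's position is left untouched.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferPositionSketch
{
    /**
     * Hypothetical helper: decodes the remaining bytes as UTF-8 without
     * advancing the caller's buffer. duplicate() shares the content but
     * has its own position and limit.
     */
    static String readWithoutMoving(ByteBuffer buffer)
    {
        ByteBuffer view = buffer.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes); // advances only the duplicate
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args)
    {
        ByteBuffer buffer = ByteBuffer.wrap("wert".getBytes(StandardCharsets.UTF_8));

        String first = readWithoutMoving(buffer);
        String second = readWithoutMoving(buffer); // same bytes are still readable

        System.out.println(first + " / " + second + " / position=" + buffer.position());
    }
}
```

A method that reads from the original buffer directly would leave it exhausted, so a later read of the same value would see no bytes; reading through a duplicate avoids that class of bug.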
> Null Pointer exception at SELECT JSON statement > --- > > Key: CASSANDRA-13592 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13592 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: Debian Linux >Reporter: Wyss Philipp >Assignee: ZhaoYang > Labels: beginner > Attachments: system.log > > > A Nulll pointer exception appears when the command > {code} > SELECT JSON * FROM examples.basic; > ---MORE--- > message="java.lang.NullPointerException"> > Examples.basic has the following description (DESC examples.basic;): > CREATE TABLE examples.basic ( > key frozen> PRIMARY KEY, > wert text > ) WITH bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'class': >
[jira] [Updated] (CASSANDRA-13598) Started & Completed repair metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-13598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski updated CASSANDRA-13598: --- Attachment: 13598-3.11.patch 13598-3.0.patch > Started & Completed repair metrics > -- > > Key: CASSANDRA-13598 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13598 > Project: Cassandra > Issue Type: Improvement >Reporter: Cameron Zemek >Assignee: Cameron Zemek >Priority: Minor > Labels: repair > Fix For: 4.0 > > Attachments: 13598-3.0.patch, 13598-3.11.patch > > > There are no metrics to monitor repairs run as co-ordinator. A number of > metrics were added with CASSANDRA-13531 but didn't include metrics to monitor > if repair is running or how many repairs have ran. > |4.x|[patch|https://github.com/apache/cassandra/compare/instaclustr:trunk...instaclustr:13598-4.x]| > |3.11|[patch|https://github.com/instaclustr/cassandra/compare/cassandra-3.11...instaclustr:13598-3.11]| > |3.0|[patch|https://github.com/instaclustr/cassandra/compare/cassandra-3.0...instaclustr:13598-3.0]| > |2.2|[patch|https://github.com/instaclustr/cassandra/compare/cassandra-2.2...instaclustr:13598-2.2]| -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13598) Started & Completed repair metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-13598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Podkowinski updated CASSANDRA-13598:
-------------------------------------------
       Resolution: Fixed
    Fix Version/s:     (was: 4.x)
                   4.0
           Status: Resolved  (was: Patch Available)

Failed tests look unrelated. I've also quickly tested locally and realized that we really should name these RepairJobsStarted/Completed instead of just RepairsStarted/Completed, as the latter implies that something has actually been repaired, which doesn't have to be the case. Also, RepairJobsStarted/Completed may make it more obvious that this is only on the coordinator.

I've committed this to 4.0 as 176f2a444cd, since this really doesn't qualify as a bug fix. I've attached patches for backports.

Thanks for the contribution, Cameron!

> Started & Completed repair metrics
> ----------------------------------
>
>                 Key: CASSANDRA-13598
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13598
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Cameron Zemek
>            Assignee: Cameron Zemek
>            Priority: Minor
>              Labels: repair
>             Fix For: 4.0
>
>
> There are no metrics to monitor repairs run as coordinator. A number of metrics were added with CASSANDRA-13531, but they didn't include metrics to monitor whether a repair is running or how many repairs have run.
> |4.x|[patch|https://github.com/apache/cassandra/compare/instaclustr:trunk...instaclustr:13598-4.x]|
> |3.11|[patch|https://github.com/instaclustr/cassandra/compare/cassandra-3.11...instaclustr:13598-3.11]|
> |3.0|[patch|https://github.com/instaclustr/cassandra/compare/cassandra-3.0...instaclustr:13598-3.0]|
> |2.2|[patch|https://github.com/instaclustr/cassandra/compare/cassandra-2.2...instaclustr:13598-2.2]|
cassandra git commit: Add started & completed repair metrics
Repository: cassandra
Updated Branches:
  refs/heads/trunk fe3cfe3d7 -> 176f2a444

Add started & completed repair metrics

patch by Cameron Zemek; reviewed by Stefan Podkowinski for CASSANDRA-13598

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/176f2a44
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/176f2a44
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/176f2a44

Branch: refs/heads/trunk
Commit: 176f2a444cd2a6ed7c3be6ac126b6ca2c4f255cf
Parents: fe3cfe3
Author: Cameron Zemek
Authored: Wed Jun 14 14:06:53 2017 +1000
Committer: Stefan Podkowinski
Committed: Fri Jun 30 11:28:13 2017 +0200

----------------------------------------------------------------------
 CHANGES.txt                                        |  1 +
 .../apache/cassandra/metrics/KeyspaceMetrics.java  | 18 ++
 .../apache/cassandra/metrics/TableMetrics.java     |  7 +++
 .../org/apache/cassandra/repair/RepairJob.java     |  6 ++
 4 files changed, 32 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/176f2a44/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index e56eb78..866c6fd 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0
+ * Added started & completed repair metrics (CASSANDRA-13598)
  * Improve secondary index (re)build failure and concurrency handling (CASSANDRA-10130)
  * Improve calculation of available disk space for compaction (CASSANDRA-13068)
  * Change the accessibility of RowCacheSerializer for third party row cache plugins (CASSANDRA-13579)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/176f2a44/src/java/org/apache/cassandra/metrics/KeyspaceMetrics.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/metrics/KeyspaceMetrics.java b/src/java/org/apache/cassandra/metrics/KeyspaceMetrics.java
index affb372..9e8d542 100644
--- a/src/java/org/apache/cassandra/metrics/KeyspaceMetrics.java
+++ b/src/java/org/apache/cassandra/metrics/KeyspaceMetrics.java
@@ -102,6 +102,10 @@ public class KeyspaceMetrics
     public final Counter speculativeFailedRetries;
     /** Needed to speculate, but didn't have enough replicas **/
     public final Counter speculativeInsufficientReplicas;
+    /** Number of started repairs as coordinator on this keyspace */
+    public final Counter repairsStarted;
+    /** Number of completed repairs as coordinator on this keyspace */
+    public final Counter repairsCompleted;
     /** total time spent as a repair coordinator */
     public final Timer repairTime;
     /** total time spent preparing for repair */
@@ -285,6 +289,20 @@ public class KeyspaceMetrics
                 return metric.speculativeInsufficientReplicas.getCount();
             }
         });
+        repairsStarted = createKeyspaceCounter("RepairJobsStarted", new MetricValue()
+        {
+            public Long getValue(TableMetrics metric)
+            {
+                return metric.repairsStarted.getCount();
+            }
+        });
+        repairsCompleted = createKeyspaceCounter("RepairJobsCompleted", new MetricValue()
+        {
+            public Long getValue(TableMetrics metric)
+            {
+                return metric.repairsCompleted.getCount();
+            }
+        });
         repairTime = Metrics.timer(factory.createMetricName("RepairTime"));
         repairPrepareTime = Metrics.timer(factory.createMetricName("RepairPrepareTime"));
         anticompactionTime = Metrics.timer(factory.createMetricName("AntiCompactionTime"));

http://git-wip-us.apache.org/repos/asf/cassandra/blob/176f2a44/src/java/org/apache/cassandra/metrics/TableMetrics.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/metrics/TableMetrics.java b/src/java/org/apache/cassandra/metrics/TableMetrics.java
index 40a927f..98fd1e9 100644
--- a/src/java/org/apache/cassandra/metrics/TableMetrics.java
+++ b/src/java/org/apache/cassandra/metrics/TableMetrics.java
@@ -146,6 +146,10 @@ public class TableMetrics
     public final LatencyMetrics casCommit;
     /** percent of the data that is repaired */
     public final Gauge percentRepaired;
+    /** Number of started repairs as coordinator on this table */
+    public final Counter repairsStarted;
+    /** Number of completed repairs as coordinator on this table */
+    public final Counter repairsCompleted;
     /** time spent anticompacting data before participating in a consistent repair */
     public final TableTimer anticompactionTime;
     /** time spent creating merkle trees */
@@ -723,6 +727,9 @@ public class TableMetrics
         casPropose = new LatencyMetrics(factory, "CasPropose",
[jira] [Comment Edited] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement
[ https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069565#comment-16069565 ] ZhaoYang edited comment on CASSANDRA-13592 at 6/30/17 9:23 AM: --- || source || junit-result || dtest-result|| | [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | [junit|https://circleci.com/gh/jasonstack/cassandra/84] | {{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk {{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13506] | | [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11] | [junit|https://circleci.com/gh/jasonstack/cassandra/82] | {{topology_test.TestTopology.size_estimates_multidc_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13229] {{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} [known|https://issues.apache.org/jira/browse/CASSANDRA-13250] | | [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0] | [junit|https://circleci.com/gh/jasonstack/cassandra/83] | {{auth_test.TestAuth.system_auth_ks_is_alterable_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13113] {{offline_tools_test.TestOfflineTools.sstableofflinerelevel_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-12617] {{repair_tests.incremental_repair_test.TestIncRepair.multiple_repair_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13515]| | [2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2] | [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed | 1. in {{listType, mapType, setType, TupleType}}.toJSONString(), keep buffer position the same. 2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with double-quote) to be consistent with user json input 3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, otherwise parent method throws NPE. 
was (Author: jasonstack): || source || junit-result || dtest-result|| | [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | [junit|https://circleci.com/gh/jasonstack/cassandra/84] | {{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk {{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13506] | | [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11] | [junit|https://circleci.com/gh/jasonstack/cassandra/82] | {{topology_test.TestTopology.size_estimates_multidc_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13229] {{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} [known|https://issues.apache.org/jira/browse/CASSANDRA-13250] | | [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0] | [junit|https://circleci.com/gh/jasonstack/cassandra/83] | {{auth_test.TestAuth.system_auth_ks_is_alterable_test}} [known|https://issues.apache.org/jira/browse/CASSANDRA-13113]{{offline_tools_test.TestOfflineTools.sstableofflinerelevel_test}} [known|https://issues.apache.org/jira/browse/CASSANDRA-12617] {{repair_tests.incremental_repair_test.TestIncRepair.multiple_repair_test }} | [known|https://issues.apache.org/jira/browse/CASSANDRA-13515]| | [2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2] | [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed | 1. in {{listType, mapType, setType, TupleType}}.toJSONString(), keep buffer position the same. 2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with double-quote) to be consistent with user json input 3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, otherwise parent method throws NPE. 
> Null Pointer exception at SELECT JSON statement
> ---
>
> Key: CASSANDRA-13592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13592
> Project: Cassandra
> Issue Type: Bug
> Components: CQL
> Environment: Debian Linux
> Reporter: Wyss Philipp
> Assignee: ZhaoYang
> Labels: beginner
> Attachments: system.log
>
> A null pointer exception appears when the command
> {code}
> SELECT JSON * FROM examples.basic;
> ---MORE---
> message="java.lang.NullPointerException">
> Examples.basic has the following description (DESC examples.basic;):
> CREATE TABLE examples.basic (
> key frozen> PRIMARY KEY,
> wert text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
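Item 1 of the comment in this message ("keep buffer position the same") is about serializing a value without moving the caller's read position. The following is a minimal sketch of that general technique using only `java.nio.ByteBuffer`; it is not the actual patch, and the names `BufferPositionSketch`, `readIntConsuming` and `readIntPreserving` are hypothetical, chosen for illustration only.

```java
import java.nio.ByteBuffer;

public class BufferPositionSketch {
    // Relative get: advances the caller's position as a side effect,
    // so a second read of the same buffer would start in the wrong place.
    public static int readIntConsuming(ByteBuffer buffer) {
        return buffer.getInt();
    }

    // Reads through a duplicate. The duplicate shares the content but has
    // its own position, so the caller's buffer is left untouched.
    public static int readIntPreserving(ByteBuffer buffer) {
        return buffer.duplicate().getInt();
    }

    public static void main(String[] args) {
        ByteBuffer value = ByteBuffer.allocate(4);
        value.putInt(42);
        value.flip();

        int first = readIntPreserving(value);
        System.out.println(first + " read, position is now " + value.position());

        int second = readIntConsuming(value);
        System.out.println(second + " read, position is now " + value.position());
    }
}
```

Reading through `duplicate()` is one of several ways to get this behaviour; absolute reads such as `getInt(buffer.position())` would be another.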
[jira] [Comment Edited] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement
[ https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069565#comment-16069565 ] ZhaoYang edited comment on CASSANDRA-13592 at 6/30/17 9:20 AM: --- || source || junit-result || dtest-result|| | [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | [junit|https://circleci.com/gh/jasonstack/cassandra/84] | {{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk {{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13506] | | [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11] | [junit|https://circleci.com/gh/jasonstack/cassandra/82] | {{topology_test.TestTopology.size_estimates_multidc_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13229] {{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} [known|https://issues.apache.org/jira/browse/CASSANDRA-13250] | | [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0] | [junit|https://circleci.com/gh/jasonstack/cassandra/83] | {{auth_test.TestAuth.system_auth_ks_is_alterable_test}} [known|https://issues.apache.org/jira/browse/CASSANDRA-13113]{{offline_tools_test.TestOfflineTools.sstableofflinerelevel_test}} [known|https://issues.apache.org/jira/browse/CASSANDRA-12617] {{repair_tests.incremental_repair_test.TestIncRepair.multiple_repair_test }} | [known|https://issues.apache.org/jira/browse/CASSANDRA-13515]| | [2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2] | [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed | 1. in {{listType, mapType, setType, TupleType}}.toJSONString(), keep buffer position the same. 2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with double-quote) to be consistent with user json input 3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, otherwise parent method throws NPE. 
was (Author: jasonstack): || source || junit-result || dtest-result|| | [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | [junit|https://circleci.com/gh/jasonstack/cassandra/84] | {{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk {{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13506] | | [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11] | [junit|https://circleci.com/gh/jasonstack/cassandra/82] | {{topology_test.TestTopology.size_estimates_multidc_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13229] {{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} [known|https://issues.apache.org/jira/browse/CASSANDRA-13250] | | [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0] | [junit|https://circleci.com/gh/jasonstack/cassandra/83] | | | [2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2] | [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed | 1. in {{listType, mapType, setType, TupleType}}.toJSONString(), keep buffer position the same. 2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with double-quote) to be consistent with user json input 3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, otherwise parent method throws NPE. 
> Null Pointer exception at SELECT JSON statement
> ---
>
> Key: CASSANDRA-13592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13592
> Project: Cassandra
> Issue Type: Bug
> Components: CQL
> Environment: Debian Linux
> Reporter: Wyss Philipp
> Assignee: ZhaoYang
> Labels: beginner
> Attachments: system.log
>
> A null pointer exception appears when the command
> {code}
> SELECT JSON * FROM examples.basic;
> ---MORE---
> message="java.lang.NullPointerException">
> Examples.basic has the following description (DESC examples.basic;):
> CREATE TABLE examples.basic (
> key frozen> PRIMARY KEY,
> wert text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
>
[jira] [Assigned] (CASSANDRA-13162) Batchlog replay is throttled during bootstrap, creating conditions for incorrect query results on materialized views
[ https://issues.apache.org/jira/browse/CASSANDRA-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrés de la Peña reassigned CASSANDRA-13162:
Assignee: Andrés de la Peña

> Batchlog replay is throttled during bootstrap, creating conditions for incorrect query results on materialized views
>
> Key: CASSANDRA-13162
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13162
> Project: Cassandra
> Issue Type: Bug
> Components: Materialized Views
> Reporter: Wei Deng
> Assignee: Andrés de la Peña
> Priority: Critical
> Labels: bootstrap, materializedviews
>
> I've tested this in a C* 3.0 cluster with a couple of Materialized Views defined (one base table and two MVs on that base table). The data volume is not very high per node (about 80GB of data per node total, and that particular base table has about 25GB of data uncompressed with one MV taking 18GB compressed and the other MV taking 3GB), and the cluster is using decent hardware (EC2 C4.8XL with 18 cores + 60GB RAM + 18K IOPS RAID0 from two 3TB gp2 EBS volumes).
> This is originally a 9-node cluster. It appears that after adding 3 more nodes to the DC, the system.batches table accumulated a lot of data on the 3 new nodes (each having around 20GB under system.batches directory), and in the subsequent week the batchlog on the 3 new nodes got slowly replayed back to the rest of the nodes in the cluster. The bottleneck seems to be the throttling defined in this cassandra.yaml setting: batchlog_replay_throttle_in_kb, which by default is set to 1MB/s.
> Given that it is taking almost a week (and still hasn't finished) for the batchlog (from MV) to be replayed after the bootstrap finishes, it seems only reasonable to unthrottle (or at least give it a much higher throttle rate) during the initial bootstrap, and hence I'd consider this a bug for our current MV implementation.
> Also as far as I understand, the bootstrap logic won't wait for the backlogged batchlog to be fully replayed before changing the new bootstrapping node to "UN" state, and if batchlog for the MVs got stuck in this state for a long time, we basically will get wrong answers on the MVs during that whole duration (until batchlog is fully played to the cluster), which adds even more criticality to this bug.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
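The figures in the CASSANDRA-13162 report (about 20GB under system.batches per new node, a default throttle of 1MB/s) allow a rough estimate of how long replay would take. The sketch below is only back-of-the-envelope arithmetic: the class and method names are hypothetical, and the idea that the configured throttle is a cluster-wide total divided by the number of nodes is an assumption taken from the usual cassandra.yaml description of `batchlog_replay_throttle_in_kb`, not something stated in the ticket.

```java
public class BatchlogReplayEstimate {
    // Seconds needed to replay backlogKb at throttleKbPerSec, where the
    // throttle is split evenly across `divisor` nodes (assumption, see above).
    public static long replaySeconds(long backlogKb, long throttleKbPerSec, int divisor) {
        long effectiveRate = Math.max(1, throttleKbPerSec / divisor);
        return backlogKb / effectiveRate;
    }

    public static void main(String[] args) {
        long backlogKb = 20L * 1024 * 1024; // ~20GB per new node, from the report
        long throttleKb = 1024;             // default 1MB/s, from the report

        long undivided = replaySeconds(backlogKb, throttleKb, 1);
        long twelveNodes = replaySeconds(backlogKb, throttleKb, 12); // 9 original + 3 new nodes

        System.out.printf("full 1MB/s per node: %.1f hours%n", undivided / 3600.0);
        System.out.printf("1MB/s shared by 12 nodes: %.1f hours%n", twelveNodes / 3600.0);
    }
}
```

Even under the divided-throttle assumption the estimate is measured in days rather than hours, which fits the order of magnitude the reporter describes, although it does not by itself account for the full week observed.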
[jira] [Commented] (CASSANDRA-13629) Wait for batchlog replay during bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069690#comment-16069690 ]

Andrés de la Peña commented on CASSANDRA-13629:

[CASSANDRA-13065|https://issues.apache.org/jira/browse/CASSANDRA-13065], which was considered an improvement, solves this problem only for 4.0. If now we see it as a bug fix we might want to port it back to other branches. [~pauloricardomg], what do you think?

> Wait for batchlog replay during bootstrap
>
> Key: CASSANDRA-13629
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13629
> Project: Cassandra
> Issue Type: Sub-task
> Components: Materialized Views
> Reporter: Andrés de la Peña
> Assignee: Andrés de la Peña
> Fix For: 4.0
>
> As part of the problem described in [CASSANDRA-13162|https://issues.apache.org/jira/browse/CASSANDRA-13162], the bootstrap logic won't wait for the backlogged batchlog to be fully replayed before changing the new bootstrapping node to "UN" state. We should wait for batchlog replay before making the node available.
[jira] [Updated] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline
[ https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski updated CASSANDRA-13649: --- Description: I've noticed some netty related errors in trunk in [some of the dtest results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink]. Just want to make sure that we don't have to change anything related to the exception handling in our pipeline and that this isn't a netty issue. Actually if this causes flakiness but is otherwise harmless, we should do something about it, even if it's just on the dtest side. {noformat} WARN [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception. io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] {noformat} And again in another test: {noformat} WARN [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception. io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] {noformat} This one looks also odd and makes upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test fail: {noformat} WARN [epollEventLoopGroup-2-9] 2017-06-29 02:41:37,125 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception. 
io.netty.handler.codec.DecoderException: org.apache.cassandra.transport.ProtocolException: Invalid or unsupported protocol version: 4 at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:375) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:342) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:325) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:893) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:691) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) 
[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:307) [netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) [netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) [netty-all-4.0.44.Final.jar:4.0.44.Final] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131] Caused by: org.apache.cassandra.transport.ProtocolException: Invalid or unsupported protocol version: 4 at org.apache.cassandra.transport.Frame$Decoder.decode(Frame.java:186) ~[main/:na] at
[jira] [Updated] (CASSANDRA-13629) Wait for batchlog replay during bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrés de la Peña updated CASSANDRA-13629:
Fix Version/s: 4.0

> Wait for batchlog replay during bootstrap
>
> Key: CASSANDRA-13629
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13629
> Project: Cassandra
> Issue Type: Sub-task
> Components: Materialized Views
> Reporter: Andrés de la Peña
> Assignee: Andrés de la Peña
> Fix For: 4.0
>
> As part of the problem described in [CASSANDRA-13162|https://issues.apache.org/jira/browse/CASSANDRA-13162], the bootstrap logic won't wait for the backlogged batchlog to be fully replayed before changing the new bootstrapping node to "UN" state. We should wait for batchlog replay before making the node available.
[jira] [Updated] (CASSANDRA-13629) Wait for batchlog replay during bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrés de la Peña updated CASSANDRA-13629:
Resolution: Not A Problem
Status: Resolved (was: Awaiting Feedback)

> Wait for batchlog replay during bootstrap
>
> Key: CASSANDRA-13629
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13629
> Project: Cassandra
> Issue Type: Sub-task
> Components: Materialized Views
> Reporter: Andrés de la Peña
> Assignee: Andrés de la Peña
>
> As part of the problem described in [CASSANDRA-13162|https://issues.apache.org/jira/browse/CASSANDRA-13162], the bootstrap logic won't wait for the backlogged batchlog to be fully replayed before changing the new bootstrapping node to "UN" state. We should wait for batchlog replay before making the node available.
[jira] [Comment Edited] (CASSANDRA-13629) Wait for batchlog replay during bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069680#comment-16069680 ]

Andrés de la Peña edited comment on CASSANDRA-13629 at 6/30/17 8:15 AM:

It seems that since [CASSANDRA-13065|https://issues.apache.org/jira/browse/CASSANDRA-13065] the data received during bootstrap is not sent to batchlog. Since the batchlog is empty when bootstrap finishes, this ticket is not necessary.

was (Author: adelapena):
It sees that since [CASSANDRA-13065|https://issues.apache.org/jira/browse/CASSANDRA-13065] the data received during bootstrap is not sent to batchlog. Since the batchlog is empty when bootstrap finishes, this ticket is not necessary.

> Wait for batchlog replay during bootstrap
>
> Key: CASSANDRA-13629
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13629
> Project: Cassandra
> Issue Type: Sub-task
> Components: Materialized Views
> Reporter: Andrés de la Peña
> Assignee: Andrés de la Peña
>
> As part of the problem described in [CASSANDRA-13162|https://issues.apache.org/jira/browse/CASSANDRA-13162], the bootstrap logic won't wait for the backlogged batchlog to be fully replayed before changing the new bootstrapping node to "UN" state. We should wait for batchlog replay before making the node available.
[jira] [Resolved] (CASSANDRA-13565) Materialized view usage of commit logs requires large mutation but commitlog_segment_size_in_mb=2048 causes exception
[ https://issues.apache.org/jira/browse/CASSANDRA-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhaoYang resolved CASSANDRA-13565.
Resolution: Duplicate

> Materialized view usage of commit logs requires large mutation but commitlog_segment_size_in_mb=2048 causes exception
>
> Key: CASSANDRA-13565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13565
> Project: Cassandra
> Issue Type: Bug
> Components: Configuration, Materialized Views, Streaming and Messaging
> Environment: Cassandra 3.9.0, Windows
> Reporter: Tania S Engel
> Attachments: CQLforTable.png
>
> We will be upgrading to 3.10 for CASSANDRA-11670. However, there is another scenario (not applyunsafe during JOIN) which leads to :
> java.lang.IllegalArgumentException: Mutation of 525.847MiB is too large for the maximum size of 512.000MiB
>     at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:262) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:493) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.Mutation.apply(Mutation.java:227) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.batchlog.BatchlogManager.store(BatchlogManager.java:147) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.service.StorageProxy.mutateMV(StorageProxy.java:797) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.view.ViewBuilder.buildKey(ViewBuilder.java:96) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.view.ViewBuilder.run(ViewBuilder.java:165) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.compaction.CompactionManager$14.run(CompactionManager.java:1591) [apache-cassandra-3.9.0.jar:3.9.0]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_66]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_66]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
> Due to the relationship of max_mutation_size_in_kb and commitlog_segment_size_in_mb, we increased commitlog_segment_size_in_mb and left Cassandra to calculate max_mutation_size_in_kb as half the size commitlog_segment_size_in_mb * 1024.
> However, we have found that if we set commitlog_segment_size_in_mb=2048 we get an exception upon starting Cassandra, when it is creating a new commit log.
> ERROR [COMMIT-LOG-ALLOCATOR] 2017-05-31 17:01:48,005 JVMStabilityInspector.java:82 - Exiting due to error while processing commit log during initialization.
> org.apache.cassandra.io.FSWriteError: java.io.IOException: An attempt was made to move the file pointer before the beginning of the file
> Perhaps the index you are using is not big enough and it goes negative.
> Is the relationship between max_mutation_size_in_kb and commitlog_segment_size_in_mb important to preserve? In our limited stress test we are finding mutation size already over 512mb and we expect more data in our sstables and associated materialized views.
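The reporter's guess that a value "goes negative" at commitlog_segment_size_in_mb=2048 can be illustrated with plain integer arithmetic. This is a sketch of the arithmetic only, assuming the byte size is computed in 32-bit `int` math; whether Cassandra 3.9 actually does so is not confirmed by this thread, and the class and method names are hypothetical.

```java
public class SegmentSizeOverflow {
    // Segment size in bytes computed with 32-bit arithmetic (assumed, not confirmed).
    public static int segmentBytesAsInt(int segmentSizeMb) {
        return segmentSizeMb * 1024 * 1024;
    }

    // The same computation widened to 64 bits, which cannot overflow here.
    public static long segmentBytesAsLong(int segmentSizeMb) {
        return segmentSizeMb * 1024L * 1024L;
    }

    // Default described in the report: half of commitlog_segment_size_in_mb * 1024.
    public static int defaultMaxMutationKb(int segmentSizeMb) {
        return segmentSizeMb * 1024 / 2;
    }

    public static void main(String[] args) {
        System.out.println("1024 MB as int:  " + segmentBytesAsInt(1024));
        System.out.println("2048 MB as int:  " + segmentBytesAsInt(2048));
        System.out.println("2048 MB as long: " + segmentBytesAsLong(2048));
        System.out.println("max mutation for 1024 MB segments, in KB: " + defaultMaxMutationKb(1024));
    }
}
```

With 2048 MB the product is exactly 2^31, one more than `Integer.MAX_VALUE`, so the `int` version wraps to a negative number, which would be consistent with an attempt to seek before the beginning of a file. The 512.000MiB limit in the stack trace matches the half-of-segment default for a 1024 MB segment.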
[jira] [Commented] (CASSANDRA-13629) Wait for batchlog replay during bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069680#comment-16069680 ]

Andrés de la Peña commented on CASSANDRA-13629:

It seems that since [CASSANDRA-13065|https://issues.apache.org/jira/browse/CASSANDRA-13065] the data received during bootstrap is not sent to batchlog. Since the batchlog is empty when bootstrap finishes, this ticket is not necessary.

> Wait for batchlog replay during bootstrap
>
> Key: CASSANDRA-13629
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13629
> Project: Cassandra
> Issue Type: Sub-task
> Components: Materialized Views
> Reporter: Andrés de la Peña
> Assignee: Andrés de la Peña
>
> As part of the problem described in [CASSANDRA-13162|https://issues.apache.org/jira/browse/CASSANDRA-13162], the bootstrap logic won't wait for the backlogged batchlog to be fully replayed before changing the new bootstrapping node to "UN" state. We should wait for batchlog replay before making the node available.
[jira] [Commented] (CASSANDRA-13565) Materialized view usage of commit logs requires large mutation but commitlog_segment_size_in_mb=2048 causes exception
[ https://issues.apache.org/jira/browse/CASSANDRA-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069678#comment-16069678 ]

ZhaoYang commented on CASSANDRA-13565:

I will mark this ticket as `not an issue`, and 13622 is a better place to fix all boundary cases.

> Materialized view usage of commit logs requires large mutation but commitlog_segment_size_in_mb=2048 causes exception
>
> Key: CASSANDRA-13565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13565
> Project: Cassandra
> Issue Type: Bug
> Components: Configuration, Materialized Views, Streaming and Messaging
> Environment: Cassandra 3.9.0, Windows
> Reporter: Tania S Engel
> Attachments: CQLforTable.png
>
> We will be upgrading to 3.10 for CASSANDRA-11670. However, there is another scenario (not applyunsafe during JOIN) which leads to :
> java.lang.IllegalArgumentException: Mutation of 525.847MiB is too large for the maximum size of 512.000MiB
>     at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:262) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:493) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.Mutation.apply(Mutation.java:227) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.batchlog.BatchlogManager.store(BatchlogManager.java:147) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.service.StorageProxy.mutateMV(StorageProxy.java:797) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.view.ViewBuilder.buildKey(ViewBuilder.java:96) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.view.ViewBuilder.run(ViewBuilder.java:165) ~[apache-cassandra-3.9.0.jar:3.9.0]
>     at org.apache.cassandra.db.compaction.CompactionManager$14.run(CompactionManager.java:1591) [apache-cassandra-3.9.0.jar:3.9.0]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_66]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_66]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
> Due to the relationship of max_mutation_size_in_kb and commitlog_segment_size_in_mb, we increased commitlog_segment_size_in_mb and left Cassandra to calculate max_mutation_size_in_kb as half the size commitlog_segment_size_in_mb * 1024.
> However, we have found that if we set commitlog_segment_size_in_mb=2048 we get an exception upon starting Cassandra, when it is creating a new commit log.
> ERROR [COMMIT-LOG-ALLOCATOR] 2017-05-31 17:01:48,005 JVMStabilityInspector.java:82 - Exiting due to error while processing commit log during initialization.
> org.apache.cassandra.io.FSWriteError: java.io.IOException: An attempt was made to move the file pointer before the beginning of the file
> Perhaps the index you are using is not big enough and it goes negative.
> Is the relationship between max_mutation_size_in_kb and commitlog_segment_size_in_mb important to preserve? In our limited stress test we are finding mutation size already over 512mb and we expect more data in our sstables and associated materialized views.
[jira] [Created] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline
Stefan Podkowinski created CASSANDRA-13649: -- Summary: Uncaught exceptions in Netty pipeline Key: CASSANDRA-13649 URL: https://issues.apache.org/jira/browse/CASSANDRA-13649 Project: Cassandra Issue Type: Bug Reporter: Stefan Podkowinski Attachments: test_stdout.txt I've noticed some Netty-related errors in trunk in [some of the dtest results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink]. Just want to make sure that we don't have to change anything related to the exception handling in our pipeline and that this isn't a Netty issue. Actually if this causes flakiness but is otherwise harmless, we should do something about it, even if it's just on the dtest side. {noformat} WARN [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception. io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] {noformat} And again in another test: {noformat} WARN [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception. io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: Connection reset by peer at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] {noformat} This one also looks odd and makes upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test fail: {noformat} WARN [epollEventLoopGroup-2-9] 2017-06-29 02:41:37,125 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. 
It usually means the last handler in the pipeline did not handle the exception. io.netty.handler.codec.DecoderException: org.apache.cassandra.transport.ProtocolException: Invalid or unsupported protocol version: 4 at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:375) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:342) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:325) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:893) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:691) ~[netty-all-4.0.44.Final.jar:4.0.44.Final] at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) [netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:307) [netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) [netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) [netty-all-4.0.44.Final.jar:4.0.44.Final] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131] Caused by: org.apache.cassandra.transport.ProtocolException: Invalid or unsupported protocol version: 4 at
[jira] [Comment Edited] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement
[ https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069565#comment-16069565 ] ZhaoYang edited comment on CASSANDRA-13592 at 6/30/17 7:37 AM: --- || source || junit-result || dtest-result|| | [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | [junit|https://circleci.com/gh/jasonstack/cassandra/84] | {{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk {{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13506] | | [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11] | [junit|https://circleci.com/gh/jasonstack/cassandra/82] | {{topology_test.TestTopology.size_estimates_multidc_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13229] {{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} [known|https://issues.apache.org/jira/browse/CASSANDRA-13250] | | [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0] | [junit|https://circleci.com/gh/jasonstack/cassandra/83] | | | [2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2] | [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed | 1. in {{listType, mapType, setType, TupleType}}.toJSONString(), keep buffer position the same. 2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with double-quote) to be consistent with user json input 3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, otherwise parent method throws NPE. 
was (Author: jasonstack): || source || junit-result || dtest-result|| | [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | [junit|https://circleci.com/gh/jasonstack/cassandra/84] | | | [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11] | [junit|https://circleci.com/gh/jasonstack/cassandra/82] | | | [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0] | [junit|https://circleci.com/gh/jasonstack/cassandra/83] | | | [2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2] | [junit|https://circleci.com/gh/jasonstack/cassandra/85] | | 1. in {{listType, mapType, setType, TupleType}}.toJSONString(), keep buffer position the same. 2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with double-quote) to be consistent with user json input 3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, otherwise parent method throws NPE. > Null Pointer exception at SELECT JSON statement > --- > > Key: CASSANDRA-13592 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13592 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: Debian Linux >Reporter: Wyss Philipp >Assignee: ZhaoYang > Labels: beginner > Attachments: system.log > > > A Null pointer exception appears when the command > {code} > SELECT JSON * FROM examples.basic; > ---MORE--- > message="java.lang.NullPointerException"> > Examples.basic has the following description (DESC examples.basic;): > CREATE TABLE examples.basic ( > key frozen> PRIMARY KEY, > wert text > ) WITH bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32', 'min_threshold': '4'} > AND compression = {'chunk_length_in_kb': '64', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 
1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE'; > {code} > The error appears after the ---MORE--- line. > The field "wert" has a JSON formatted string.
[jira] [Commented] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement
[ https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069565#comment-16069565 ] ZhaoYang commented on CASSANDRA-13592: -- || source || junit-result || dtest-result|| | [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | [junit|https://circleci.com/gh/jasonstack/cassandra/84] | | | [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11] | [junit|https://circleci.com/gh/jasonstack/cassandra/82] | | | [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0] | [junit|https://circleci.com/gh/jasonstack/cassandra/83] | | | [2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2] | [junit|https://circleci.com/gh/jasonstack/cassandra/85] | | 1. in {{listType, mapType, setType, TupleType}}.toJSONString(), keep buffer position the same. 2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with double-quote) to be consistent with user json input 3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, otherwise parent method throws NPE. 
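Points 1 and 3 of the comment above can be illustrated with plain java.nio. The sketch is not the patch itself. The class and helper names are hypothetical, and using `duplicate()` is just one way to keep the buffer position unchanged; the comment does not say which technique the patch uses. A relative `get` advances the buffer's position, so a later reader of the same buffer sees no remaining bytes, whereas reading through `duplicate()` leaves the caller's position untouched.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferPositionSketch {
    // Relative bulk get: advances the position of the buffer passed in.
    static String readConsuming(ByteBuffer buffer) {
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Reads through a duplicate, which shares content but has its own position.
    static String readPreserving(ByteBuffer buffer) {
        return readConsuming(buffer.duplicate());
    }

    // Wraps a value in double quotes (no escaping; illustration only).
    static String quote(String value) {
        return "\"" + value + "\"";
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(readPreserving(buffer) + " position=" + buffer.position());
        System.out.println(readConsuming(buffer) + " position=" + buffer.position());
        System.out.println("empty value as JSON string: " + quote(""));
    }
}
```

After `readPreserving` the position is still 0; after `readConsuming` it has moved to the limit, which is the kind of side effect point 1 asks the collection types to avoid.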
> Null Pointer exception at SELECT JSON statement > --- > > Key: CASSANDRA-13592 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13592 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: Debian Linux >Reporter: Wyss Philipp >Assignee: ZhaoYang > Labels: beginner > Attachments: system.log > > > A Null pointer exception appears when the command > {code} > SELECT JSON * FROM examples.basic; > ---MORE--- > message="java.lang.NullPointerException"> > Examples.basic has the following description (DESC examples.basic;): > CREATE TABLE examples.basic ( > key frozen> PRIMARY KEY, > wert text > ) WITH bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32', 'min_threshold': '4'} > AND compression = {'chunk_length_in_kb': '64', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = '99PERCENTILE'; > {code} > The error appears after the ---MORE--- line. > The field "wert" has a JSON formatted string.