[jira] [Comment Edited] (CASSANDRA-12484) Unknown exception caught while attempting to update MaterializedView! findkita.kitas java.lang.AssertionErro

2017-06-30 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064708#comment-16064708
 ] 

ZhaoYang edited comment on CASSANDRA-12484 at 7/1/17 5:15 AM:
--

[~cordlesswool] could you share your table schemas and typical queries?


was (Author: jasonstack):
[~cordlesswool] could you share your table schemas and typical queries?  Which 
version is fixed? 

> Unknown exception caught while attempting to update MaterializedView! 
> findkita.kitas java.lang.AssertionErro
> 
>
> Key: CASSANDRA-12484
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12484
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
> Environment: Docker Container with Cassandra version 3.7 running on 
> local pc
>Reporter: cordlessWool
>Priority: Critical
>
> After a restart, my Cassandra node no longer starts. It ends with the following 
> error message:
> ERROR 18:39:37 Unknown exception caught while attempting to update 
> MaterializedView! findkita.kitas
> java.lang.AssertionError: We shouldn't have got there is the base row had no 
> associated entry
> Cassandra has heavy CPU usage and uses 2.1 GB of memory, although 1 GB more is 
> available. I ran nodetool cleanup and repair, but that did not help.
> I have 5 materialized views on this table, but the table has fewer than 
> 2000 rows, which is not much.
> Cassandra runs in a Docker container. The container is accessible, but I 
> cannot call cqlsh, and my website could not connect either.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-06-30 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070997#comment-16070997
 ] 

Corentin Chary commented on CASSANDRA-13651:


Also check:
* https://github.com/netty/netty/issues/1759
* https://gist.github.com/jadbaz/47d98da0ead2e71659f343b14ef05de6
* Benchmark batching vs. stupid writeAndFlush()
* It's unclear why sending the response is done in the flusher right now
* https://github.com/spotify/netty-batch-flusher
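
The batching item above can be illustrated with a toy model before running a real benchmark. This is only a sketch: `CountingChannel` and both write functions are hypothetical names, each flush stands in for one socket write, and nothing here measures latency or uses the Netty API.

```python
class CountingChannel:
    """Toy channel that counts flushes; each flush stands in for one socket write."""

    def __init__(self):
        self.pending = []
        self.flushes = 0
        self.sent = []

    def write(self, msg) -> None:
        self.pending.append(msg)

    def flush(self) -> None:
        # Only count a flush when there is something to send.
        if self.pending:
            self.sent.extend(self.pending)
            self.pending.clear()
            self.flushes += 1


def write_and_flush_each(channel, messages) -> None:
    """Flush after every message, like calling writeAndFlush() per response."""
    for msg in messages:
        channel.write(msg)
        channel.flush()


def write_batched(channel, messages, batch_size: int) -> None:
    """Flush once per batch_size messages, then flush whatever is left."""
    for i, msg in enumerate(messages, start=1):
        channel.write(msg)
        if i % batch_size == 0:
            channel.flush()
    channel.flush()


messages = list(range(100))
naive, batched = CountingChannel(), CountingChannel()
write_and_flush_each(naive, messages)
write_batched(batched, messages, batch_size=16)
print(naive.flushes, batched.flushes)
```

Both channels deliver the same messages in the same order; only the number of flushes differs. Whether fewer flushes outweigh the added delay is what the benchmark would need to show.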

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
> Overhead  Trace output
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x
>    5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x
>    1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
>    0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x
>    0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x
>    0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x
>    0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> #
>  8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>    |
>    --8.61%--0x7fca6117bdac
>             0x7fca60459804
>             epoll_wait
>  2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>    0x7fca6117b830
>    0x7fca60459804
>    epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
> reports a per-CPU percentage or a per-system percentage, but that would 
> still be 10% of the total CPU usage of Cassandra at the minimum.
> I went further and found the code of all that: We schedule a lot of 
> 

[jira] [Commented] (CASSANDRA-13645) Optimize the number of replicas required in Quorum read/write

2017-06-30 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070696#comment-16070696
 ] 

Jay Zhuang commented on CASSANDRA-13645:


Link to CASSANDRA-8119: More Expressive Consistency Levels

> Optimize the number of replicas required in Quorum read/write
> -
>
> Key: CASSANDRA-13645
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13645
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Dikang Gu
>Assignee: Pengchao Wang
> Fix For: 4.x
>
>
> Currently, for C* read/write requests at quorum consistency level, the number of 
> replicas required for a quorum write is W=N/2+1, and the number of replicas 
> required for a quorum read is R=N/2+1 as well. 
> This works fine with an odd number of replicas, where R + W = N + 1, but with 
> an even number of replicas, like RF=4, 6, 8, R + W = N + 2, which means 
> we have two overlapping nodes in read/write requests, which is not 
> necessary. The extra overlap does not provide stronger consistency, but it hurts P99 read 
> latency a lot (2X in our production cluster).
> Many other databases, like Amazon Aurora, use W = N/2 + 1 and R = 
> N/2 for quorum requests, which still provides strong consistency but 
> talks to one fewer replica in the read path. "We use a quorum model with 6 votes (V 
> = 6), a write quorum of 4/6 (Vw = 4), and a read quorum of 3/6 (Vr = 3)."
> I propose we do the same optimization and change the read quorum to talk to N/2 
> replicas, which should reduce the latency of quorum reads in general.
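
The arithmetic in the description can be checked with a short sketch. The function names are illustrative and not part of Cassandra, and N/2 is assumed to mean integer division. Under that assumption the reduced read size keeps an overlap of one replica only for even N; for odd N the overlap drops to zero, so the sketch treats the proposal as applying to even replication factors.

```python
def quorum(n: int) -> int:
    """Current quorum size: N/2 + 1 with integer division."""
    return n // 2 + 1


def half(n: int) -> int:
    """Proposed read size from the ticket: N/2 with integer division (assumption)."""
    return n // 2


def overlap(n: int, r: int, w: int) -> int:
    """Minimum number of replicas guaranteed to be in both the read and the write set."""
    return max(0, r + w - n)


for n in (3, 4, 5, 6, 8):
    w = quorum(n)
    print(f"RF={n}: W={w}, "
          f"R=quorum -> overlap {overlap(n, quorum(n), w)}, "
          f"R=half -> overlap {overlap(n, half(n), w)}")
```

An overlap of at least one is what lets a read see the latest acknowledged write; anything above one is the extra replica contact the ticket wants to avoid.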






[jira] [Commented] (CASSANDRA-13645) Optimize the number of replicas required in Quorum read/write

2017-06-30 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070667#comment-16070667
 ] 

Jay Zhuang commented on CASSANDRA-13645:


and {{CL.EACH_HALF}}?

> Optimize the number of replicas required in Quorum read/write
> -
>
> Key: CASSANDRA-13645
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13645
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Dikang Gu
>Assignee: Pengchao Wang
> Fix For: 4.x
>
>
> Currently, for C* read/write requests at quorum consistency level, the number of 
> replicas required for a quorum write is W=N/2+1, and the number of replicas 
> required for a quorum read is R=N/2+1 as well. 
> This works fine with an odd number of replicas, where R + W = N + 1, but with 
> an even number of replicas, like RF=4, 6, 8, R + W = N + 2, which means 
> we have two overlapping nodes in read/write requests, which is not 
> necessary. The extra overlap does not provide stronger consistency, but it hurts P99 read 
> latency a lot (2X in our production cluster).
> Many other databases, like Amazon Aurora, use W = N/2 + 1 and R = 
> N/2 for quorum requests, which still provides strong consistency but 
> talks to one fewer replica in the read path. "We use a quorum model with 6 votes (V 
> = 6), a write quorum of 4/6 (Vw = 4), and a read quorum of 3/6 (Vr = 3)."
> I propose we do the same optimization and change the read quorum to talk to N/2 
> replicas, which should reduce the latency of quorum reads in general.






[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-06-30 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070634#comment-16070634
 ] 

Jason Brown commented on CASSANDRA-13651:
-

/cc [~norman]

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
> Overhead  Trace output
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x
>    5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x
>    1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
>    0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x
>    0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x
>    0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x
>    0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> #
>  8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>    |
>    --8.61%--0x7fca6117bdac
>             0x7fca60459804
>             epoll_wait
>  2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>    0x7fca6117b830
>    0x7fca60459804
>    epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
> reports a per-CPU percentage or a per-system percentage, but that would 
> still be 10% of the total CPU usage of Cassandra at the minimum.
> I went further and found the code behind all that: we schedule a lot of 
> {{Message::Flusher}} tasks with a deadline of 10 usec (5 per message, I think), but 
> netty+epoll only supports timeouts of a millisecond or more and will convert 
> everything below that to 0.
> I added some traces to netty (4.1):
> {code}
> diff --git 
> 
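
The conversion of a 10 usec deadline into a zero timeout, as described in the quoted report, can be illustrated with plain arithmetic. This sketch only models an integer conversion from a microsecond delay to the millisecond timeout that `epoll_wait` accepts; the function name is hypothetical and this is not the Netty code.

```python
def epoll_timeout_ms(delay_us: int) -> int:
    """Convert a delay in microseconds to whole milliseconds, truncating toward zero."""
    return delay_us // 1000


for delay_us in (10, 999, 1000, 2500):
    print(f"{delay_us} us -> epoll_wait timeout {epoll_timeout_ms(delay_us)} ms")
```

A timeout of 0 makes `epoll_wait` return immediately, so an event loop that keeps computing 0 for its next deadline polls repeatedly instead of sleeping.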

[jira] [Issue Comment Deleted] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-06-30 Thread Jason Brown (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-13649:

Comment: was deleted

(was: It's a common netty practice to include an exception handler at the end 
of a netty pipeline to handle cases like this. However, I'm reluctant to add yet 
another handler to the pipeline as some of my testing for CASSANDRA-8457 
(admittedly, very early-stage testing) showed that we spend extra time in the 
pipeline just by all the mechanics around invoking another handler (checking 
the promise, state of the channel, and so on).

That being said, I can probably find some time to reinvestigate as part of 
finalizing all the netty-related things for 4.0. [~spo...@gmail.com] feel free 
to assign to me if you like, but I probably can't get to it for about a month.)

> Uncaught exceptions in Netty pipeline
> -
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
> Attachments: test_stdout.txt
>
>
> I've noticed some netty related errors in trunk in [some of the dtest 
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
>  Just want to make sure that we don't have to change anything related to the 
> exception handling in our pipeline and that this isn't a netty issue. 
> Actually if this causes flakiness but is otherwise harmless, we should do 
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> This one also looks odd and makes 
> upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test fail:
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-29 02:41:37,125 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.handler.codec.DecoderException: 
> org.apache.cassandra.transport.ProtocolException: Invalid or unsupported 
> protocol version: 4
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:375)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:342)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:325)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> 

[jira] [Commented] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-06-30 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070620#comment-16070620
 ] 

Jason Brown commented on CASSANDRA-13649:
-

It's a common netty practice to include an exception handler at the end of a 
netty pipeline to handle cases like this. However, I'm reluctant to add yet 
another handler to the pipeline as some of my testing for CASSANDRA-8457 
(admittedly, very early-stage testing) showed that we spend extra time in the 
pipeline just by all the mechanics around invoking another handler (checking 
the promise, state of the channel, and so on).

That being said, I can probably find some time to reinvestigate as part of 
finalizing all the netty-related things for 4.0. [~spo...@gmail.com] feel free 
to assign to me if you like, but I probably can't get to it for about a month.
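
The trade-off in this comment can be sketched with a toy handler chain. This is a language-neutral illustration in Python, not the Netty pipeline API; `Pipeline`, `decoder` and `tail_exception_handler` are hypothetical names, and the `unhandled` list stands in for the "reached at the tail of the pipeline" warning.

```python
class Pipeline:
    """Toy handler chain: an exception travels down the chain until a handler consumes it."""

    def __init__(self, handlers):
        self.handlers = list(handlers)
        self.unhandled = []  # stands in for the tail-of-pipeline warning

    def fire_exception(self, exc: Exception) -> None:
        for handler in self.handlers:
            if handler(exc):  # a handler returns True when it consumed the exception
                return
        self.unhandled.append(exc)


def decoder(exc: Exception) -> bool:
    """A handler that does not deal with exceptions and passes them on."""
    return False


def tail_exception_handler(exc: Exception) -> bool:
    """Hypothetical last handler: consume connection resets, pass everything else on."""
    return isinstance(exc, ConnectionResetError)


without_tail = Pipeline([decoder])
with_tail = Pipeline([decoder, tail_exception_handler])
for pipeline in (without_tail, with_tail):
    pipeline.fire_exception(ConnectionResetError("Connection reset by peer"))

print(len(without_tail.unhandled), len(with_tail.unhandled))
```

The extra handler removes the warning, but it is also one more call on every event that passes through the chain, which is the overhead the comment is concerned about.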

> Uncaught exceptions in Netty pipeline
> -
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
> Attachments: test_stdout.txt
>
>
> I've noticed some netty related errors in trunk in [some of the dtest 
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
>  Just want to make sure that we don't have to change anything related to the 
> exception handling in our pipeline and that this isn't a netty issue. 
> Actually if this causes flakiness but is otherwise harmless, we should do 
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> This one also looks odd and makes 
> upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test fail:
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-29 02:41:37,125 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.handler.codec.DecoderException: 
> org.apache.cassandra.transport.ProtocolException: Invalid or unsupported 
> protocol version: 4
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:375)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:342)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:325)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> 

[jira] [Commented] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-06-30 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070621#comment-16070621
 ] 

Jason Brown commented on CASSANDRA-13649:
-

It's a common netty practice to include an exception handler at the end of a 
netty pipeline to handle cases like this. However, I'm reluctant to add yet 
another handler to the pipeline as some of my testing for CASSANDRA-8457 
(admittedly, very early-stage testing) showed that we spend extra time in the 
pipeline just by all the mechanics around invoking another handler (checking 
the promise, state of the channel, and so on).

That being said, I can probably find some time to reinvestigate as part of 
finalizing all the netty-related things for 4.0. [~spo...@gmail.com] feel free 
to assign to me if you like, but I probably can't get to it for about a month.

> Uncaught exceptions in Netty pipeline
> -
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefan Podkowinski
> Attachments: test_stdout.txt
>
>
> I've noticed some netty related errors in trunk in [some of the dtest 
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
>  Just want to make sure that we don't have to change anything related to the 
> exception handling in our pipeline and that this isn't a netty issue. 
> Actually if this causes flakiness but is otherwise harmless, we should do 
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> This one also looks odd and makes 
> upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test fail:
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-29 02:41:37,125 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.handler.codec.DecoderException: 
> org.apache.cassandra.transport.ProtocolException: Invalid or unsupported 
> protocol version: 4
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:375)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:342)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:325)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>   at 
> 

[jira] [Commented] (CASSANDRA-13645) Optimize the number of replicas required in Quorum read/write

2017-06-30 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070608#comment-16070608
 ] 

Jason Brown commented on CASSANDRA-13645:
-

To be clear, though, a user will have to know that they must use different CLs 
in order to gain the optimization as proposed by this ticket. Meaning, you 
write at {{CL.QUORUM}} and read at {{CL.HALF}}; you can't write and read at 
{{CL.HALF}} and get strong consistency properties.

As much as I don't want to open another can of worms, do we need a 
corresponding {{CL.LOCAL_HALF}} as well?
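
The constraint in this comment can be sketched as a check of R + W > N for each pairing of write and read levels. `HALF` is the level proposed in this thread, not an existing Cassandra consistency level, and the helper names are hypothetical.

```python
from itertools import product

# Replica counts per level for a replication factor n (integer division).
# HALF is the level proposed in this thread; it is an assumption, not a Cassandra API.
LEVELS = {
    "QUORUM": lambda n: n // 2 + 1,
    "HALF": lambda n: n // 2,
}


def is_strong(n: int, write_cl: str, read_cl: str) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return LEVELS[read_cl](n) + LEVELS[write_cl](n) > n


def strong_pairs(n: int) -> list[tuple[str, str]]:
    """All (write, read) pairings that keep the overlap guarantee for RF = n."""
    return [(w, r) for w, r in product(LEVELS, repeat=2) if is_strong(n, w, r)]


print(strong_pairs(4))
```

For RF=4 the only pairing the check rejects is `HALF` for both writes and reads, which matches the point made in the comment.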

> Optimize the number of replicas required in Quorum read/write
> -
>
> Key: CASSANDRA-13645
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13645
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Dikang Gu
>Assignee: Pengchao Wang
> Fix For: 4.x
>
>
> Currently, for C* read/write requests at quorum consistency level, the number of 
> replicas required for a quorum write is W=N/2+1, and the number of replicas 
> required for a quorum read is R=N/2+1 as well. 
> This works fine with an odd number of replicas, where R + W = N + 1, but with 
> an even number of replicas, like RF=4, 6, 8, R + W = N + 2, which means 
> we have two overlapping nodes in read/write requests, which is not 
> necessary. The extra overlap does not provide stronger consistency, but it hurts P99 read 
> latency a lot (2X in our production cluster).
> Many other databases, like Amazon Aurora, use W = N/2 + 1 and R = 
> N/2 for quorum requests, which still provides strong consistency but 
> talks to one fewer replica in the read path. "We use a quorum model with 6 votes (V 
> = 6), a write quorum of 4/6 (Vw = 4), and a read quorum of 3/6 (Vr = 3)."
> I propose we do the same optimization and change the read quorum to talk to N/2 
> replicas, which should reduce the latency of quorum reads in general.






[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-06-30 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070535#comment-16070535
 ] 

Corentin Chary commented on CASSANDRA-13651:


Things to check or try (for me):
* io.netty.eventLoopThreads
* Check if we could use the same eventloop instead of starting two
* Create a custom SelectStrategy that skips looking at fds if there is a 
scheduled task happening in a few microseconds
* Try to understand why Message::Flusher currently works this way
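
The custom select strategy idea from the list above can be sketched as a decision function. This is an illustration in Python, not Netty's `SelectStrategy` interface; the `Action` names, the threshold, and the `choose` function are assumptions made for the example.

```python
from enum import Enum


class Action(Enum):
    RUN_TASKS = "run scheduled tasks without polling file descriptors"
    POLL_NOW = "poll file descriptors with a zero timeout"
    BLOCK = "block in the poll call until the timeout expires"


# Hypothetical threshold: a deadline closer than this cannot be expressed as a
# millisecond timeout, so polling with timeout 0 would only spin.
SKIP_THRESHOLD_NS = 1_000_000


def choose(next_deadline_ns, has_ready_tasks: bool) -> Action:
    """Pick what one event loop iteration should do.

    next_deadline_ns is the delay until the next scheduled task, or None
    when nothing is scheduled.
    """
    if has_ready_tasks:
        return Action.POLL_NOW
    if next_deadline_ns is not None and next_deadline_ns < SKIP_THRESHOLD_NS:
        return Action.RUN_TASKS
    return Action.BLOCK


print(choose(10_000, False).name)
```

With a 10 usec deadline the sketch skips the poll entirely instead of issuing an `epoll_wait` call with a zero timeout.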

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
> Overhead  Trace output
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x
>    5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x
>    1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
>    0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x
>    0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x
>    0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x
>    0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> 
> #         
> 
> #
>  8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>|  
> --8.61%--0x7fca6117bdac
>   0x7fca60459804
>   epoll_wait
>  2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>0x7fca6117b830
>0x7fca60459804
>epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
> reports a per-CPU percentage or a per-system percentage, but that would 
> still be 10% of the total CPU usage of Cassandra at the minimum.
> I went further and found the code of all that: We schedule a lot of 

[jira] [Updated] (CASSANDRA-10446) Run repair with down replicas

2017-06-30 Thread Blake Eggleston (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-10446:

Resolution: Fixed
Status: Resolved  (was: Ready to Commit)

Committed as {{45c0f860f3c7f8e0a7c80809c4ff47f4acf65557}}

> Run repair with down replicas
> -
>
> Key: CASSANDRA-10446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10446
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Blake Eggleston
>Priority: Minor
> Fix For: 4.0
>
>
> We should have an option of running repair when replicas are down. We can 
> call it -force.






cassandra git commit: Run repair with down replicas

2017-06-30 Thread bdeggleston
Repository: cassandra
Updated Branches:
  refs/heads/trunk 176f2a444 -> 45c0f860f


Run repair with down replicas

Patch by Sankalp Kohli & Blake Eggleston; Reviewed by Marcus Eriksson for 
CASSANDRA-10446


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/45c0f860
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/45c0f860
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/45c0f860

Branch: refs/heads/trunk
Commit: 45c0f860f3c7f8e0a7c80809c4ff47f4acf65557
Parents: 176f2a4
Author: Blake Eggleston 
Authored: Wed Oct 12 10:14:16 2016 -0700
Committer: Blake Eggleston 
Committed: Fri Jun 30 11:31:15 2017 -0700

--
 CHANGES.txt |  2 +
 .../apache/cassandra/repair/RepairRunnable.java | 12 +-
 .../apache/cassandra/repair/RepairSession.java  | 39 ++--
 .../cassandra/repair/RepairSessionResult.java   | 15 +++-
 .../cassandra/repair/messages/RepairOption.java | 25 -
 .../cassandra/service/ActiveRepairService.java  | 15 ++--
 .../apache/cassandra/tools/nodetool/Repair.java |  4 ++
 .../cassandra/repair/RepairSessionTest.java |  2 +-
 .../consistent/CoordinatorSessionTest.java  |  2 +-
 .../repair/messages/RepairOptionTest.java   | 22 +++
 10 files changed, 125 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/45c0f860/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 866c6fd..6444994 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,6 @@
 4.0
+ * Run repair with down replicas (CASSANDRA-10446)
+ * Added started & completed repair metrics (CASSANDRA-13598)
  * Added started & completed repair metrics (CASSANDRA-13598)
  * Improve secondary index (re)build failure and concurrency handling 
(CASSANDRA-10130)
  * Improve calculation of available disk space for compaction (CASSANDRA-13068)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/45c0f860/src/java/org/apache/cassandra/repair/RepairRunnable.java
--
diff --git a/src/java/org/apache/cassandra/repair/RepairRunnable.java 
b/src/java/org/apache/cassandra/repair/RepairRunnable.java
index eca162e..29347a4 100644
--- a/src/java/org/apache/cassandra/repair/RepairRunnable.java
+++ b/src/java/org/apache/cassandra/repair/RepairRunnable.java
@@ -289,9 +289,18 @@ public class RepairRunnable extends WrappedRunnable implements ProgressEventNoti
                 // filter out null(=failed) results and get successful ranges
                 for (RepairSessionResult sessionResult : results)
                 {
+                    logger.debug("Repair result: {}", results);
                     if (sessionResult != null)
                     {
-                        successfulRanges.addAll(sessionResult.ranges);
+                        // don't promote sstables for sessions we skipped replicas for
+                        if (!sessionResult.skippedReplicas)
+                        {
+                            successfulRanges.addAll(sessionResult.ranges);
+                        }
+                        else
+                        {
+                            logger.debug("Skipping anticompaction for {}", results);
+                        }
                     }
                     else
                     {
@@ -424,6 +433,7 @@ public class RepairRunnable extends WrappedRunnable implements ProgressEventNoti
                               p.left,
                               isConsistent,
                               options.isPullRepair(),
+                              options.isForcedRepair(),
                               options.getPreviewKind(),
                               executor,
                               cfnames);
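
The first RepairRunnable hunk above can be read as a small filter: only sessions that did not skip any replica contribute their ranges to the set that is later promoted. The following is a minimal sketch of that idea, not the committed code. SessionResult is a hypothetical, simplified stand-in for RepairSessionResult, ranges are plain strings, and failed (null) sessions are simply skipped here, whereas the real code handles them in a branch that the hunk does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ForcedRepairSketch {
    // Hypothetical, simplified stand-in for RepairSessionResult.
    static final class SessionResult {
        final List<String> ranges;
        final boolean skippedReplicas;

        SessionResult(List<String> ranges, boolean skippedReplicas) {
            this.ranges = ranges;
            this.skippedReplicas = skippedReplicas;
        }
    }

    // Collects the ranges of sessions that completed without skipping replicas.
    static List<String> successfulRanges(List<SessionResult> results) {
        List<String> successful = new ArrayList<>();
        for (SessionResult result : results) {
            if (result == null) {
                continue; // null means the session failed; ignored in this sketch
            }
            if (!result.skippedReplicas) {
                successful.addAll(result.ranges);
            }
            // Sessions that skipped replicas are deliberately left out.
        }
        return successful;
    }

    public static void main(String[] args) {
        List<SessionResult> results = Arrays.asList(
            new SessionResult(Arrays.asList("(0,100]"), false),
            new SessionResult(Arrays.asList("(100,200]"), true),
            null);
        System.out.println(successfulRanges(results));
    }
}
```

With a forced repair, a session can succeed while some replicas were down; leaving its ranges out of the successful set matches the comment in the diff about not promoting sstables for those sessions.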

http://git-wip-us.apache.org/repos/asf/cassandra/blob/45c0f860/src/java/org/apache/cassandra/repair/RepairSession.java
--
diff --git a/src/java/org/apache/cassandra/repair/RepairSession.java 
b/src/java/org/apache/cassandra/repair/RepairSession.java
index c1b3f41..98ed1a3 100644
--- a/src/java/org/apache/cassandra/repair/RepairSession.java
+++ b/src/java/org/apache/cassandra/repair/RepairSession.java
@@ -36,6 +36,7 @@ 

[jira] [Updated] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement

2017-06-30 Thread ZhaoYang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-13592:
-
Status: Patch Available  (was: In Progress)

> Null Pointer exception at SELECT JSON statement
> ---
>
> Key: CASSANDRA-13592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13592
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Debian Linux
>Reporter: Wyss Philipp
>Assignee: ZhaoYang
>  Labels: beginner
> Attachments: system.log
>
>
> A null pointer exception appears when running the command
> {code}
> SELECT JSON * FROM examples.basic;
> ---MORE---
>  message="java.lang.NullPointerException">
> Examples.basic has the following description (DESC examples.basic;):
> CREATE TABLE examples.basic (
> key frozen> PRIMARY KEY,
> wert text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> {code}
> The error appears after the ---MORE--- line.
> The field "wert" has a JSON formatted string.






[jira] [Updated] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-06-30 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13651:
---
Description: 
I was trying to profile Cassandra under my workload and I kept seeing this 
backtrace:
{code}
epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
(native)
io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
Native.java:111
io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) EpollEventLoop.java:230
io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
SingleThreadEventExecutor.java:858
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
DefaultThreadFactory.java:138
java.lang.Thread.run() Thread.java:745
{code}

At first I thought that the profiler might not be able to profile native code 
properly, but I went further and realized that most of the CPU was used by 
{{epoll_wait()}} calls with a timeout of zero.

Here is the output of perf on this system, which confirms that most of the 
overhead was with timeout == 0.

{code}
Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
Overhead  Trace output
  90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x
   5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x
   1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
   0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x
   0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x
   0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x
   0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x
{code}

Running this time with perf record -ag for call traces:
{code}
# Children  Self   sys   usr  Trace output  
  
#         

#
 8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 
0x7fca452d6000, maxevents: 0x1000, timeout: 0x
|
---0x1000200af313
   |  
--8.61%--0x7fca6117bdac
  0x7fca60459804
  epoll_wait

 2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 
0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
|
---0x1000200af313
   0x7fca6117b830
   0x7fca60459804
   epoll_wait
{code}

That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
reports a per-CPU percentage or a per-system percentage, but that would 
still be 10% of the total CPU usage of Cassandra at the minimum.

I went further and found the code behind all that: we schedule a lot of 
{{Message::Flusher}} tasks with a deadline of 10 usec (5 per message, I think), 
but netty+epoll only supports timeouts of a millisecond or more and converts 
everything below that to 0.
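
The truncation described above can be illustrated with plain integer arithmetic. This is a minimal sketch and not netty's actual code: toEpollTimeoutMillis is a hypothetical helper that assumes the delay is converted from nanoseconds to whole milliseconds before being handed to epoll_wait(), which takes its timeout in milliseconds.

```java
import java.util.concurrent.TimeUnit;

class EpollTimeoutSketch {
    // Hypothetical helper: converts a scheduling delay in nanoseconds to the
    // whole-millisecond timeout that epoll_wait() accepts. The conversion
    // truncates, so any delay under one millisecond becomes 0.
    static int toEpollTimeoutMillis(long delayNanos) {
        return (int) TimeUnit.NANOSECONDS.toMillis(delayNanos);
    }

    public static void main(String[] args) {
        // A 10 usec deadline, as used for the flusher tasks mentioned above.
        long flusherDelayNanos = TimeUnit.MICROSECONDS.toNanos(10);
        // Truncates to 0, so epoll_wait() returns immediately instead of blocking.
        System.out.println(toEpollTimeoutMillis(flusherDelayNanos));
        // A one-second delay survives the conversion as 1000 ms (0x03e8).
        System.out.println(toEpollTimeoutMillis(TimeUnit.SECONDS.toNanos(1)));
    }
}
```

A timeout of 0 makes each call a non-blocking poll, which would be consistent with the large share of samples in the perf output above, while the 0x03e8 entries correspond to a full one-second wait.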

I added some traces to netty (4.1):
{code}
diff --git 
a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
 
b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
index 909088fde..8734bbfd4 100644
--- 
a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
+++ 
b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
@@ -208,10 +208,15 @@ final class EpollEventLoop extends SingleThreadEventLoop {
 long currentTimeNanos = System.nanoTime();
 long selectDeadLineNanos = currentTimeNanos + 
delayNanos(currentTimeNanos);
 for (;;) 

[jira] [Created] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-06-30 Thread Corentin Chary (JIRA)
Corentin Chary created CASSANDRA-13651:
--

 Summary: Large amount of CPU used by epoll_wait(.., .., .., 0)
 Key: CASSANDRA-13651
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
 Project: Cassandra
  Issue Type: Bug
Reporter: Corentin Chary
 Fix For: 4.x


I was trying to profile Cassandra under my workload and I kept seeing this 
backtrace:
{code}
epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
(native)
io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
Native.java:111
io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) EpollEventLoop.java:230
io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
SingleThreadEventExecutor.java:858
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
DefaultThreadFactory.java:138
java.lang.Thread.run() Thread.java:745
{code}

At first I thought that the profiler might not be able to profile native code 
properly, but I went further and realized that most of the CPU was used by 
epoll_wait() calls with a timeout of zero.

Here is the output of perf on this system, which confirms that most of the 
overhead was with timeout == 0.

{code}
Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
Overhead  Trace output
  90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x
   5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x
   1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
   0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x
   0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x
   0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x
   0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x
{code}

Running this time with perf record -ag for call traces:
{code}
# Children  Self   sys   usr  Trace output  
  
#         

#
 8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 
0x7fca452d6000, maxevents: 0x1000, timeout: 0x
|
---0x1000200af313
   |  
--8.61%--0x7fca6117bdac
  0x7fca60459804
  epoll_wait

 2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 
0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
|
---0x1000200af313
   0x7fca6117b830
   0x7fca60459804
   epoll_wait
{code}

That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
reports a per-CPU percentage or a per-system percentage, but that would 
still be 10% of the total CPU usage of Cassandra at the minimum.

I went further and found the code behind all that: we schedule a lot of 
Message::Flusher tasks with a deadline of 10 usec (5 per message, I think), 
but netty+epoll only supports timeouts of a millisecond or more and converts 
everything below that to 0.

I added some traces to netty (4.1):
{code}
diff --git 
a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
 
b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
index 909088fde..8734bbfd4 100644
--- 
a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
+++ 
b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
@@ -208,10 +208,15 @@ final class EpollEventLoop extends SingleThreadEventLoop 

[jira] [Updated] (CASSANDRA-13650) cql_tests:SlowQueryTester.local_query_test and cql_tests:SlowQueryTester.remote_query_test failed on trunk

2017-06-30 Thread ZhaoYang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-13650:
-
Description: 
cql_tests.py:SlowQueryTester.local_query_test failed on trunk
cql_tests.py:SlowQueryTester.remote_query_test failed on trunk
SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217


The dtest fails because it cannot find the {{'SELECT \* FROM ks.test1'}} pattern 
in the log.
Instead, the log shows the following: 
{{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: 
, time 102 msec - slow timeout 10 msec}}

ColumnFilter.toString() should return {{*}}, but it returns the normal column 
{{val}} instead.

  was:
cql_tests.py:SlowQueryTester.local_query_test failed on trunk
cql_tests.py:SlowQueryTester.remote_query_test failed on trunk
SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217


The cause seems due to the dtest unable to find {{"SELECT \* FROM ks.test1"}} 
pattern from log.
but in the log, following info is showed: 
{{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: 
, time 102 msec - slow timeout 10 msec}}

ColumnFilter.toString() should return {{*}}, but return normal column {{val}} 
instead 


> cql_tests:SlowQueryTester.local_query_test and 
> cql_tests:SlowQueryTester.remote_query_test failed on trunk
> --
>
> Key: CASSANDRA-13650
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13650
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: ZhaoYang
>Assignee: ZhaoYang
> Fix For: 4.x
>
>
> cql_tests.py:SlowQueryTester.local_query_test failed on trunk
> cql_tests.py:SlowQueryTester.remote_query_test failed on trunk
> SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217
> The dtest fails because it cannot find the {{'SELECT \* FROM ks.test1'}} pattern 
> in the log.
> Instead, the log shows the following: 
> {{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: 
> , time 102 msec - slow timeout 10 msec}}
> ColumnFilter.toString() should return {{*}}, but it returns the normal column 
> {{val}} instead.






[jira] [Updated] (CASSANDRA-13650) cql_tests:SlowQueryTester.local_query_test and cql_tests:SlowQueryTester.remote_query_test failed on trunk

2017-06-30 Thread ZhaoYang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-13650:
-
Summary: cql_tests:SlowQueryTester.local_query_test and 
cql_tests:SlowQueryTester.remote_query_test failed on trunk  (was: 
cql_tests.py:SlowQueryTester.local_query_test and 
cql_tests.py:SlowQueryTester.remote_query_test failed on trunk)

> cql_tests:SlowQueryTester.local_query_test and 
> cql_tests:SlowQueryTester.remote_query_test failed on trunk
> --
>
> Key: CASSANDRA-13650
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13650
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: ZhaoYang
>Assignee: ZhaoYang
> Fix For: 4.x
>
>
> cql_tests.py:SlowQueryTester.local_query_test failed on trunk
> cql_tests.py:SlowQueryTester.remote_query_test failed on trunk
> SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217
> The cause seems to be that the dtest is unable to find the 
> {{"SELECT \* FROM ks.test1"}} pattern in the log.
> Instead, the log shows the following: 
> {{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: 
> , time 102 msec - slow timeout 10 msec}}
> ColumnFilter.toString() should return {{*}}, but it returns the normal column 
> {{val}} instead.






[jira] [Commented] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement

2017-06-30 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069843#comment-16069843
 ] 

ZhaoYang commented on CASSANDRA-13592:
--

I have created [ticket|https://issues.apache.org/jira/browse/CASSANDRA-13650] 
for {{cql_tests.py:SlowQueryTester.local_query_test}} & 
{{cql_tests.py:SlowQueryTester.remote_query_test}}

> Null Pointer exception at SELECT JSON statement
> ---
>
> Key: CASSANDRA-13592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13592
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Debian Linux
>Reporter: Wyss Philipp
>Assignee: ZhaoYang
>  Labels: beginner
> Attachments: system.log
>
>
> A null pointer exception appears when running the command
> {code}
> SELECT JSON * FROM examples.basic;
> ---MORE---
>  message="java.lang.NullPointerException">
> Examples.basic has the following description (DESC examples.basic;):
> CREATE TABLE examples.basic (
> key frozen> PRIMARY KEY,
> wert text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> {code}
> The error appears after the ---MORE--- line.
> The field "wert" has a JSON formatted string.






[jira] [Updated] (CASSANDRA-13650) cql_tests.py:SlowQueryTester.local_query_test and cql_tests.py:SlowQueryTester.remote_query_test failed on trunk

2017-06-30 Thread ZhaoYang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-13650:
-
Description: 
cql_tests.py:SlowQueryTester.local_query_test failed on trunk
cql_tests.py:SlowQueryTester.remote_query_test failed on trunk
SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217


The cause seems to be that the dtest is unable to find the 
{{"SELECT \* FROM ks.test1"}} pattern in the log.
Instead, the log shows the following: 
{{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: 
, time 102 msec - slow timeout 10 msec}}

ColumnFilter.toString() should return {{*}}, but it returns the normal column 
{{val}} instead.

  was:
cql_tests.py:SlowQueryTester.local_query_test failed on trunk
cql_tests.py:SlowQueryTester.remote_query_test failed on trunk
SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217


The cause seems due to the dtest unable to find "SELECT \* FROM ks.test1" 
pattern from log.
but in the log, following info is showed: 
{{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: 
, time 102 msec - slow timeout 10 msec}}

ColumnFilter.toString() should return "*", but return normal column "val" 
instead 


> cql_tests.py:SlowQueryTester.local_query_test and 
> cql_tests.py:SlowQueryTester.remote_query_test failed on trunk
> 
>
> Key: CASSANDRA-13650
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13650
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: ZhaoYang
>Assignee: ZhaoYang
> Fix For: 4.x
>
>
> cql_tests.py:SlowQueryTester.local_query_test failed on trunk
> cql_tests.py:SlowQueryTester.remote_query_test failed on trunk
> SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217
> The cause seems to be that the dtest is unable to find the 
> {{"SELECT \* FROM ks.test1"}} pattern in the log.
> Instead, the log shows the following: 
> {{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: 
> , time 102 msec - slow timeout 10 msec}}
> ColumnFilter.toString() should return {{*}}, but it returns the normal column 
> {{val}} instead.






[jira] [Created] (CASSANDRA-13650) cql_tests.py:SlowQueryTester.local_query_test and cql_tests.py:SlowQueryTester.remote_query_test failed on trunk

2017-06-30 Thread ZhaoYang (JIRA)
ZhaoYang created CASSANDRA-13650:


 Summary: cql_tests.py:SlowQueryTester.local_query_test and 
cql_tests.py:SlowQueryTester.remote_query_test failed on trunk
 Key: CASSANDRA-13650
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13650
 Project: Cassandra
  Issue Type: Bug
  Components: Distributed Metadata
Reporter: ZhaoYang
Assignee: ZhaoYang
 Fix For: 4.x


cql_tests.py:SlowQueryTester.local_query_test failed on trunk
cql_tests.py:SlowQueryTester.remote_query_test failed on trunk
SHA: fe3cfe3d7df296f022c50c9c0d22f91a0fc0a217


The cause seems to be that the dtest is unable to find the 
"SELECT \* FROM ks.test1" pattern in the log.
Instead, the log shows the following: 
{{MonitoringTask.java:173 - 1 operations were slow in the last 10 msecs: 
, time 102 msec - slow timeout 10 msec}}

ColumnFilter.toString() should return "*", but it returns the normal column 
"val" instead.






[jira] [Comment Edited] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement

2017-06-30 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069565#comment-16069565
 ] 

ZhaoYang edited comment on CASSANDRA-13592 at 6/30/17 9:47 AM:
---

|| source || junit-result || dtest-result||
| [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | 
[junit|https://circleci.com/gh/jasonstack/cassandra/84]  | 
{{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk
{{cql_tests.py:SlowQueryTester.remote_query_test}} failed on trunk
{{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13506]
| 
| 
[3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/82] | 
{{topology_test.TestTopology.size_estimates_multidc_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13229]
{{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} 
[known|https://issues.apache.org/jira/browse/CASSANDRA-13250] | 
| 
[3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/83] | 
{{auth_test.TestAuth.system_auth_ks_is_alterable_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13113]
{{offline_tools_test.TestOfflineTools.sstableofflinerelevel_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-12617]
{{repair_tests.incremental_repair_test.TestIncRepair.multiple_repair_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13515]|
 
| 
[2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed | 

1. In {{ListType, MapType, SetType, TupleType}}.toJSONString(), keep the buffer 
position unchanged.
2. Change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with 
double quotes) to be consistent with user JSON input.
3. Change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}; 
otherwise the parent method throws an NPE.
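
The first point, keeping the buffer position the same, can be illustrated with java.nio alone. This is a minimal sketch of one common way to do it, reading through ByteBuffer.duplicate(), and is not necessarily how the patch does it. readText is a hypothetical stand-in for a toJSONString()-style method, and the value "wert" is only sample data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BufferPositionSketch {
    // Hypothetical stand-in for a toJSONString()-style reader. It reads from a
    // duplicate, which shares the bytes but has its own position and limit,
    // so the caller's buffer is left untouched.
    static String readText(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes); // advances only the duplicate's position
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer value = ByteBuffer.wrap("wert".getBytes(StandardCharsets.UTF_8));
        System.out.println(readText(value));
        // A second read still sees all the bytes because the position did not move.
        System.out.println(readText(value));
        System.out.println(value.position());
    }
}
```

If the reader consumed the caller's buffer directly, a later consumer of the same buffer would see it as already exhausted, which is the kind of side effect the first point is meant to avoid.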


was (Author: jasonstack):
|| source || junit-result || dtest-result||
| [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | 
[junit|https://circleci.com/gh/jasonstack/cassandra/84]  | 
{{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk
{{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13506]
| 
| 
[3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/82] | 
{{topology_test.TestTopology.size_estimates_multidc_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13229]
{{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} 
[known|https://issues.apache.org/jira/browse/CASSANDRA-13250] | 
| 
[3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/83] | 
{{auth_test.TestAuth.system_auth_ks_is_alterable_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13113]
{{offline_tools_test.TestOfflineTools.sstableofflinerelevel_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-12617]
{{repair_tests.incremental_repair_test.TestIncRepair.multiple_repair_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13515]|
 
| 
[2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed | 

1. in {{listType, mapType, setType, TupleType}}.toJSONString(), keep buffer 
position the same.
2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with 
double-quote) to be consistent with user json input
3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, 
otherwise parent method throws NPE.

> Null Pointer exception at SELECT JSON statement
> ---
>
> Key: CASSANDRA-13592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13592
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Debian Linux
>Reporter: Wyss Philipp
>Assignee: ZhaoYang
>  Labels: beginner
> Attachments: system.log
>
>
> A null pointer exception appears when running the command
> {code}
> SELECT JSON * FROM examples.basic;
> ---MORE---
>  message="java.lang.NullPointerException">
> Examples.basic has the following description (DESC examples.basic;):
> CREATE TABLE examples.basic (
> key frozen> PRIMARY KEY,
> wert text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 

[jira] [Updated] (CASSANDRA-13598) Started & Completed repair metrics

2017-06-30 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-13598:
---
Attachment: 13598-3.11.patch
13598-3.0.patch

> Started & Completed repair metrics
> --
>
> Key: CASSANDRA-13598
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13598
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Cameron Zemek
>Assignee: Cameron Zemek
>Priority: Minor
>  Labels: repair
> Fix For: 4.0
>
> Attachments: 13598-3.0.patch, 13598-3.11.patch
>
>
> There are no metrics to monitor repairs run as coordinator. A number of 
> metrics were added with CASSANDRA-13531, but they didn't include metrics to 
> monitor whether a repair is running or how many repairs have run.
> |4.x|[patch|https://github.com/apache/cassandra/compare/instaclustr:trunk...instaclustr:13598-4.x]|
> |3.11|[patch|https://github.com/instaclustr/cassandra/compare/cassandra-3.11...instaclustr:13598-3.11]|
> |3.0|[patch|https://github.com/instaclustr/cassandra/compare/cassandra-3.0...instaclustr:13598-3.0]|
> |2.2|[patch|https://github.com/instaclustr/cassandra/compare/cassandra-2.2...instaclustr:13598-2.2]|






[jira] [Updated] (CASSANDRA-13598) Started & Completed repair metrics

2017-06-30 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-13598:
---
   Resolution: Fixed
Fix Version/s: (was: 4.x)
   4.0
   Status: Resolved  (was: Patch Available)

Failed tests look unrelated. I've also quickly tested locally and realized that 
we really should name these RepairJobsStarted/Completed instead of just 
RepairsStarted/Completed, as the latter implies that something has actually 
been repaired, which doesn't have to be the case. Also, 
RepairJobsStarted/Completed may make it more obvious that this is only on the 
coordinator. 

I've committed this to 4.0 as 176f2a444cd, since this really doesn't qualify as 
a bug fix. I've attached patches for backports.

Thanks for the contribution, Cameron!


> Started & Completed repair metrics
> --
>
> Key: CASSANDRA-13598
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13598
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Cameron Zemek
>Assignee: Cameron Zemek
>Priority: Minor
>  Labels: repair
> Fix For: 4.0
>
>
> There are no metrics to monitor repairs run as co-ordinator. A number of 
> metrics were added with CASSANDRA-13531 but didn't include metrics to monitor 
> if repair is running or how many repairs have run.
> |4.x|[patch|https://github.com/apache/cassandra/compare/instaclustr:trunk...instaclustr:13598-4.x]|
> |3.11|[patch|https://github.com/instaclustr/cassandra/compare/cassandra-3.11...instaclustr:13598-3.11]|
> |3.0|[patch|https://github.com/instaclustr/cassandra/compare/cassandra-3.0...instaclustr:13598-3.0]|
> |2.2|[patch|https://github.com/instaclustr/cassandra/compare/cassandra-2.2...instaclustr:13598-2.2]|






cassandra git commit: Add started & completed repair metrics

2017-06-30 Thread spod
Repository: cassandra
Updated Branches:
  refs/heads/trunk fe3cfe3d7 -> 176f2a444


Add started & completed repair metrics

patch by Cameron Zemek; reviewed by Stefan Podkowinski for CASSANDRA-13598


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/176f2a44
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/176f2a44
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/176f2a44

Branch: refs/heads/trunk
Commit: 176f2a444cd2a6ed7c3be6ac126b6ca2c4f255cf
Parents: fe3cfe3
Author: Cameron Zemek 
Authored: Wed Jun 14 14:06:53 2017 +1000
Committer: Stefan Podkowinski 
Committed: Fri Jun 30 11:28:13 2017 +0200

--
 CHANGES.txt   |  1 +
 .../apache/cassandra/metrics/KeyspaceMetrics.java | 18 ++
 .../apache/cassandra/metrics/TableMetrics.java|  7 +++
 .../org/apache/cassandra/repair/RepairJob.java|  6 ++
 4 files changed, 32 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/176f2a44/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index e56eb78..866c6fd 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0
+ * Added started & completed repair metrics (CASSANDRA-13598)
  * Improve secondary index (re)build failure and concurrency handling 
(CASSANDRA-10130)
  * Improve calculation of available disk space for compaction (CASSANDRA-13068)
  * Change the accessibility of RowCacheSerializer for third party row cache 
plugins (CASSANDRA-13579)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/176f2a44/src/java/org/apache/cassandra/metrics/KeyspaceMetrics.java
--
diff --git a/src/java/org/apache/cassandra/metrics/KeyspaceMetrics.java 
b/src/java/org/apache/cassandra/metrics/KeyspaceMetrics.java
index affb372..9e8d542 100644
--- a/src/java/org/apache/cassandra/metrics/KeyspaceMetrics.java
+++ b/src/java/org/apache/cassandra/metrics/KeyspaceMetrics.java
@@ -102,6 +102,10 @@ public class KeyspaceMetrics
 public final Counter speculativeFailedRetries;
 /** Needed to speculate, but didn't have enough replicas **/
 public final Counter speculativeInsufficientReplicas;
+/** Number of started repairs as coordinator on this keyspace */
+public final Counter repairsStarted;
+/** Number of completed repairs as coordinator on this keyspace */
+public final Counter repairsCompleted;
 /** total time spent as a repair coordinator */
 public final Timer repairTime;
 /** total time spent preparing for repair */
@@ -285,6 +289,20 @@ public class KeyspaceMetrics
 return metric.speculativeInsufficientReplicas.getCount();
 }
 });
+repairsStarted = createKeyspaceCounter("RepairJobsStarted", new 
MetricValue()
+{
+public Long getValue(TableMetrics metric)
+{
+return metric.repairsStarted.getCount();
+}
+});
+repairsCompleted = createKeyspaceCounter("RepairJobsCompleted", new 
MetricValue()
+{
+public Long getValue(TableMetrics metric)
+{
+return metric.repairsCompleted.getCount();
+}
+});
 repairTime = Metrics.timer(factory.createMetricName("RepairTime"));
 repairPrepareTime = 
Metrics.timer(factory.createMetricName("RepairPrepareTime"));
 anticompactionTime = 
Metrics.timer(factory.createMetricName("AntiCompactionTime"));

http://git-wip-us.apache.org/repos/asf/cassandra/blob/176f2a44/src/java/org/apache/cassandra/metrics/TableMetrics.java
--
diff --git a/src/java/org/apache/cassandra/metrics/TableMetrics.java 
b/src/java/org/apache/cassandra/metrics/TableMetrics.java
index 40a927f..98fd1e9 100644
--- a/src/java/org/apache/cassandra/metrics/TableMetrics.java
+++ b/src/java/org/apache/cassandra/metrics/TableMetrics.java
@@ -146,6 +146,10 @@ public class TableMetrics
 public final LatencyMetrics casCommit;
 /** percent of the data that is repaired */
 public final Gauge percentRepaired;
+/** Number of started repairs as coordinator on this table */
+public final Counter repairsStarted;
+/** Number of completed repairs as coordinator on this table */
+public final Counter repairsCompleted;
 /** time spent anticompacting data before participating in a consistent 
repair */
 public final TableTimer anticompactionTime;
 /** time spent creating merkle trees */
@@ -723,6 +727,9 @@ public class TableMetrics
 casPropose = new LatencyMetrics(factory, "CasPropose", 

[jira] [Comment Edited] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement

2017-06-30 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069565#comment-16069565
 ] 

ZhaoYang edited comment on CASSANDRA-13592 at 6/30/17 9:23 AM:
---

|| source || junit-result || dtest-result||
| [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | 
[junit|https://circleci.com/gh/jasonstack/cassandra/84]  | 
{{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk
{{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13506]
| 
| 
[3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/82] | 
{{topology_test.TestTopology.size_estimates_multidc_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13229]
{{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} 
[known|https://issues.apache.org/jira/browse/CASSANDRA-13250] | 
| 
[3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/83] | 
{{auth_test.TestAuth.system_auth_ks_is_alterable_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13113]
{{offline_tools_test.TestOfflineTools.sstableofflinerelevel_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-12617]
{{repair_tests.incremental_repair_test.TestIncRepair.multiple_repair_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13515]|
 
| 
[2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed | 

1. In {{listType, mapType, setType, TupleType}}.toJSONString(), keep the buffer 
position unchanged.
2. Change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with 
double quotes) to be consistent with user JSON input.
3. Change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}; 
otherwise the parent method throws an NPE.
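
Item 1 above can be illustrated with a small sketch, assuming the underlying problem is that reading a java.nio.ByteBuffer with relative gets advances its position, so a later reader of the same buffer sees no data. The class and method names below (BufferPositionDemo, readConsuming, readPreserving) are hypothetical and are not taken from the patch; only the ByteBuffer calls are standard JDK API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not reproduce Cassandra's toJSONString() code.
final class BufferPositionDemo
{
    /** Reads with relative gets on the caller's buffer, which advances its position. */
    static String readConsuming(ByteBuffer buffer)
    {
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Reads through duplicate(): same content, but an independent position and limit. */
    static String readPreserving(ByteBuffer buffer)
    {
        return readConsuming(buffer.duplicate());
    }

    public static void main(String[] args)
    {
        ByteBuffer value = ByteBuffer.wrap("abc".getBytes(StandardCharsets.UTF_8));

        // The caller's position is untouched, so the value can be read again.
        System.out.println(readPreserving(value) + " at position " + value.position());
        System.out.println(readPreserving(value) + " at position " + value.position());

        // The consuming read leaves the buffer exhausted for the next reader.
        System.out.println(readConsuming(value) + " at position " + value.position());
        System.out.println("second read is empty: " + readConsuming(value).isEmpty());
    }
}
```

Working on a duplicate is one common way to keep the position the same; saving and restoring the position, or using absolute gets, would serve the same purpose.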


was (Author: jasonstack):
|| source || junit-result || dtest-result||
| [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | 
[junit|https://circleci.com/gh/jasonstack/cassandra/84]  | 
{{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk
{{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13506]
| 
| 
[3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/82] | 
{{topology_test.TestTopology.size_estimates_multidc_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13229]
{{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} 
[known|https://issues.apache.org/jira/browse/CASSANDRA-13250] | 
| 
[3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/83] | 
{{auth_test.TestAuth.system_auth_ks_is_alterable_test}}
 
[known|https://issues.apache.org/jira/browse/CASSANDRA-13113]{{offline_tools_test.TestOfflineTools.sstableofflinerelevel_test}}
 [known|https://issues.apache.org/jira/browse/CASSANDRA-12617]
 {{repair_tests.incremental_repair_test.TestIncRepair.multiple_repair_test }} | 
[known|https://issues.apache.org/jira/browse/CASSANDRA-13515]| 
| 
[2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed | 

1. in {{listType, mapType, setType, TupleType}}.toJSONString(), keep buffer 
position the same.
2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with 
double-quote) to be consistent with user json input
3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, 
otherwise parent method throws NPE.

> Null Pointer exception at SELECT JSON statement
> ---
>
> Key: CASSANDRA-13592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13592
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Debian Linux
>Reporter: Wyss Philipp
>Assignee: ZhaoYang
>  Labels: beginner
> Attachments: system.log
>
>
> A null pointer exception appears when running the command
> {code}
> SELECT JSON * FROM examples.basic;
> ---MORE---
>  message="java.lang.NullPointerException">
> Examples.basic has the following description (DESC examples.basic;):
> CREATE TABLE examples.basic (
> key frozen> PRIMARY KEY,
> wert text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}

[jira] [Comment Edited] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement

2017-06-30 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069565#comment-16069565
 ] 

ZhaoYang edited comment on CASSANDRA-13592 at 6/30/17 9:20 AM:
---

|| source || junit-result || dtest-result||
| [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | 
[junit|https://circleci.com/gh/jasonstack/cassandra/84]  | 
{{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk
{{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13506]
| 
| 
[3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/82] | 
{{topology_test.TestTopology.size_estimates_multidc_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13229]
{{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} 
[known|https://issues.apache.org/jira/browse/CASSANDRA-13250] | 
| 
[3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/83] | 
{{auth_test.TestAuth.system_auth_ks_is_alterable_test}}
 
[known|https://issues.apache.org/jira/browse/CASSANDRA-13113]{{offline_tools_test.TestOfflineTools.sstableofflinerelevel_test}}
 [known|https://issues.apache.org/jira/browse/CASSANDRA-12617]
 {{repair_tests.incremental_repair_test.TestIncRepair.multiple_repair_test }} | 
[known|https://issues.apache.org/jira/browse/CASSANDRA-13515]| 
| 
[2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed | 

1. In {{listType, mapType, setType, TupleType}}.toJSONString(), keep the buffer 
position unchanged.
2. Change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with 
double quotes) to be consistent with user JSON input.
3. Change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}; 
otherwise the parent method throws an NPE.


was (Author: jasonstack):
|| source || junit-result || dtest-result||
| [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | 
[junit|https://circleci.com/gh/jasonstack/cassandra/84]  | 
{{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk
{{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13506]
| 
| 
[3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/82] | 
{{topology_test.TestTopology.size_estimates_multidc_test}}[known|https://issues.apache.org/jira/browse/CASSANDRA-13229]
{{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} 
[known|https://issues.apache.org/jira/browse/CASSANDRA-13250] | 
| 
[3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/83] | | 
| 
[2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2]
 |  [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed | 

1. in {{listType, mapType, setType, TupleType}}.toJSONString(), keep buffer 
position the same.
2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with 
double-quote) to be consistent with user json input
3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, 
otherwise parent method throws NPE.

> Null Pointer exception at SELECT JSON statement
> ---
>
> Key: CASSANDRA-13592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13592
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Debian Linux
>Reporter: Wyss Philipp
>Assignee: ZhaoYang
>  Labels: beginner
> Attachments: system.log
>
>
> A null pointer exception appears when running the command
> {code}
> SELECT JSON * FROM examples.basic;
> ---MORE---
>  message="java.lang.NullPointerException">
> Examples.basic has the following description (DESC examples.basic;):
> CREATE TABLE examples.basic (
> key frozen> PRIMARY KEY,
> wert text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> 

[jira] [Assigned] (CASSANDRA-13162) Batchlog replay is throttled during bootstrap, creating conditions for incorrect query results on materialized views

2017-06-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrés de la Peña reassigned CASSANDRA-13162:
-

Assignee: Andrés de la Peña

> Batchlog replay is throttled during bootstrap, creating conditions for 
> incorrect query results on materialized views
> 
>
> Key: CASSANDRA-13162
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13162
> Project: Cassandra
>  Issue Type: Bug
>  Components: Materialized Views
>Reporter: Wei Deng
>Assignee: Andrés de la Peña
>Priority: Critical
>  Labels: bootstrap, materializedviews
>
> I've tested this in a C* 3.0 cluster with a couple of Materialized Views 
> defined (one base table and two MVs on that base table). The data volume is 
> not very high per node (about 80GB of data per node total, and that 
> particular base table has about 25GB of data uncompressed with one MV taking 
> 18GB compressed and the other MV taking 3GB), and the cluster is using decent 
> hardware (EC2 C4.8XL with 18 cores + 60GB RAM + 18K IOPS RAID0 from two 3TB 
> gp2 EBS volumes). 
> This is originally a 9-node cluster. It appears that after adding 3 more 
> nodes to the DC, the system.batches table accumulated a lot of data on the 3 
> new nodes (each having around 20GB under system.batches directory), and in 
> the subsequent week the batchlog on the 3 new nodes got slowly replayed back 
> to the rest of the nodes in the cluster. The bottleneck seems to be the 
> throttling defined in this cassandra.yaml setting: 
> batchlog_replay_throttle_in_kb, which by default is set to 1MB/s.
> Given that it is taking almost a week (and still hasn't finished) for the 
> batchlog (from MV) to be replayed after the bootstrap finishes, it seems only 
> reasonable to unthrottle (or at least give it a much higher throttle rate) 
> during the initial bootstrap, and hence I'd consider this a bug for our 
> current MV implementation.
> Also as far as I understand, the bootstrap logic won't wait for the 
> backlogged batchlog to be fully replayed before changing the new 
> bootstrapping node to "UN" state, and if batchlog for the MVs got stuck in 
> this state for a long time, we basically will get wrong answers on the MVs 
> during that whole duration (until batchlog is fully played to the cluster), 
> which adds even more criticality to this bug.
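
As a rough back-of-the-envelope check of the figures in the description above, the sketch below estimates how long a backlog takes to drain at a given throttle. The 20 GB backlog and the 1 MB/s default come from the ticket; the helper name replaySeconds is hypothetical. The cassandra.yaml comment for batchlog_replay_throttle_in_kb describes the limit as a total that is reduced in proportion to the number of nodes, so the second calculation divides by the 12 nodes mentioned in the ticket; how that reduction applies in this particular cluster is an assumption.

```java
// Hypothetical helper for illustration; not part of Cassandra.
final class BatchlogReplayEstimate
{
    /** Seconds needed to replay backlogBytes at throttleKbPerSec (1 KB = 1024 bytes), rounded up. */
    static long replaySeconds(long backlogBytes, long throttleKbPerSec)
    {
        long bytesPerSec = throttleKbPerSec * 1024L;
        return (backlogBytes + bytesPerSec - 1) / bytesPerSec;
    }

    public static void main(String[] args)
    {
        long backlogBytes = 20L * 1024 * 1024 * 1024; // about 20 GB under system.batches, per the ticket
        long defaultThrottleKb = 1024;                // batchlog_replay_throttle_in_kb default (1 MB/s)
        int nodes = 12;                               // 9 original nodes plus 3 new ones

        long atFullRate = replaySeconds(backlogBytes, defaultThrottleKb);
        // Assumption: the configured total is divided evenly by the node count.
        long atReducedRate = replaySeconds(backlogBytes, defaultThrottleKb / nodes);

        System.out.printf("full rate:    %d s (%.1f hours)%n", atFullRate, atFullRate / 3600.0);
        System.out.printf("reduced rate: %d s (%.1f days)%n", atReducedRate, atReducedRate / 86400.0);
    }
}
```

Under these assumptions the reduced rate stretches the replay from hours to days, which is at least the same order of magnitude as the week-long replay reported here. The estimate ignores new batches arriving during replay and any other bottleneck.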






[jira] [Commented] (CASSANDRA-13629) Wait for batchlog replay during bootstrap

2017-06-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069690#comment-16069690
 ] 

Andrés de la Peña commented on CASSANDRA-13629:
---

[CASSANDRA-13065|https://issues.apache.org/jira/browse/CASSANDRA-13065], which 
was considered an improvement, solves this problem only for 4.0. If we now see 
it as a bug fix, we might want to backport it to other branches. 
[~pauloricardomg], what do you think?

> Wait for batchlog replay during bootstrap
> -
>
> Key: CASSANDRA-13629
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13629
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Materialized Views
>Reporter: Andrés de la Peña
>Assignee: Andrés de la Peña
> Fix For: 4.0
>
>
> As part of the problem described in 
> [CASSANDRA-13162|https://issues.apache.org/jira/browse/CASSANDRA-13162], the 
> bootstrap logic won't wait for the backlogged batchlog to be fully replayed 
> before changing the new bootstrapping node to "UN" state. We should wait for 
> batchlog replay before making the node available.






[jira] [Updated] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-06-30 Thread Stefan Podkowinski (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Podkowinski updated CASSANDRA-13649:
---
Description: 
I've noticed some netty related errors in trunk in [some of the dtest 
results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
 Just want to make sure that we don't have to change anything related to the 
exception handling in our pipeline and that this isn't a netty issue. Actually 
if this causes flakiness but is otherwise harmless, we should do something 
about it, even if it's just on the dtest side.


{noformat}
WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 - 
An exceptionCaught() event was fired, and it reached at the tail of the 
pipeline. It usually means the last handler in the pipeline did not handle the 
exception.
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
Connection reset by peer
at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
{noformat}

And again in another test:
{noformat}
WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 - 
An exceptionCaught() event was fired, and it reached at the tail of the 
pipeline. It usually means the last handler in the pipeline did not handle the 
exception.
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
Connection reset by peer
at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
{noformat}

This one also looks odd and makes 
upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test fail:

{noformat}
WARN  [epollEventLoopGroup-2-9] 2017-06-29 02:41:37,125 Slf4JLogger.java:151 - 
An exceptionCaught() event was fired, and it reached at the tail of the 
pipeline. It usually means the last handler in the pipeline did not handle the 
exception.
io.netty.handler.codec.DecoderException: 
org.apache.cassandra.transport.ProtocolException: Invalid or unsupported 
protocol version: 4
at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:375)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:342)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:325)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:893)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:691) 
~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
 [netty-all-4.0.44.Final.jar:4.0.44.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:307) 
[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
 [netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
 [netty-all-4.0.44.Final.jar:4.0.44.Final]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: org.apache.cassandra.transport.ProtocolException: Invalid or 
unsupported protocol version: 4
at org.apache.cassandra.transport.Frame$Decoder.decode(Frame.java:186) 
~[main/:na]
at 

[jira] [Updated] (CASSANDRA-13629) Wait for batchlog replay during bootstrap

2017-06-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrés de la Peña updated CASSANDRA-13629:
--
Fix Version/s: 4.0

> Wait for batchlog replay during bootstrap
> -
>
> Key: CASSANDRA-13629
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13629
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Materialized Views
>Reporter: Andrés de la Peña
>Assignee: Andrés de la Peña
> Fix For: 4.0
>
>
> As part of the problem described in 
> [CASSANDRA-13162|https://issues.apache.org/jira/browse/CASSANDRA-13162], the 
> bootstrap logic won't wait for the backlogged batchlog to be fully replayed 
> before changing the new bootstrapping node to "UN" state. We should wait for 
> batchlog replay before making the node available.






[jira] [Updated] (CASSANDRA-13629) Wait for batchlog replay during bootstrap

2017-06-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrés de la Peña updated CASSANDRA-13629:
--
Resolution: Not A Problem
Status: Resolved  (was: Awaiting Feedback)

> Wait for batchlog replay during bootstrap
> -
>
> Key: CASSANDRA-13629
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13629
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Materialized Views
>Reporter: Andrés de la Peña
>Assignee: Andrés de la Peña
>
> As part of the problem described in 
> [CASSANDRA-13162|https://issues.apache.org/jira/browse/CASSANDRA-13162], the 
> bootstrap logic won't wait for the backlogged batchlog to be fully replayed 
> before changing the new bootstrapping node to "UN" state. We should wait for 
> batchlog replay before making the node available.






[jira] [Comment Edited] (CASSANDRA-13629) Wait for batchlog replay during bootstrap

2017-06-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069680#comment-16069680
 ] 

Andrés de la Peña edited comment on CASSANDRA-13629 at 6/30/17 8:15 AM:


It seems that since 
[CASSANDRA-13065|https://issues.apache.org/jira/browse/CASSANDRA-13065] the 
data received during bootstrap is not sent to batchlog. Since the batchlog is 
empty when bootstrap finishes, this ticket is not necessary.


was (Author: adelapena):
It sees that since 
[CASSANDRA-13065|https://issues.apache.org/jira/browse/CASSANDRA-13065] the 
data received during bootstrap is not sent to batchlog. Since the batchlog is 
empty when bootstrap finishes, this ticket is not necessary.

> Wait for batchlog replay during bootstrap
> -
>
> Key: CASSANDRA-13629
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13629
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Materialized Views
>Reporter: Andrés de la Peña
>Assignee: Andrés de la Peña
>
> As part of the problem described in 
> [CASSANDRA-13162|https://issues.apache.org/jira/browse/CASSANDRA-13162], the 
> bootstrap logic won't wait for the backlogged batchlog to be fully replayed 
> before changing the new bootstrapping node to "UN" state. We should wait for 
> batchlog replay before making the node available.






[jira] [Resolved] (CASSANDRA-13565) Materialized view usage of commit logs requires large mutation but commitlog_segment_size_in_mb=2048 causes exception

2017-06-30 Thread ZhaoYang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang resolved CASSANDRA-13565.
--
Resolution: Duplicate

> Materialized view usage of commit logs requires large mutation but 
> commitlog_segment_size_in_mb=2048 causes exception
> -
>
> Key: CASSANDRA-13565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13565
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration, Materialized Views, Streaming and 
> Messaging
> Environment: Cassandra 3.9.0, Windows 
>Reporter: Tania S Engel
> Attachments: CQLforTable.png
>
>
> We will be upgrading to 3.10 for CASSANDRA-11670. However, there is another 
> scenario (not applyunsafe during JOIN) which leads to :
>   java.lang.IllegalArgumentException: Mutation of 525.847MiB is too large 
> for the maximum size of 512.000MiB
>       at 
> org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:262) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.Keyspace.apply(Keyspace.java:493) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.Mutation.apply(Mutation.java:227) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.batchlog.BatchlogManager.store(BatchlogManager.java:147) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.service.StorageProxy.mutateMV(StorageProxy.java:797) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.view.ViewBuilder.buildKey(ViewBuilder.java:96) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.view.ViewBuilder.run(ViewBuilder.java:165) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.compaction.CompactionManager$14.run(CompactionManager.java:1591)
>  [apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_66]
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [na:1.8.0_66]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_66]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_66]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66] 
> Due to the relationship between max_mutation_size_in_kb and 
> commitlog_segment_size_in_mb, we increased commitlog_segment_size_in_mb and 
> left Cassandra to calculate max_mutation_size_in_kb as half of 
> commitlog_segment_size_in_mb * 1024.
>  However, we have found that if we set commitlog_segment_size_in_mb=2048 we 
> get an exception upon starting Cassandra, when it is creating a new commit 
> log.
> ERROR [COMMIT-LOG-ALLOCATOR] 2017-05-31 17:01:48,005 
> JVMStabilityInspector.java:82 - Exiting due to error while processing commit 
> log during initialization.
> org.apache.cassandra.io.FSWriteError: java.io.IOException: An attempt was 
> made to move the file pointer before the beginning of the file
> Perhaps the index you are using is not big enough and it goes negative.
> Is the relationship between max_mutation_size_in_kb and 
> commitlog_segment_size_in_mb important to preserve? In our limited stress 
> test we are finding mutation size already over 512mb and we expect more data 
> in our sstables and associated materialized views.
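
The reporter's guess that a value "goes negative" can be illustrated with plain Java arithmetic. The sketch below assumes, without confirmation from the Cassandra source, that the segment size in bytes ends up in a signed 32-bit int somewhere; the class and method names are hypothetical. It also shows the default described above, where the maximum mutation size is half the segment size.

```java
// Hypothetical illustration of 32-bit overflow; it does not reproduce Cassandra's commit log code.
final class SegmentSizeOverflow
{
    /** Segment size in bytes computed in int arithmetic, as an int-typed field would hold it. */
    static int segmentBytesAsInt(int segmentSizeInMb)
    {
        return segmentSizeInMb * 1024 * 1024;
    }

    /** The same computation in long arithmetic, which cannot overflow for these inputs. */
    static long segmentBytesAsLong(int segmentSizeInMb)
    {
        return segmentSizeInMb * 1024L * 1024L;
    }

    /** Default described in the ticket: half of commitlog_segment_size_in_mb * 1024, in KB. */
    static long defaultMaxMutationKb(int segmentSizeInMb)
    {
        return segmentSizeInMb * 1024L / 2;
    }

    public static void main(String[] args)
    {
        // 2047 MB still fits in an int; 2048 MB is exactly 2^31 and wraps to Integer.MIN_VALUE.
        System.out.println("2047 MB as int:  " + segmentBytesAsInt(2047));
        System.out.println("2048 MB as int:  " + segmentBytesAsInt(2048));
        System.out.println("2048 MB as long: " + segmentBytesAsLong(2048));

        // A 1024 MB segment gives a 512 MiB default limit, the figure in the stack trace above.
        System.out.println("max mutation for 1024 MB segment: " + defaultMaxMutationKb(1024) + " KB");
    }
}
```

If the assumption holds, any segment size of 2048 MB or more would produce a negative byte count, which would be consistent with an attempt to move the file pointer before the beginning of the file.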






[jira] [Commented] (CASSANDRA-13629) Wait for batchlog replay during bootstrap

2017-06-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069680#comment-16069680
 ] 

Andrés de la Peña commented on CASSANDRA-13629:
---

It seems that since 
[CASSANDRA-13065|https://issues.apache.org/jira/browse/CASSANDRA-13065] the 
data received during bootstrap is not sent to the batchlog. Since the batchlog 
is empty when bootstrap finishes, this ticket is not necessary.

> Wait for batchlog replay during bootstrap
> -
>
> Key: CASSANDRA-13629
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13629
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Materialized Views
>Reporter: Andrés de la Peña
>Assignee: Andrés de la Peña
>
> As part of the problem described in 
> [CASSANDRA-13162|https://issues.apache.org/jira/browse/CASSANDRA-13162], the 
> bootstrap logic won't wait for the backlogged batchlog to be fully replayed 
> before changing the new bootstrapping node to "UN" state. We should wait for 
> batchlog replay before making the node available.






[jira] [Commented] (CASSANDRA-13565) Materialized view usage of commit logs requires large mutation but commitlog_segment_size_in_mb=2048 causes exception

2017-06-30 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069678#comment-16069678
 ] 

ZhaoYang commented on CASSANDRA-13565:
--

I will mark this ticket as `not an issue`; 13622 is a better place to fix all 
boundary cases.

> Materialized view usage of commit logs requires large mutation but 
> commitlog_segment_size_in_mb=2048 causes exception
> -
>
> Key: CASSANDRA-13565
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13565
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration, Materialized Views, Streaming and 
> Messaging
> Environment: Cassandra 3.9.0, Windows 
>Reporter: Tania S Engel
> Attachments: CQLforTable.png
>
>
> We will be upgrading to 3.10 for CASSANDRA-11670. However, there is another 
> scenario (not applyunsafe during JOIN) which leads to:
>   java.lang.IllegalArgumentException: Mutation of 525.847MiB is too large 
> for the maximum size of 512.000MiB
>       at 
> org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:262) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.Keyspace.apply(Keyspace.java:493) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.Mutation.apply(Mutation.java:227) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.batchlog.BatchlogManager.store(BatchlogManager.java:147) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.service.StorageProxy.mutateMV(StorageProxy.java:797) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.view.ViewBuilder.buildKey(ViewBuilder.java:96) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.view.ViewBuilder.run(ViewBuilder.java:165) 
> ~[apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> org.apache.cassandra.db.compaction.CompactionManager$14.run(CompactionManager.java:1591)
>  [apache-cassandra-3.9.0.jar:3.9.0]
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_66]
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [na:1.8.0_66]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_66]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_66]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66] 
> Because max_mutation_size_in_kb is derived from 
> commitlog_segment_size_in_mb, we increased commitlog_segment_size_in_mb and 
> left Cassandra to calculate max_mutation_size_in_kb as half of 
> commitlog_segment_size_in_mb * 1024.
> However, we have found that if we set commitlog_segment_size_in_mb=2048 we 
> get an exception when starting Cassandra, while it is creating a new commit 
> log.
> ERROR [COMMIT-LOG-ALLOCATOR] 2017-05-31 17:01:48,005 
> JVMStabilityInspector.java:82 - Exiting due to error while processing commit 
> log during initialization.
> org.apache.cassandra.io.FSWriteError: java.io.IOException: An attempt was 
> made to move the file pointer before the beginning of the file
> Perhaps the index you are using is not big enough and it goes negative.
> Is the relationship between max_mutation_size_in_kb and 
> commitlog_segment_size_in_mb important to preserve? In our limited stress 
> test we are finding mutation size already over 512mb and we expect more data 
> in our sstables and associated materialized views.
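
The reporter's guess that a value "goes negative" is consistent with 32-bit integer overflow: 2048 MB expressed in bytes is 2^31, one more than the largest Java int. The sketch below only demonstrates that arithmetic. Whether Cassandra computes the segment size this way is an assumption, and the class and method names are hypothetical.

```java
// Illustrative only: shows how a 2048 MB size wraps when held in a Java int.
final class SegmentSizeOverflow {
    /** Segment size in bytes using 32-bit arithmetic; wraps for large inputs. */
    static int segmentBytesAsInt(int segmentSizeInMb) {
        return segmentSizeInMb * 1024 * 1024;
    }

    /** The same computation in 64-bit arithmetic. */
    static long segmentBytesAsLong(int segmentSizeInMb) {
        return segmentSizeInMb * 1024L * 1024L;
    }

    public static void main(String[] args) {
        System.out.println(segmentBytesAsInt(2047));  // 2146435072
        System.out.println(segmentBytesAsInt(2048));  // -2147483648
        System.out.println(segmentBytesAsLong(2048)); // 2147483648
    }
}
```

A negative byte count used as a file offset would plausibly produce an error such as "An attempt was made to move the file pointer before the beginning of the file".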






[jira] [Created] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-06-30 Thread Stefan Podkowinski (JIRA)
Stefan Podkowinski created CASSANDRA-13649:
--

 Summary: Uncaught exceptions in Netty pipeline
 Key: CASSANDRA-13649
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
 Project: Cassandra
  Issue Type: Bug
Reporter: Stefan Podkowinski
 Attachments: test_stdout.txt

I've noticed some Netty-related errors on trunk in [some of the dtest 
results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
I just want to make sure that we don't have to change anything in the 
exception handling of our pipeline and that this isn't a Netty issue. If this 
causes flakiness but is otherwise harmless, we should still do something 
about it, even if it's just on the dtest side.


{noformat}
WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 - 
An exceptionCaught() event was fired, and it reached at the tail of the 
pipeline. It usually means the last handler in the pipeline did not handle the 
exception.
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
Connection reset by peer
at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
{noformat}

And again in another test:
{noformat}
WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 - 
An exceptionCaught() event was fired, and it reached at the tail of the 
pipeline. It usually means the last handler in the pipeline did not handle the 
exception.
io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
Connection reset by peer
at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
{noformat}
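
One possible way to separate harmless peer disconnects from real failures is to inspect the cause chain. This is a minimal sketch, not part of Cassandra or the dtests: the class and method names are hypothetical, and it assumes the logged exception is an IOException subclass whose message contains "Connection reset by peer".

```java
import java.io.IOException;

// Illustrative only: a hypothetical filter for peer-reset errors.
final class DisconnectFilter {
    /** Returns true if t or any of its causes is an IOException reporting a peer reset. */
    static boolean isPeerReset(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (cur instanceof IOException && msg != null
                    && msg.contains("Connection reset by peer")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable reset = new IOException("syscall:read(...)() failed: Connection reset by peer");
        Throwable other = new IllegalStateException("Invalid or unsupported protocol version: 4");
        System.out.println(isPeerReset(reset));                       // true
        System.out.println(isPeerReset(new RuntimeException(reset))); // true
        System.out.println(isPeerReset(other));                       // false
    }
}
```

A handler at the tail of the pipeline, or a log filter on the dtest side, could use such a check to downgrade these warnings while still surfacing other exceptions.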

This one also looks odd and makes 
upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test fail:

{noformat}
WARN  [epollEventLoopGroup-2-9] 2017-06-29 02:41:37,125 Slf4JLogger.java:151 - 
An exceptionCaught() event was fired, and it reached at the tail of the 
pipeline. It usually means the last handler in the pipeline did not handle the 
exception.
io.netty.handler.codec.DecoderException: 
org.apache.cassandra.transport.ProtocolException: Invalid or unsupported 
protocol version: 4
at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:375)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:342)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:325)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:893)
 ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:691) 
~[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
 [netty-all-4.0.44.Final.jar:4.0.44.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:307) 
[netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
 [netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
 [netty-all-4.0.44.Final.jar:4.0.44.Final]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: org.apache.cassandra.transport.ProtocolException: Invalid or 
unsupported protocol version: 4
at 

[jira] [Comment Edited] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement

2017-06-30 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069565#comment-16069565
 ] 

ZhaoYang edited comment on CASSANDRA-13592 at 6/30/17 7:37 AM:
---

|| source || junit-result || dtest-result ||
| [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | [junit|https://circleci.com/gh/jasonstack/cassandra/84] | {{cql_tests.py:SlowQueryTester.local_query_test}} failed on trunk, {{bootstrap_test.TestBootstrap.simultaneous_bootstrap_test}} [known|https://issues.apache.org/jira/browse/CASSANDRA-13506] |
| [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11] | [junit|https://circleci.com/gh/jasonstack/cassandra/82] | {{topology_test.TestTopology.size_estimates_multidc_test}} [known|https://issues.apache.org/jira/browse/CASSANDRA-13229], {{cqlsh_tests.cqlsh_tests.TestCqlsh.test_describe}} [known|https://issues.apache.org/jira/browse/CASSANDRA-13250] |
| [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0] | [junit|https://circleci.com/gh/jasonstack/cassandra/83] | |
| [2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2] | [junit|https://circleci.com/gh/jasonstack/cassandra/85] | passed |

1. In {{ListType, MapType, SetType, TupleType}}.toJSONString(), keep the 
buffer position unchanged.
2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with 
double-quote) to be consistent with user json input
3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, 
otherwise parent method throws NPE.
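
Item 1 can be illustrated with plain java.nio, independent of Cassandra's type classes. This is a minimal sketch, assuming the problem is a read that advances a shared ByteBuffer: reading through duplicate() leaves the caller's position unchanged. The class and method names are hypothetical and not taken from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: contrasts a consuming read with a position-preserving one.
final class BufferPositionDemo {
    /** Decodes the remaining bytes as UTF-8 and advances the given buffer to its limit. */
    static String readConsuming(ByteBuffer buffer) {
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes); // relative get: moves buffer.position() forward
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Decodes the same bytes via a duplicate, so the caller's position is untouched. */
    static String readPreserving(ByteBuffer buffer) {
        return readConsuming(buffer.duplicate());
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(readPreserving(buffer) + " " + buffer.position()); // abc 0
        System.out.println(readConsuming(buffer) + " " + buffer.position());  // abc 3
        System.out.println(readPreserving(buffer).isEmpty());                 // true
    }
}
```

The last line shows why this matters: once the position has been advanced, a second reader of the same buffer sees no data.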


was (Author: jasonstack):
|| source || junit-result || dtest-result ||
| [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | [junit|https://circleci.com/gh/jasonstack/cassandra/84] | |
| [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11] | [junit|https://circleci.com/gh/jasonstack/cassandra/82] | |
| [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0] | [junit|https://circleci.com/gh/jasonstack/cassandra/83] | |
| [2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2] | [junit|https://circleci.com/gh/jasonstack/cassandra/85] | |

1. In {{ListType, MapType, SetType, TupleType}}.toJSONString(), keep the 
buffer position unchanged.
2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with 
double-quote) to be consistent with user json input
3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, 
otherwise parent method throws NPE.

> Null Pointer exception at SELECT JSON statement
> ---
>
> Key: CASSANDRA-13592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13592
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Debian Linux
>Reporter: Wyss Philipp
>Assignee: ZhaoYang
>  Labels: beginner
> Attachments: system.log
>
>
> A null pointer exception appears when the command
> {code}
> SELECT JSON * FROM examples.basic;
> ---MORE---
>  message="java.lang.NullPointerException">
> Examples.basic has the following description (DESC examples.basic;):
> CREATE TABLE examples.basic (
> key frozen> PRIMARY KEY,
> wert text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> {code}
> The error appears after the ---MORE--- line.
> The field "wert" has a JSON formatted string.






[jira] [Commented] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement

2017-06-30 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16069565#comment-16069565
 ] 

ZhaoYang commented on CASSANDRA-13592:
--

|| source || junit-result || dtest-result ||
| [trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592] | [junit|https://circleci.com/gh/jasonstack/cassandra/84] | |
| [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.11] | [junit|https://circleci.com/gh/jasonstack/cassandra/82] | |
| [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-3.0] | [junit|https://circleci.com/gh/jasonstack/cassandra/83] | |
| [2.2|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13592-cassandra-2.2] | [junit|https://circleci.com/gh/jasonstack/cassandra/85] | |

1. In {{ListType, MapType, SetType, TupleType}}.toJSONString(), keep the 
buffer position unchanged.
2. change {{DurationType}}.toJSONString() to {{return "\"" + +"\"";}} (with 
double-quote) to be consistent with user json input
3. change {{EmptyType}}.toJSONString() to directly {{return "\"\"";}}, 
otherwise parent method throws NPE.

> Null Pointer exception at SELECT JSON statement
> ---
>
> Key: CASSANDRA-13592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13592
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Debian Linux
>Reporter: Wyss Philipp
>Assignee: ZhaoYang
>  Labels: beginner
> Attachments: system.log
>
>
> A null pointer exception appears when the command
> {code}
> SELECT JSON * FROM examples.basic;
> ---MORE---
>  message="java.lang.NullPointerException">
> Examples.basic has the following description (DESC examples.basic;):
> CREATE TABLE examples.basic (
> key frozen> PRIMARY KEY,
> wert text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> {code}
> The error appears after the ---MORE--- line.
> The field "wert" has a JSON formatted string.


