[jira] [Updated] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)
[ https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated CASSANDRA-13651:
---------------------------------------
    Labels: pull-request-available  (was: )

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -----------------------------------------------------
>
>                 Key: CASSANDRA-13651
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Corentin Chary
>            Assignee: Michael Burman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0
>
>         Attachments: cpu-usage.png
[jira] [Updated] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)
[ https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-13651:
---------------------------------
    Fix Version/s:     (was: 4.x)
                   4.0

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -----------------------------------------------------
>
>                 Key: CASSANDRA-13651
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Corentin Chary
>            Assignee: Michael Burman
>            Priority: Major
>             Fix For: 4.0
>
>         Attachments: cpu-usage.png
[jira] [Updated] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)
[ https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-13651:
---------------------------------
    Reviewer: Benedict  (was: Jason Brown)

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -----------------------------------------------------
>
>                 Key: CASSANDRA-13651
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Corentin Chary
>            Assignee: Corentin Chary
>            Priority: Major
>             Fix For: 4.x
>
>         Attachments: cpu-usage.png
[jira] [Updated] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)
[ https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Brown updated CASSANDRA-13651:
------------------------------------
    Reviewer: Jason Brown

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -----------------------------------------------------
>
>                 Key: CASSANDRA-13651
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Corentin Chary
>            Assignee: Corentin Chary
>             Fix For: 4.x
>
>         Attachments: cpu-usage.png
[jira] [Updated] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)
[ https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Corentin Chary updated CASSANDRA-13651:
---------------------------------------
    Attachment: cpu-usage.png

!cpu-usage.png|thumbnail!

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -----------------------------------------------------
>
>                 Key: CASSANDRA-13651
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Corentin Chary
>             Fix For: 4.x
>
>         Attachments: cpu-usage.png
[jira] [Updated] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)
[ https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Corentin Chary updated CASSANDRA-13651:
---------------------------------------
    Description:

I was trying to profile Cassandra under my workload and I kept seeing this backtrace:

{code}
epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java (native)
io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) Native.java:111
io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) EpollEventLoop.java:230
io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
io.netty.util.concurrent.SingleThreadEventExecutor$5.run() SingleThreadEventExecutor.java:858
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() DefaultThreadFactory.java:138
java.lang.Thread.run() Thread.java:745
{code}

At first I thought that the profiler might not be able to profile native code properly, but I went further and realized that most of the CPU was used by {{epoll_wait()}} calls with a timeout of zero.

Here is the output of perf on this system, which confirms that most of the overhead was with timeout == 0:

{code}
Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
Overhead  Trace output
  90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x00000000
   5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x00000000
   1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
   0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x00000000
   0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x00000000
   0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x00000000
   0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x00000000
{code}

Running this time with perf record -ag for call traces:

{code}
#   Children      Self       sys       usr  Trace output
#
     8.61%     8.61%     0.00%     8.61%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x00000000
            |
            ---0x1000200af313
               |
                --8.61%--0x7fca6117bdac
                          0x7fca60459804
                          epoll_wait

     2.98%     2.98%     0.00%     2.98%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
            |
            ---0x1000200af313
               0x7fca6117b830
               0x7fca60459804
               epoll_wait
{code}

That looks like a lot of CPU used to wait for nothing. I'm not sure whether perf reports a per-CPU or a per-system percentage, but that would still be at least 10% of the total CPU usage of Cassandra.

I went further and found the code responsible: we schedule a lot of {{Message::Flusher}} tasks with a deadline of 10 usec (5 per message, I think), but netty+epoll only supports timeouts of a millisecond or more and converts everything below that to 0.

I added some traces to netty (4.1):

{code}
diff --git a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
index 909088fde..8734bbfd4 100644
--- a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
+++ b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
@@ -208,10 +208,15 @@ final class EpollEventLoop extends SingleThreadEventLoop {
             long currentTimeNanos = System.nanoTime();
             long selectDeadLineNanos = currentTimeNanos + delayNanos(currentTimeNanos);
             for (;;)
{code}
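The conversion in question can be sketched as follows. This is a minimal editorial illustration of the rounding behaviour, not Netty's actual source; the class name and the 500 usec rounding constant are assumptions modeled on Netty 4.1's event loop:

{code}
// Sketch: how a sub-millisecond scheduled deadline degenerates into
// epoll_wait(..., timeout=0). The epoll loop converts the remaining
// delay of the next scheduled task from nanoseconds to whole
// milliseconds before calling epoll_wait(); anything that rounds to 0
// becomes a non-blocking poll, so the event loop spins instead of
// sleeping.
public class EpollTimeoutRounding {
    // Assumed conversion (round to the nearest millisecond).
    static long timeoutMillis(long delayNanos) {
        return (delayNanos + 500_000L) / 1_000_000L;
    }

    public static void main(String[] args) {
        // The 10 usec Flusher deadline mentioned above...
        System.out.println(timeoutMillis(10_000L));     // 0 -> busy poll
        System.out.println(timeoutMillis(499_999L));    // 0 -> busy poll
        System.out.println(timeoutMillis(1_000_000L));  // 1 -> real sleep
    }
}
{code}

Under that rounding, every 10 usec Flusher deadline maps to a timeout of 0, which matches the epoll_wait(.., .., .., 0) calls dominating the perf samples above.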