[ 
https://issues.apache.org/jira/browse/CASSANDRA-16259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239181#comment-17239181
 ] 

Benjamin Lerer edited comment on CASSANDRA-16259 at 11/26/20, 1:10 PM:
-----------------------------------------------------------------------

While working on a fix for this bug I hit another issue that impact the 3.0, 
3.X and 4.0 versions. On some scenario the number of cells for a row is counted 
twice.

We count the number of cells per partition using the 
{{BigTableWriter.StatsCollector}} which will increment the number of cells when 
{{onPartitionClose}} is called. {{onPartitionClose}} is called when the 
transformed {{UnfilteredRowIterator}} is closed by the {{try-with-resources}} 
in 
[BigTableWriter.append|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/format/big/BigTableWriter.java#L214].
 Unfortunately, If the {{UnfilteredRowIterator}} is an instance of 
{{UnfilteredRows}} 
[Transformation.apply|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/transform/Transformation.java#L138]
 will not create a wrapper around the {{UnfilteredRowIterator}} but will simply 
add the {{Transformation}} to the list of transformations associated to that 
iterator.
When a memtable is flush the {{UnfilteredRowIterator}} is opened in a 
{{try-with-resources}} in 
[FlushRunnable:writeSortedContent|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Memtable.java#L473]
 before being passed to {{BigTableWriter|append}}. If the  
{{UnfilteredRowIterator}} is an instance of {{UnfilteredRows}}, 
{{StatsCollector#onPartitionClose}} will be called twice and the cells for the 
row will be counted twice.

The patch contains a change that ensure that we do not hit that problem by 
ensuring that if we call {{close}} on a class inheriting from {{BaseIterator}} 
only the first close will be processed and the following one will be ignored.

|| Branches || Test results ||
| [trunk|https://github.com/apache/cassandra/pull/835] | 
[j8|https://app.circleci.com/pipelines/github/blerer/cassandra/72/workflows/77c51c04-5a76-4530-99bd-cb1604f58931],
 
[j11|https://app.circleci.com/pipelines/github/blerer/cassandra/72/workflows/ab7be933-239c-41fb-a907-fa76a859ed99]
 |
| [3.11|https://github.com/apache/cassandra/pull/836] | 
[j8|https://app.circleci.com/pipelines/github/blerer/cassandra/69/workflows/9fb4aca3-3b41-4798-a921-07b4c7026b7e]
 |
| [3.0|https://github.com/apache/cassandra/pull/837] | 
[j8|https://app.circleci.com/pipelines/github/blerer/cassandra/70/workflows/b0ef5b25-0a4f-4ddb-92cd-b4c76f7908ea]
 |
| [2.2|https://github.com/apache/cassandra/pull/838] | 
[j8|https://app.circleci.com/pipelines/github/blerer/cassandra/71/workflows/d24c6595-9c8c-4893-86e0-ef3b15966f02]
 |
 



was (Author: blerer):
While working on a fix for this bug I hit another issue that impact the 3.0, 
3.X and 4.0 versions. On some scenario the number of cells for a row is counted 
twice.

We count the number of cells per partition using the 
{{BigTableWriter.StatsCollector}} which will increment the number of cells when 
{{onPartitionClose}} is called. {{onPartitionClose}} is called when the 
transformed {{UnfilteredRowIterator}} is closed by the {{try-with-resources}} 
in 
[BigTableWriter.append|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/format/big/BigTableWriter.java#L214].
 Unfortunately, If the {{UnfilteredRowIterator}} is an instance of 
{{UnfilteredRows}} 
[Transformation.apply|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/transform/Transformation.java#L138]
 will not create a wrapper around the {{UnfilteredRowIterator}} but will simply 
add the {{Transformation}} to the list of transformations associated to that 
iterator.
When a memtable is flush the {{UnfilteredRowIterator}} is opened in a 
{{try-with-resources}} in 
[FlushRunnable:writeSortedContent|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Memtable.java#L473]
 before being passed to {{BigTableWriter|append}}. If the  
{{UnfilteredRowIterator}} is an instance of {{UnfilteredRows}}, 
{{StatsCollector#onPartitionClose}} will be called twice and the cells for the 
row will be counted twice.

The patch contains a change that ensure that we do not hit that problem by 
ensuring that if we call {{close}} on a class inheriting from {{BaseIterator}} 
only the first close will be processed and the following one will be ignored.

|| Branches || Test results ||
| [trunk|https://github.com/apache/cassandra/pull/835] | 
[j8|https://app.circleci.com/pipelines/github/blerer/cassandra/72/workflows/77c51c04-5a76-4530-99bd-cb1604f5893],
 
[j11|https://app.circleci.com/pipelines/github/blerer/cassandra/72/workflows/ab7be933-239c-41fb-a907-fa76a859ed99]
 |
| [3.11|https://github.com/apache/cassandra/pull/836] | 
[j8|https://app.circleci.com/pipelines/github/blerer/cassandra/69/workflows/9fb4aca3-3b41-4798-a921-07b4c7026b7e]
 |
| [3.0|https://github.com/apache/cassandra/pull/837] | 
[j8|https://app.circleci.com/pipelines/github/blerer/cassandra/70/workflows/b0ef5b25-0a4f-4ddb-92cd-b4c76f7908ea]
 |
| [2.2|https://github.com/apache/cassandra/pull/838] | 
[j8|https://app.circleci.com/pipelines/github/blerer/cassandra/71/workflows/d24c6595-9c8c-4893-86e0-ef3b15966f02]
 |
 


> tablehistograms cause ArrayIndexOutOfBoundsException
> ----------------------------------------------------
>
>                 Key: CASSANDRA-16259
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16259
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Observability/Metrics
>            Reporter: Justin Montgomery
>            Assignee: Benjamin Lerer
>            Priority: Normal
>             Fix For: 2.2.x, 3.0.x, 3.11.x, 4.0-beta
>
>
> After upgrading some nodes in our cluster from 3.11.8 to 3.11.9 an error 
> appeared on the upgraded nodes when trying to access *tablehistograms*. The 
> same command run on our .8 nodes return as expected, only the upgraded .9 
> nodes fail. Not all tables fail when queried, but about 90% of them do.
> We use Datastax MCAC which appears to query histograms every 30 seconds, this 
> outputs to the system.log:
> {noformat}
> WARN  [insights-3-1] 2020-11-09 01:11:22,331 UnixSocketClient.java:830 - 
> Error reporting:
> java.lang.ArrayIndexOutOfBoundsException: 115
>     at 
> org.apache.cassandra.metrics.TableMetrics.combineHistograms(TableMetrics.java:261)
>  ~[apache-cassandra-3.11.9.jar:3.11.9]
>     at 
> org.apache.cassandra.metrics.TableMetrics.access$000(TableMetrics.java:48) 
> ~[apache-cassandra-3.11.9.jar:3.11.9]
>     at 
> org.apache.cassandra.metrics.TableMetrics$11.getValue(TableMetrics.java:376) 
> ~[apache-cassandra-3.11.9.jar:3.11.9]
>     at 
> org.apache.cassandra.metrics.TableMetrics$11.getValue(TableMetrics.java:373) 
> ~[apache-cassandra-3.11.9.jar:3.11.9]
>     at 
> com.datastax.mcac.UnixSocketClient.writeMetric(UnixSocketClient.java:839) 
> [datastax-mcac-agent.jar:na]
>     at 
> com.datastax.mcac.UnixSocketClient.access$700(UnixSocketClient.java:78) 
> [datastax-mcac-agent.jar:na]
>     at 
> com.datastax.mcac.UnixSocketClient$2.lambda$onGaugeAdded$0(UnixSocketClient.java:626)
>  ~[datastax-mcac-agent.jar:na]
>     at 
> com.datastax.mcac.UnixSocketClient.writeGroup(UnixSocketClient.java:819) 
> [datastax-mcac-agent.jar:na]
>     at 
> com.datastax.mcac.UnixSocketClient.lambda$restartMetricReporting$2(UnixSocketClient.java:798)
>  [datastax-mcac-agent.jar:na]
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_272]
>     at 
> io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:126)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:307) 
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
>  ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>     at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_272]{noformat}
> Manually trying a histogram from the CLI:
> {noformat}
> $ nodetool tablehistograms logdata log_height_index
> error: 115
> -- StackTrace --
> java.lang.ArrayIndexOutOfBoundsException: 115
>       at 
> org.apache.cassandra.metrics.TableMetrics.combineHistograms(TableMetrics.java:261)
>       at 
> org.apache.cassandra.metrics.TableMetrics.access$000(TableMetrics.java:48)
>       at 
> org.apache.cassandra.metrics.TableMetrics$11.getValue(TableMetrics.java:376)
>       at 
> org.apache.cassandra.metrics.TableMetrics$11.getValue(TableMetrics.java:373)
>       at 
> org.apache.cassandra.metrics.CassandraMetricsRegistry$JmxGauge.getValue(CassandraMetricsRegistry.java:250)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
>       at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>       at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>       at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>       at 
> com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
>       at 
> com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
>       at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
>       at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
>       at 
> com.sun.jmx.remote.security.MBeanServerAccessController.getAttribute(MBeanServerAccessController.java:320)
>       at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1445)
>       at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
>       at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1408)
>       at 
> javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:639)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>       at sun.rmi.transport.Transport$1.run(Transport.java:200)
>       at sun.rmi.transport.Transport$1.run(Transport.java:197)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>       at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
>       at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
>       at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to