[jira] [Commented] (CASSANDRA-5431) cassandra-shuffle with JMX usernames and passwords
[ https://issues.apache.org/jira/browse/CASSANDRA-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625225#comment-13625225 ] Michał Michalski commented on CASSANDRA-5431: - As far as I remember, I wrote a patch for this some time ago while experimenting a bit with switching to vnodes. If you haven't started working on this task (so I won't duplicate your work ;-) ), I'll check it later today. cassandra-shuffle with JMX usernames and passwords --- Key: CASSANDRA-5431 URL: https://issues.apache.org/jira/browse/CASSANDRA-5431 Project: Cassandra Issue Type: Bug Affects Versions: 1.2.3 Reporter: Eric Dong Attachments: CASSANDRA-5431-whitespace.patch Unlike nodetool, cassandra-shuffle doesn't allow passing in a JMX username and password. This stops those who want to switch to vnodes from doing so if JMX access requires a username and a password. Patch to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
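For background on the fix being discussed: a command-line JMX client passes credentials through the standard javax.management remote API, supplying a {username, password} pair under the well-known JMXConnector.CREDENTIALS key. A minimal, self-contained sketch follows; the host, port, and credentials are placeholder values (7199 is Cassandra's default JMX port):

{code}
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxAuthSketch
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");

        // The remote JMX API expects credentials as a String[]{username, password}
        // under JMXConnector.CREDENTIALS ("jmx.remote.credentials").
        Map<String, Object> env = new HashMap<String, Object>();
        env.put(JMXConnector.CREDENTIALS, new String[]{ "jmxuser", "jmxpassword" });

        JMXConnector connector = JMXConnectorFactory.connect(url, env);
        try
        {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            System.out.println("MBean count: " + connection.getMBeanCount());
        }
        finally
        {
            connector.close();
        }
    }
}
{code}

A -u/-pw style option pair on the shuffle tool would presumably just feed this environment map, as nodetool's options do.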
[jira] [Updated] (CASSANDRA-5431) cassandra-shuffle with JMX usernames and passwords
[ https://issues.apache.org/jira/browse/CASSANDRA-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michał Michalski updated CASSANDRA-5431: Attachment: 5431-v2.txt Yup, I have it (updated to make it apply on your whitespace patch). cassandra-shuffle with JMX usernames and passwords --- Key: CASSANDRA-5431 URL: https://issues.apache.org/jira/browse/CASSANDRA-5431 Project: Cassandra Issue Type: Bug Affects Versions: 1.2.3 Reporter: Eric Dong Attachments: 5431-v2.txt, CASSANDRA-5431-whitespace.patch Unlike nodetool, cassandra-shuffle doesn't allow passing in a JMX username and password. This stops those who want to switch to vnodes from doing so if JMX access requires a username and a password. Patch to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5431) cassandra-shuffle with JMX usernames and passwords
[ https://issues.apache.org/jira/browse/CASSANDRA-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michał Michalski updated CASSANDRA-5431: Attachment: 5431-v3.txt OK. Attaching merged patch. cassandra-shuffle with JMX usernames and passwords --- Key: CASSANDRA-5431 URL: https://issues.apache.org/jira/browse/CASSANDRA-5431 Project: Cassandra Issue Type: Bug Affects Versions: 1.2.3 Reporter: Eric Dong Attachments: 5431-v2.txt, 5431-v3.txt, CASSANDRA-5431-whitespace.patch Unlike nodetool, cassandra-shuffle doesn't allow passing in a JMX username and password. This stops those who want to switch to vnodes from doing so if JMX access requires a username and a password. Patch to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-7575) Custom 2i validation
[ https://issues.apache.org/jira/browse/CASSANDRA-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrés de la Peña updated CASSANDRA-7575: - Attachment: 2i_validation_v2.patch Custom 2i validation Key: CASSANDRA-7575 URL: https://issues.apache.org/jira/browse/CASSANDRA-7575 Project: Cassandra Issue Type: Improvement Components: API Reporter: Andrés de la Peña Assignee: Andrés de la Peña Priority: Minor Labels: 2i, cql3, secondaryIndex, secondary_index, select Fix For: 2.1.1 Attachments: 2i_validation.patch, 2i_validation_v2.patch There are several projects using custom secondary indexes as an extension point to integrate C* with other systems such as Solr or Lucene. The usual approach is to embed third-party indexing queries in CQL clauses. For example, [DSE Search|http://www.datastax.com/what-we-offer/products-services/datastax-enterprise] embeds Solr syntax this way: {code} SELECT title FROM solr WHERE solr_query='title:natio*'; {code} [Stratio platform|https://github.com/Stratio/stratio-cassandra] embeds custom JSON syntax for searching in Lucene indexes: {code} SELECT * FROM tweets WHERE lucene='{ filter : { type: range, field: time, lower: 2014/04/25, upper: 2014/04/1 }, query : { type: phrase, field: body, values: [big, data] }, sort : {fields: [ {field:time, reverse:true} ] } }'; {code} Tuplejump [Stargate|http://tuplejump.github.io/stargate/] also uses Stratio's open-source JSON syntax: {code} SELECT name,company FROM PERSON WHERE stargate ='{ filter: { type: range, field: company, lower: a, upper: p }, sort:{ fields: [{field:name,reverse:true}] } }'; {code} These syntaxes are validated by the corresponding 2i implementation. This validation is done behind the StorageProxy command distribution, so, as far as I know, there is no way to give rich feedback about syntax errors to CQL users. I'm uploading a patch with some changes trying to improve this. I propose adding an empty validation method to SecondaryIndexSearcher that can be overridden by custom 2i implementations: {code} public void validate(List<IndexExpression> clause) {} {code} And calling it from SelectStatement#getRangeCommand: {code} ColumnFamilyStore cfs = Keyspace.open(keyspace()).getColumnFamilyStore(columnFamily()); for (SecondaryIndexSearcher searcher : cfs.indexManager.getIndexSearchersForQuery(expressions)) { try { searcher.validate(expressions); } catch (RuntimeException e) { String exceptionMessage = e.getMessage(); if (exceptionMessage != null && !exceptionMessage.trim().isEmpty()) throw new InvalidRequestException("Invalid index expression: " + e.getMessage()); else throw new InvalidRequestException("Invalid index expression"); } } {code} In this way C* allows custom 2i implementations to give feedback about syntax errors. We are currently using these changes in a fork with no problems. -- This message was sent by Atlassian JIRA (v6.2#6252)
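To illustrate the intended extension point, here is a rough sketch of a custom searcher overriding the proposed hook. It assumes the Cassandra 2.1-era internal API described above; the parse() helper is hypothetical, and the other methods a real searcher must implement (such as search()) are omitted:

{code}
// Hypothetical custom 2i searcher using the proposed validate() hook.
public class JsonQueryIndexSearcher extends SecondaryIndexSearcher
{
    public JsonQueryIndexSearcher(SecondaryIndexManager indexManager, Set<ByteBuffer> columns)
    {
        super(indexManager, columns);
    }

    @Override
    public void validate(List<IndexExpression> clause)
    {
        // parse() is a hypothetical helper that throws a RuntimeException with a
        // descriptive message on malformed syntax; per the patch above,
        // SelectStatement turns that into an InvalidRequestException for the client.
        for (IndexExpression expression : clause)
            parse(expression.value);
    }

    // search() and the remaining abstract methods are omitted from this sketch.
}
{code}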
[jira] [Commented] (CASSANDRA-7575) Custom 2i validation
[ https://issues.apache.org/jira/browse/CASSANDRA-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080783#comment-14080783 ] Andrés de la Peña commented on CASSANDRA-7575: -- [~sbtourist], here is the patch with the suggested changes. We have run some functional tests and it works fine. Custom 2i validation Key: CASSANDRA-7575 URL: https://issues.apache.org/jira/browse/CASSANDRA-7575 Project: Cassandra Issue Type: Improvement Components: API Reporter: Andrés de la Peña Assignee: Andrés de la Peña Priority: Minor Labels: 2i, cql3, secondaryIndex, secondary_index, select Fix For: 2.1.1 Attachments: 2i_validation.patch, 2i_validation_v2.patch There are several projects using custom secondary indexes as an extension point to integrate C* with other systems such as Solr or Lucene. The usual approach is to embed third-party indexing queries in CQL clauses. For example, [DSE Search|http://www.datastax.com/what-we-offer/products-services/datastax-enterprise] embeds Solr syntax this way: {code} SELECT title FROM solr WHERE solr_query='title:natio*'; {code} [Stratio platform|https://github.com/Stratio/stratio-cassandra] embeds custom JSON syntax for searching in Lucene indexes: {code} SELECT * FROM tweets WHERE lucene='{ filter : { type: range, field: time, lower: 2014/04/25, upper: 2014/04/1 }, query : { type: phrase, field: body, values: [big, data] }, sort : {fields: [ {field:time, reverse:true} ] } }'; {code} Tuplejump [Stargate|http://tuplejump.github.io/stargate/] also uses Stratio's open-source JSON syntax: {code} SELECT name,company FROM PERSON WHERE stargate ='{ filter: { type: range, field: company, lower: a, upper: p }, sort:{ fields: [{field:name,reverse:true}] } }'; {code} These syntaxes are validated by the corresponding 2i implementation. This validation is done behind the StorageProxy command distribution, so, as far as I know, there is no way to give rich feedback about syntax errors to CQL users. I'm uploading a patch with some changes trying to improve this. I propose adding an empty validation method to SecondaryIndexSearcher that can be overridden by custom 2i implementations: {code} public void validate(List<IndexExpression> clause) {} {code} And calling it from SelectStatement#getRangeCommand: {code} ColumnFamilyStore cfs = Keyspace.open(keyspace()).getColumnFamilyStore(columnFamily()); for (SecondaryIndexSearcher searcher : cfs.indexManager.getIndexSearchersForQuery(expressions)) { try { searcher.validate(expressions); } catch (RuntimeException e) { String exceptionMessage = e.getMessage(); if (exceptionMessage != null && !exceptionMessage.trim().isEmpty()) throw new InvalidRequestException("Invalid index expression: " + e.getMessage()); else throw new InvalidRequestException("Invalid index expression"); } } {code} In this way C* allows custom 2i implementations to give feedback about syntax errors. We are currently using these changes in a fork with no problems. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7575) Custom 2i validation
[ https://issues.apache.org/jira/browse/CASSANDRA-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrés de la Peña updated CASSANDRA-7575: - Attachment: 2i_validation_v3.patch Custom 2i validation Key: CASSANDRA-7575 URL: https://issues.apache.org/jira/browse/CASSANDRA-7575 Project: Cassandra Issue Type: Improvement Components: API Reporter: Andrés de la Peña Assignee: Andrés de la Peña Priority: Minor Labels: 2i, cql3, secondaryIndex, secondary_index, select Fix For: 2.1.1 Attachments: 2i_validation.patch, 2i_validation_v2.patch, 2i_validation_v3.patch There are several projects using custom secondary indexes as an extension point to integrate C* with other systems such as Solr or Lucene. The usual approach is to embed third-party indexing queries in CQL clauses. For example, [DSE Search|http://www.datastax.com/what-we-offer/products-services/datastax-enterprise] embeds Solr syntax this way: {code} SELECT title FROM solr WHERE solr_query='title:natio*'; {code} [Stratio platform|https://github.com/Stratio/stratio-cassandra] embeds custom JSON syntax for searching in Lucene indexes: {code} SELECT * FROM tweets WHERE lucene='{ filter : { type: range, field: time, lower: 2014/04/25, upper: 2014/04/1 }, query : { type: phrase, field: body, values: [big, data] }, sort : {fields: [ {field:time, reverse:true} ] } }'; {code} Tuplejump [Stargate|http://tuplejump.github.io/stargate/] also uses Stratio's open-source JSON syntax: {code} SELECT name,company FROM PERSON WHERE stargate ='{ filter: { type: range, field: company, lower: a, upper: p }, sort:{ fields: [{field:name,reverse:true}] } }'; {code} These syntaxes are validated by the corresponding 2i implementation. This validation is done behind the StorageProxy command distribution, so, as far as I know, there is no way to give rich feedback about syntax errors to CQL users. I'm uploading a patch with some changes trying to improve this. I propose adding an empty validation method to SecondaryIndexSearcher that can be overridden by custom 2i implementations: {code} public void validate(List<IndexExpression> clause) {} {code} And calling it from SelectStatement#getRangeCommand: {code} ColumnFamilyStore cfs = Keyspace.open(keyspace()).getColumnFamilyStore(columnFamily()); for (SecondaryIndexSearcher searcher : cfs.indexManager.getIndexSearchersForQuery(expressions)) { try { searcher.validate(expressions); } catch (RuntimeException e) { String exceptionMessage = e.getMessage(); if (exceptionMessage != null && !exceptionMessage.trim().isEmpty()) throw new InvalidRequestException("Invalid index expression: " + e.getMessage()); else throw new InvalidRequestException("Invalid index expression"); } } {code} In this way C* allows custom 2i implementations to give feedback about syntax errors. We are currently using these changes in a fork with no problems. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7575) Custom 2i validation
[ https://issues.apache.org/jira/browse/CASSANDRA-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082204#comment-14082204 ] Andrés de la Peña commented on CASSANDRA-7575: -- [~sbtourist], I suppose the problem is due to trailing whitespace in the patch file. I'm uploading a new version without trailing whitespace. These are the steps I've followed to apply the patch without warnings: {code} git clone https://github.com/apache/cassandra.git git checkout cassandra-2.1 git apply 2i_validation_v3.patch {code} Sorry for the inconvenience. Custom 2i validation Key: CASSANDRA-7575 URL: https://issues.apache.org/jira/browse/CASSANDRA-7575 Project: Cassandra Issue Type: Improvement Components: API Reporter: Andrés de la Peña Assignee: Andrés de la Peña Priority: Minor Labels: 2i, cql3, secondaryIndex, secondary_index, select Fix For: 2.1.1 Attachments: 2i_validation.patch, 2i_validation_v2.patch, 2i_validation_v3.patch There are several projects using custom secondary indexes as an extension point to integrate C* with other systems such as Solr or Lucene. The usual approach is to embed third-party indexing queries in CQL clauses. For example, [DSE Search|http://www.datastax.com/what-we-offer/products-services/datastax-enterprise] embeds Solr syntax this way: {code} SELECT title FROM solr WHERE solr_query='title:natio*'; {code} [Stratio platform|https://github.com/Stratio/stratio-cassandra] embeds custom JSON syntax for searching in Lucene indexes: {code} SELECT * FROM tweets WHERE lucene='{ filter : { type: range, field: time, lower: 2014/04/25, upper: 2014/04/1 }, query : { type: phrase, field: body, values: [big, data] }, sort : {fields: [ {field:time, reverse:true} ] } }'; {code} Tuplejump [Stargate|http://tuplejump.github.io/stargate/] also uses Stratio's open-source JSON syntax: {code} SELECT name,company FROM PERSON WHERE stargate ='{ filter: { type: range, field: company, lower: a, upper: p }, sort:{ fields: [{field:name,reverse:true}] } }'; {code} These syntaxes are validated by the corresponding 2i implementation. This validation is done behind the StorageProxy command distribution, so, as far as I know, there is no way to give rich feedback about syntax errors to CQL users. I'm uploading a patch with some changes trying to improve this. I propose adding an empty validation method to SecondaryIndexSearcher that can be overridden by custom 2i implementations: {code} public void validate(List<IndexExpression> clause) {} {code} And calling it from SelectStatement#getRangeCommand: {code} ColumnFamilyStore cfs = Keyspace.open(keyspace()).getColumnFamilyStore(columnFamily()); for (SecondaryIndexSearcher searcher : cfs.indexManager.getIndexSearchersForQuery(expressions)) { try { searcher.validate(expressions); } catch (RuntimeException e) { String exceptionMessage = e.getMessage(); if (exceptionMessage != null && !exceptionMessage.trim().isEmpty()) throw new InvalidRequestException("Invalid index expression: " + e.getMessage()); else throw new InvalidRequestException("Invalid index expression"); } } {code} In this way C* allows custom 2i implementations to give feedback about syntax errors. We are currently using these changes in a fork with no problems. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7575) Custom 2i validation
[ https://issues.apache.org/jira/browse/CASSANDRA-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082699#comment-14082699 ] Andrés de la Peña commented on CASSANDRA-7575: -- That's fine with me. Custom 2i validation Key: CASSANDRA-7575 URL: https://issues.apache.org/jira/browse/CASSANDRA-7575 Project: Cassandra Issue Type: Improvement Components: API Reporter: Andrés de la Peña Assignee: Andrés de la Peña Priority: Minor Labels: 2i, cql3, secondaryIndex, secondary_index, select Fix For: 2.1.1 Attachments: 2i_validation.patch, 2i_validation_v2.patch, 2i_validation_v3.patch, 2i_validation_v4.patch There are several projects using custom secondary indexes as an extension point to integrate C* with other systems such as Solr or Lucene. The usual approach is to embed third-party indexing queries in CQL clauses. For example, [DSE Search|http://www.datastax.com/what-we-offer/products-services/datastax-enterprise] embeds Solr syntax this way: {code} SELECT title FROM solr WHERE solr_query='title:natio*'; {code} [Stratio platform|https://github.com/Stratio/stratio-cassandra] embeds custom JSON syntax for searching in Lucene indexes: {code} SELECT * FROM tweets WHERE lucene='{ filter : { type: range, field: time, lower: 2014/04/25, upper: 2014/04/1 }, query : { type: phrase, field: body, values: [big, data] }, sort : {fields: [ {field:time, reverse:true} ] } }'; {code} Tuplejump [Stargate|http://tuplejump.github.io/stargate/] also uses Stratio's open-source JSON syntax: {code} SELECT name,company FROM PERSON WHERE stargate ='{ filter: { type: range, field: company, lower: a, upper: p }, sort:{ fields: [{field:name,reverse:true}] } }'; {code} These syntaxes are validated by the corresponding 2i implementation. This validation is done behind the StorageProxy command distribution, so, as far as I know, there is no way to give rich feedback about syntax errors to CQL users. I'm uploading a patch with some changes trying to improve this. I propose adding an empty validation method to SecondaryIndexSearcher that can be overridden by custom 2i implementations: {code} public void validate(List<IndexExpression> clause) {} {code} And calling it from SelectStatement#getRangeCommand: {code} ColumnFamilyStore cfs = Keyspace.open(keyspace()).getColumnFamilyStore(columnFamily()); for (SecondaryIndexSearcher searcher : cfs.indexManager.getIndexSearchersForQuery(expressions)) { try { searcher.validate(expressions); } catch (RuntimeException e) { String exceptionMessage = e.getMessage(); if (exceptionMessage != null && !exceptionMessage.trim().isEmpty()) throw new InvalidRequestException("Invalid index expression: " + e.getMessage()); else throw new InvalidRequestException("Invalid index expression"); } } {code} In this way C* allows custom 2i implementations to give feedback about syntax errors. We are currently using these changes in a fork with no problems. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7595) EmbeddedCassandraService class should provide a stop method
[ https://issues.apache.org/jira/browse/CASSANDRA-7595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084442#comment-14084442 ] Mirko Tschäni commented on CASSANDRA-7595: -- The solution proposed in 7595.diff is exactly what we would like to have. Calling CassandraDaemon#deactivate instead of stop is perfect. EmbeddedCassandraService class should provide a stop method --- Key: CASSANDRA-7595 URL: https://issues.apache.org/jira/browse/CASSANDRA-7595 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Mirko Tschäni Assignee: Mirko Tschäni Priority: Minor Fix For: 1.2.19, 2.0.10, 2.1.1 Attachments: 7595.diff The EmbeddedCassandraService only provides a start method. It should also provide a stop method. We use EmbeddedCassandraService to embed Cassandra in an OSGi application and need to be able to shut down Cassandra so that no non-daemon threads remain. The implementation would be straightforward: add the following method to EmbeddedCassandraService: public void stop() { cassandraDaemon.stop(); } I have tested this implementation locally and it worked as expected (it stops all non-daemon threads). -- This message was sent by Atlassian JIRA (v6.2#6252)
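For readers wanting the shape of the change, here is a minimal sketch of the lifecycle this ticket enables, assuming the proposed stop() method is present (per the comment above, it delegates to CassandraDaemon#deactivate rather than stop):

{code}
import org.apache.cassandra.service.EmbeddedCassandraService;

public class EmbeddedLifecycleSketch
{
    public static void main(String[] args) throws Exception
    {
        // EmbeddedCassandraService picks up cassandra.yaml via the usual
        // -Dcassandra.config system property.
        EmbeddedCassandraService cassandra = new EmbeddedCassandraService();
        cassandra.start();

        // ... exercise the node, e.g. from tests or an OSGi bundle ...

        // stop() is the method proposed in this ticket; it does not exist
        // in releases prior to the Fix Versions listed above.
        cassandra.stop();
    }
}
{code}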
[jira] [Created] (CASSANDRA-7710) nodetool ring throws exception if run on machine without Cassandra
Jimmy Mårdell created CASSANDRA-7710: Summary: nodetool ring throws exception if run on machine without Cassandra Key: CASSANDRA-7710 URL: https://issues.apache.org/jira/browse/CASSANDRA-7710 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jimmy Mårdell Priority: Minor DatabaseDescriptor.getNumTokens() is invoked in the nodetool ring command, which doesn't work so well when running on a machine where Cassandra doesn't exist. And it has all kinds of side effects as well. This seems fixed in 2.1, but it would be nice if it was fixed in 2.0 as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7719) Add PreparedStatements related metrics
Michaël Figuière created CASSANDRA-7719: --- Summary: Add PreparedStatements related metrics Key: CASSANDRA-7719 URL: https://issues.apache.org/jira/browse/CASSANDRA-7719 Project: Cassandra Issue Type: New Feature Reporter: Michaël Figuière Priority: Minor Cassandra newcomers often don't understand that they're expected to use PreparedStatements for almost all of their repetitive queries executed in production. It doesn't look like Cassandra currently exposes any PreparedStatements-related metrics. It would be interesting, and I believe fairly simple, to add several of them to make it possible, in development / management / monitoring tools, to show warnings or alerts related to this bad practice. I would therefore suggest adding the following metrics: * Executed prepared statements count * Executed unprepared statements count * Number of PreparedStatements that have been registered on the node -- This message was sent by Atlassian JIRA (v6.2#6252)
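A sketch of what the three suggested metrics could look like with the Codahale/Yammer Metrics library (which Cassandra uses for its other metrics; package names vary by version). All metric names here are made up, and registeredCount() stands in for however the server would expose its prepared-statement cache size:

{code}
import com.codahale.metrics.Counter;
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public final class PreparedStatementMetricsSketch
{
    private static final MetricRegistry registry = new MetricRegistry();

    // Incremented on each EXECUTE of a prepared statement.
    static final Counter preparedExecutions =
            registry.counter("cql.PreparedStatementsExecuted");

    // Incremented on each QUERY of a regular (unprepared) statement.
    static final Counter unpreparedExecutions =
            registry.counter("cql.RegularStatementsExecuted");

    static
    {
        // How many prepared statements are currently registered on the node.
        registry.register("cql.PreparedStatementsCount", new Gauge<Integer>()
        {
            public Integer getValue()
            {
                return registeredCount();
            }
        });
    }

    private static int registeredCount()
    {
        return 0; // placeholder for the server's prepared-statement cache size
    }
}
{code}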
[jira] [Commented] (CASSANDRA-7622) Implement virtual tables
[ https://issues.apache.org/jira/browse/CASSANDRA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089977#comment-14089977 ] Michaël Figuière commented on CASSANDRA-7622: - +1 as well. Once it's there, it would make it easy to expose any API-defined data. This would make it possible to create tools that can rely on JMX metrics, configuration, or any other information without relying on an extra port or on a local agent to access this data. Implement virtual tables Key: CASSANDRA-7622 URL: https://issues.apache.org/jira/browse/CASSANDRA-7622 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Tupshin Harper Fix For: 3.0 There are a variety of reasons to want virtual tables, which would be any table that would be backed by an API, rather than data explicitly managed and stored as sstables. One possible use case would be to expose JMX data through CQL as a resurrection of CASSANDRA-3527. Another is a more general framework to implement the ability to expose yaml configuration information. So it would be an alternate approach to CASSANDRA-7370. A possible implementation would be in terms of CASSANDRA-7443, but I am not presupposing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092532#comment-14092532 ] Piotr Kołaczkowski commented on CASSANDRA-6927: --- Ok, doing the review today. Thanks for the reminder. Create a CQL3 based bulk OutputFormat - Key: CASSANDRA-6927 URL: https://issues.apache.org/jira/browse/CASSANDRA-6927 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Paul Pak Priority: Minor Labels: cql3, hadoop Attachments: 6927-2.0-branch-v2.txt, trunk-6927-v3.txt, trunk-6927-v4.txt, trunk-6927.txt This is the CQL-compatible version of BulkOutputFormat. CqlOutputFormat exists, but doesn't write SSTables directly, similar to ColumnFamilyOutputFormat for thrift. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092559#comment-14092559 ] Piotr Kołaczkowski commented on CASSANDRA-6927: --- Patch does not apply to trunk: {noformat} $ git apply trunk-6927-v4.txt trunk-6927-v4.txt:148: trailing whitespace. trunk-6927-v4.txt:150: trailing whitespace. trunk-6927-v4.txt:153: trailing whitespace. protected final int bufferSize; trunk-6927-v4.txt:158: trailing whitespace. trunk-6927-v4.txt:326: trailing whitespace. } error: patch failed: src/java/org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java:20 error: src/java/org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java: patch does not apply error: patch failed: src/java/org/apache/cassandra/streaming/StreamManager.java:76 error: src/java/org/apache/cassandra/streaming/StreamManager.java: patch does not apply {noformat} Create a CQL3 based bulk OutputFormat - Key: CASSANDRA-6927 URL: https://issues.apache.org/jira/browse/CASSANDRA-6927 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Paul Pak Priority: Minor Labels: cql3, hadoop Attachments: 6927-2.0-branch-v2.txt, trunk-6927-v3.txt, trunk-6927-v4.txt, trunk-6927.txt This is the CQL-compatible version of BulkOutputFormat. CqlOutputFormat exists, but doesn't write SSTables directly, similar to ColumnFamilyOutputFormat for thrift. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6927) Create a CQL3 based bulk OutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092590#comment-14092590 ] Piotr Kołaczkowski commented on CASSANDRA-6927: --- +1 I applied the patch using an IDE (it seems to have stronger algorithms for applying patches than git's default) and all looks good. Please update the patch to make it mergeable and move to testing. Create a CQL3 based bulk OutputFormat - Key: CASSANDRA-6927 URL: https://issues.apache.org/jira/browse/CASSANDRA-6927 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Paul Pak Priority: Minor Labels: cql3, hadoop Attachments: 6927-2.0-branch-v2.txt, trunk-6927-v3.txt, trunk-6927-v4.txt, trunk-6927.txt This is the CQL-compatible version of BulkOutputFormat. CqlOutputFormat exists, but doesn't write SSTables directly, similar to ColumnFamilyOutputFormat for thrift. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7710) nodetool ring throws exception if run on machine without Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095343#comment-14095343 ] Jimmy Mårdell commented on CASSANDRA-7710: -- Our particular problem is that we have created our own SeedProvider. When you run nodetool ring on a production machine, our custom SeedProvider gets invoked by the nodetool process (as a side effect of DatabaseDescriptor being invoked), which causes some minor issues. This seems like a bug to me; nodetool ought to be a pure CLI fetching data through JMX. Compare the 2.0 implementation (using DatabaseDescriptor.getNumTokens() to determine if the cluster uses vnodes): https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/tools/NodeCmd.java#L296 and the 2.1 implementation (which determines if the cluster uses vnodes by counting the number of tokens through JMX): https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/tools/NodeTool.java#L437 The 2.1 implementation is clearly superior and doesn't have side effects. I think this should be backported to 2.0. I might do this myself, since it doesn't seem too hard. nodetool ring throws exception if run on machine without Cassandra -- Key: CASSANDRA-7710 URL: https://issues.apache.org/jira/browse/CASSANDRA-7710 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jimmy Mårdell Assignee: Michael Shuler Priority: Minor DatabaseDescriptor.getNumTokens() is invoked in the nodetool ring command, which doesn't work so well when running on a machine where Cassandra doesn't exist. And it has all kinds of side effects as well. This seems fixed in 2.1, but it would be nice if it was fixed in 2.0 as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
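The vnode check itself needs nothing from cassandra.yaml: it can be derived from the token-to-endpoint map that StorageService already exposes over JMX (getTokenToEndpointMap). A small illustrative sketch of that logic, decoupled from the JMX plumbing:

{code}
import java.util.HashMap;
import java.util.Map;

public class VnodeCheckSketch
{
    // A cluster uses vnodes iff some endpoint owns more than one token.
    // The parameter has the shape returned over JMX: token -> endpoint.
    static boolean usesVnodes(Map<String, String> tokenToEndpoint)
    {
        Map<String, Integer> tokensPerEndpoint = new HashMap<String, Integer>();
        for (String endpoint : tokenToEndpoint.values())
        {
            Integer count = tokensPerEndpoint.get(endpoint);
            tokensPerEndpoint.put(endpoint, count == null ? 1 : count + 1);
        }
        for (int count : tokensPerEndpoint.values())
            if (count > 1)
                return true;
        return false;
    }
}
{code}

This is the general idea behind the 2.1 NodeTool approach linked above; the actual implementation there may differ in detail.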
[jira] [Commented] (CASSANDRA-7710) nodetool ring throws exception if run on machine without Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096824#comment-14096824 ] Jimmy Mårdell commented on CASSANDRA-7710: -- Thanks, yes, that's the one I was looking for! nodetool ring throws exception if run on machine without Cassandra -- Key: CASSANDRA-7710 URL: https://issues.apache.org/jira/browse/CASSANDRA-7710 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jimmy Mårdell Assignee: Michael Shuler Priority: Minor DatabaseDescriptor.getNumTokens() is invoked in the nodetool ring command, which doesn't work so well when running on a machine where Cassandra doesn't exist. And it has all kinds of side effects as well. This seems fixed in 2.1, but it would be nice if it was fixed in 2.0 as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7783) Snapshot repairs can hang forever
Jimmy Mårdell created CASSANDRA-7783: Summary: Snapshot repairs can hang forever Key: CASSANDRA-7783 URL: https://issues.apache.org/jira/browse/CASSANDRA-7783 Project: Cassandra Issue Type: Bug Reporter: Jimmy Mårdell When the AntiEntropyService sends the snapshot repair request, it sets up a callback in an ExpiringMap. If the time it takes for the snapshot exceeds the RPC timeout, the callback will expire from the map and the snapshot responses will be dropped. The repair then gets stuck forever, blocking at the snapshotLatch. It's not even possible to kill the repair with forceTerminateAllRepairSessions(). This is likely fixed in 2.0 since that part of the code is completely rewritten. -- This message was sent by Atlassian JIRA (v6.2#6252)
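To make the failure mode concrete: the hang boils down to an unbounded CountDownLatch.await() whose count-downs depend on callbacks that can silently expire. A toy illustration (not the actual repair code) of the defensive bounded wait; the timeout value here is arbitrary:

{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SnapshotLatchSketch
{
    public static void main(String[] args) throws InterruptedException
    {
        // One count per replica expected to acknowledge its snapshot.
        CountDownLatch snapshotLatch = new CountDownLatch(3);

        // In the bug, each countDown() happens in a callback held in an
        // ExpiringMap; if the snapshot outlasts the RPC timeout, the callback
        // is evicted, countDown() never runs, and a bare await() blocks forever.

        // Bounded wait: fail the session instead of hanging indefinitely.
        if (!snapshotLatch.await(1, TimeUnit.HOURS))
            throw new RuntimeException("Snapshot acks not received in time; failing repair session");
    }
}
{code}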
[jira] [Commented] (CASSANDRA-7783) Snapshot repairs can hang forever
[ https://issues.apache.org/jira/browse/CASSANDRA-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099338#comment-14099338 ] Jimmy Mårdell commented on CASSANDRA-7783: -- Ah, okay. And not so easy to backport to the 1.2 branch by the looks of things. Snapshot repairs can hang forever - Key: CASSANDRA-7783 URL: https://issues.apache.org/jira/browse/CASSANDRA-7783 Project: Cassandra Issue Type: Bug Reporter: Jimmy Mårdell When the AntiEntropyService sends the snapshot repair request, it sets up a callback in an ExpiringMap. If the time it takes for the snapshot exceeds the RPC timeout, the callback will expire from the map and the snapshot responses will be dropped. The repair then gets stuck forever, blocking at the snapshotLatch. It's not even possible to kill the repair with forceTerminateAllRepairSessions(). This is likely fixed in 2.0 since that part of the code is completely rewritten. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7789) cqlsh COPY Command should display progress
Michaël Figuière created CASSANDRA-7789: --- Summary: cqlsh COPY Command should display progress Key: CASSANDRA-7789 URL: https://issues.apache.org/jira/browse/CASSANDRA-7789 Project: Cassandra Issue Type: Improvement Reporter: Michaël Figuière Priority: Minor While CASSANDRA-7405 is on its way to make the {{COPY}} command much faster, it's still likely to hang for many minutes when transferring a large amount of data. This gives newcomers the feeling that something went wrong. Even if the user expects cqlsh to hang for a long moment, it's not very convenient, as you have no idea of when the copy will be complete. I believe it would be very pleasant if the {{COPY}} command could display in-place progress output while it executes, probably including: * Rows copied * avg Rows/s * CSV File R/W MB * CSV File R/W MB/s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7789) cqlsh COPY Command should display progress
[ https://issues.apache.org/jira/browse/CASSANDRA-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michaël Figuière updated CASSANDRA-7789: Description: While CASSANDRA-7405 is on its way to make the {{COPY}} command much faster, it's still likely to hang for many minutes when transferring a large amount of data. This gives newcomers the feeling that something went wrong. Even if the user expects cqlsh to hang for a long moment, it's not very convenient, as you have no idea of when the copy will be complete. I believe it would be very pleasant if the {{COPY}} command could display in-place progress output while it executes, probably including: * Rows copied * avg Rows/s * CSV File R/W MB * avg CSV File R/W MB/s was: While CASSANDRA-7405 is on its way to make the {{COPY}} command much faster, it's still likely to hang for many minutes when transferring a large amount of data. This gives newcomers the feeling that something went wrong. Even if the user expects cqlsh to hang for a long moment, it's not very convenient, as you have no idea of when the copy will be complete. I believe it would be very pleasant if the {{COPY}} command could display in-place progress output while it executes, probably including: * Rows copied * avg Rows/s * CSV File R/W MB * CSV File R/W MB/s cqlsh COPY Command should display progress -- Key: CASSANDRA-7789 URL: https://issues.apache.org/jira/browse/CASSANDRA-7789 Project: Cassandra Issue Type: Improvement Reporter: Michaël Figuière Priority: Minor While CASSANDRA-7405 is on its way to make the {{COPY}} command much faster, it's still likely to hang for many minutes when transferring a large amount of data. This gives newcomers the feeling that something went wrong. Even if the user expects cqlsh to hang for a long moment, it's not very convenient, as you have no idea of when the copy will be complete. I believe it would be very pleasant if the {{COPY}} command could display in-place progress output while it executes, probably including: * Rows copied * avg Rows/s * CSV File R/W MB * avg CSV File R/W MB/s -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6602) Compaction improvements to optimize time series data
[ https://issues.apache.org/jira/browse/CASSANDRA-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Björn Hegerfors updated CASSANDRA-6602: --- Attachment: TimestampViewer.java 8 weeks.txt 1 week.txt Sorry about not getting back with results earlier. Here, I attach a tool that will print min/max timestamps of each input SSTable, and calculate their overlaps. Along with that is the output that TimestampViewer gave after 1 week and after about 8 weeks since the write survey mode node was started with DTCS. There were a few hiccups with this test, because not all timestamps are in microseconds. It turned out that data in this particular cluster used to be written with microsecond timestamps, but at one point it apparently started using milliseconds instead. For that reason, I had to abort the first test and make a version of DTCS that would convert any timestamp into microseconds (making assumptions about which year it's running, of course). That fixed the biggest problem, but the results are still somewhat affected by it. What has happened (the week 8 output makes this clear) is that the biggest and oldest SSTable at this point contains all of the microsecond timestamps and some of the millisecond timestamps. The minimum timestamp, as that SSTable sees it, is one in milliseconds, but it actually contains much older data that was written in microseconds. The maximum timestamp, as that file sees it, is one in microseconds, but it actually contains more recent data in milliseconds. So that one file simply lies about its time interval. Any newer SSTable (May 24 and on in the 8 week output) is unaffected by this! The week 1 file seems to be from a point where this had not yet stabilized, so it may not be of much value. Regardless, the week 8 output looks very good if you scroll down to the bottom. The huge gap at the beginning is caused by the timestamp inconsistency, so what you should really read out of this is that very nearly 100% of the whole timespan is covered by only a single SSTable; overlaps are negligible. Let me hear what you think about this output. Should it compact more to keep the number of files lower, for instance? That is of course partly adjustable by compaction options, which are all left at their defaults in this case. Does the output give you a good idea of what's going on? Ideally, I'd like to view a diagram of it. Compaction improvements to optimize time series data Key: CASSANDRA-6602 URL: https://issues.apache.org/jira/browse/CASSANDRA-6602 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Tupshin Harper Assignee: Björn Hegerfors Labels: compaction, performance Fix For: 3.0 Attachments: 1 week.txt, 8 weeks.txt, TimestampViewer.java, cassandra-2.0-CASSANDRA-6602-DateTieredCompactionStrategy.txt, cassandra-2.0-CASSANDRA-6602-DateTieredCompactionStrategy_v2.txt, cassandra-2.0-CASSANDRA-6602-DateTieredCompactionStrategy_v3.txt There are some unique characteristics of many/most time series use cases that both provide challenges and present unique opportunities for optimizations. One of the major challenges is in compaction. The existing compaction strategies will tend to re-compact data on disk at least a few times over the lifespan of each data point, greatly increasing the cpu and IO costs of that write. 
Compaction exists to 1) ensure that there aren't too many files on disk, 2) ensure that data that should be contiguous (part of the same partition) is laid out contiguously, and 3) delete data due to TTLs or tombstones. The special characteristics of time series data allow us to optimize away all three. Time series data 1) tends to be delivered in time order, with relatively constrained exceptions, 2) often has a pre-determined and fixed expiration date, 3) never gets deleted prior to TTL, and 4) has relatively predictable ingestion rates. Note that I filed CASSANDRA-5561, and this ticket potentially replaces or lowers the need for it. In that ticket, jbellis reasonably asks how that compaction strategy is better than disabling compaction. Taking that to heart, here is a compaction-strategy-less approach that could be extremely efficient for time-series use cases that follow the above pattern. (For context, I'm thinking of an example use case involving lots of streams of time-series data with a 5GB per day ingestion rate, and a 1000 day retention with TTL, resulting in an eventual steady state of 5TB per node) 1) You have an extremely large memtable (preferably off heap, if/when doable) for the table, and that memtable is sized to be able to hold a lengthy window of time. A typical period might be one day
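For readers without the attachment, the gist of what TimestampViewer computes is interval overlap over per-SSTable (min, max) timestamp pairs. A self-contained sketch of that calculation (the real tool reads the pairs from SSTable metadata, and its output format differs):

{code}
import java.util.Arrays;
import java.util.Comparator;

public class TimestampOverlapSketch
{
    public static void main(String[] args)
    {
        // Per-SSTable (minTimestamp, maxTimestamp) pairs; dummy values here.
        long[][] tables = { { 0, 100 }, { 90, 200 }, { 150, 400 }, { 401, 500 } };

        Arrays.sort(tables, new Comparator<long[]>()
        {
            public int compare(long[] a, long[] b) { return Long.compare(a[0], b[0]); }
        });

        for (long[] t : tables)
        {
            int covering = 0;
            for (long[] other : tables)
                if (other[0] <= t[1] && t[0] <= other[1]) // intervals intersect
                    covering++;
            System.out.printf("[%d, %d] intersects %d table(s), itself included%n",
                              t[0], t[1], covering);
        }
    }
}
{code}

With DTCS behaving well, almost every interval should intersect only itself; heavy mutual intersection is the STCS pathology described later in this thread.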
[jira] [Updated] (CASSANDRA-6602) Compaction improvements to optimize time series data
[ https://issues.apache.org/jira/browse/CASSANDRA-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Björn Hegerfors updated CASSANDRA-6602: --- Attachment: STCS 16 hours.txt Compaction improvements to optimize time series data Key: CASSANDRA-6602 URL: https://issues.apache.org/jira/browse/CASSANDRA-6602 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Tupshin Harper Assignee: Björn Hegerfors Labels: compaction, performance Fix For: 3.0 Attachments: 1 week.txt, 8 weeks.txt, STCS 16 hours.txt, TimestampViewer.java, cassandra-2.0-CASSANDRA-6602-DateTieredCompactionStrategy.txt, cassandra-2.0-CASSANDRA-6602-DateTieredCompactionStrategy_v2.txt, cassandra-2.0-CASSANDRA-6602-DateTieredCompactionStrategy_v3.txt There are some unique characteristics of many/most time series use cases that both provide challenges and present unique opportunities for optimizations. One of the major challenges is in compaction. The existing compaction strategies will tend to re-compact data on disk at least a few times over the lifespan of each data point, greatly increasing the cpu and IO costs of that write. Compaction exists to 1) ensure that there aren't too many files on disk, 2) ensure that data that should be contiguous (part of the same partition) is laid out contiguously, and 3) delete data due to TTLs or tombstones. The special characteristics of time series data allow us to optimize away all three. Time series data 1) tends to be delivered in time order, with relatively constrained exceptions, 2) often has a pre-determined and fixed expiration date, 3) never gets deleted prior to TTL, and 4) has relatively predictable ingestion rates. Note that I filed CASSANDRA-5561, and this ticket potentially replaces or lowers the need for it. In that ticket, jbellis reasonably asks how that compaction strategy is better than disabling compaction. Taking that to heart, here is a compaction-strategy-less approach that could be extremely efficient for time-series use cases that follow the above pattern. (For context, I'm thinking of an example use case involving lots of streams of time-series data with a 5GB per day ingestion rate, and a 1000 day retention with TTL, resulting in an eventual steady state of 5TB per node) 1) You have an extremely large memtable (preferably off heap, if/when doable) for the table, and that memtable is sized to be able to hold a lengthy window of time. A typical period might be one day. At the end of that period, you flush the contents of the memtable to an sstable and move to the next one. This is basically identical to current behaviour, but with thresholds adjusted so that you can ensure flushing at predictable intervals. (Open question is whether predictable intervals is actually necessary, or whether just waiting until the huge memtable is nearly full is sufficient) 2) Combine the behaviour with CASSANDRA-5228 so that sstables will be efficiently dropped once all of the columns have expired. (Another side note, it might be valuable to have a modified version of CASSANDRA-3974 that doesn't bother storing per-column TTL since it is required that all columns have the same TTL) 3) Be able to mark column families as read/write only (no explicit deletes), so no tombstones. 4) Optionally add back an additional type of delete that would delete all data earlier than a particular timestamp, resulting in immediate dropping of obsoleted sstables. 
The result is that for in-order delivered data, every cell will be laid out optimally on disk on the first pass, and over the course of 1000 days and 5TB of data, there will only be 1000 5GB sstables, so the number of filehandles will be reasonable. For exceptions (out-of-order delivery), most cases will be caught by the extended (24 hour+) memtable flush times and merged correctly automatically. For those that were slightly askew at flush time, or were delivered so far out of order that they go in the wrong sstable, there is relatively low overhead to reading from two sstables for a time slice, instead of one, and that overhead would be incurred relatively rarely unless out-of-order delivery was the common case, in which case, this strategy should not be used. Another possible optimization to address out-of-order delivery would be to maintain more than one time-centric memtable in memory at a time (e.g. two 12 hour ones), and then you always insert into whichever one of the two owns the appropriate range of time. By delaying flushing the ahead one until we are ready to roll writes over to a third one, we are able to avoid any fragmentation as long as all deliveries come in no more than 12 hours late
[jira] [Comment Edited] (CASSANDRA-6602) Compaction improvements to optimize time series data
[ https://issues.apache.org/jira/browse/CASSANDRA-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105970#comment-14105970 ] Björn Hegerfors edited comment on CASSANDRA-6602 at 8/21/14 9:26 PM: - Sorry about not getting back with results earlier. Here, I attach a tool that will print min/max timestamps of each input SSTable, and calculate their overlaps. Along with that is the output that TimestampViewer gave after 1 week and after about 8 weeks since the write survey mode node was started with DTCS. There were a few hiccups with this test, because not all timestamps are in microseconds. It turned out that data in this particular cluster used to be written with microsecond timestamps, but at one point it apparently started using milliseconds instead. For that reason, I had to abort the first test and make a version of DTCS that would convert any timestamp into microseconds (making assumptions about which year it's running, of course). That fixed the biggest problem, but the results are still somewhat affected by it. What has happened (the week 8 output makes this clear) is that the biggest and oldest SSTable at this point contains all of the microsecond timestamps and some of the millisecond timestamps. The minimum timestamp, as that SSTable sees it, is one in milliseconds, but it actually contains much older data that was written in microseconds. The maximum timestamp, as that file sees it, is one in microseconds, but it actually contains more recent data in milliseconds. So that one file simply lies about its time interval. Any newer SSTable (May 24 and on in the 8 week output) is unaffected by this! The week 1 file seems to be from a point where this had not yet stabilized, so it may not be of much value. Regardless, the week 8 output looks very good if you scroll down to the bottom. The huge gap at the beginning is caused by the timestamp inconsistency, so what you should really read out of this is that very nearly 100% of the whole timespan is covered by only a single SSTable; overlaps are negligible. Let me hear what you think about this output. Should it compact more to keep the number of files lower, for instance? That is of course partly adjustable by compaction options, which are all left at their defaults in this case. Does the output give you a good idea of what's going on? Ideally, I'd like to view a diagram of it. EDIT: Oh, I forgot to demonstrate what TimestampViewer shows if you run STCS. The attached file STCS 16 hours.txt just shows a simple non-production test that I ran on my laptop for 16 hours. It was a simple time series with 100 rows. The point is to show how heavily the min/max timestamps overlap. 30% of the timespan is covered by all 11 SSTables! If someone wants to try TimestampViewer on a production cluster that has run STCS for a time series for a long while, that would be useful to see. The output on LCS nodes doesn't look too useful. Sadly, the cluster that I did my production test on used LCS, whereas DTCS is much more directly comparable to STCS. was (Author: bj0rn): Sorry about not getting back with results earlier. Here, I attach a tool that will print min/max timestamps of each input SSTable, and calculate their overlaps. Along with that is the output that TimestampViewer gave after 1 week and after about 8 weeks since the write survey mode node was started with DTCS. There were a few hiccups with this test, because not all timestamps are in microseconds. 
It turned out that data in this particular cluster used to be written with microsecond timestamps, but at one point it apparently started using milliseconds instead. For that reason, I had to abort the first test and make a version of DTCS that would convert any timestamp into microseconds (making assumptions about which year it's running, of course). That fixed the biggest problem, but the results are still somewhat affected by it. What has happened (the week 8 output makes this clear) is that the biggest and oldest SSTable at this point contains all of the microsecond timestamps and some of the millisecond timestamps. The minimum timestamp, as that SSTable sees it, is one in milliseconds, but it actually contains much older data that was written in microseconds. The maximum timestamp, as that file sees it, is one in microseconds, but it actually contains more recent data in milliseconds. So that one file simply lies about its time interval. Any newer SSTable (May 24 and on in the 8 week output) is unaffected by this! The week 1 file seems to be from a point where this had not yet stabilized, so it may not be of much value. Regardless, the week 8 output looks very good if you scroll down to the bottom. The huge gap at the beginning is caused by the timestamp inconsistency, so what you should really read out of this is that very nearly
[jira] [Created] (CASSANDRA-7831) recreating a counter column after dropping it leaves it unusable state
Peter Mädel created CASSANDRA-7831: -- Summary: recreating a counter column after dropping it leaves it unusable state Key: CASSANDRA-7831 URL: https://issues.apache.org/jira/browse/CASSANDRA-7831 Project: Cassandra Issue Type: Bug Components: Core Reporter: Peter Mädel create table counter_bug (t int, c counter, primary key (t)); update counter_bug set c = c +1 where t = 1; select * from counter_bug ; t | c ---+--- 1 | 1 (1 rows) alter table counter_bug drop c; alter table counter_bug add c counter; update counter_bug set c = c +1 where t = 1; select * from counter_bug; (0 rows) update counter_bug set c = c +1 where t = 2; select * from counter_bug; (0 rows) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7831) recreating a counter column after dropping it leaves in unusable state
[ https://issues.apache.org/jira/browse/CASSANDRA-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Mädel updated CASSANDRA-7831: --- Summary: recreating a counter column after dropping it leaves in unusable state (was: recreating a counter column after dropping it leaves it unusable state) recreating a counter column after dropping it leaves in unusable state -- Key: CASSANDRA-7831 URL: https://issues.apache.org/jira/browse/CASSANDRA-7831 Project: Cassandra Issue Type: Bug Components: Core Reporter: Peter Mädel create table counter_bug (t int, c counter, primary key (t)); update counter_bug set c = c +1 where t = 1; select * from counter_bug ; t | c ---+--- 1 | 1 (1 rows) alter table counter_bug drop c; alter table counter_bug add c counter; update counter_bug set c = c +1 where t = 1; select * from counter_bug; (0 rows) update counter_bug set c = c +1 where t = 2; select * from counter_bug; (0 rows) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7375) nodetool units wrong for streamthroughput
[ https://issues.apache.org/jira/browse/CASSANDRA-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olve Sæther Hansen updated CASSANDRA-7375: -- Attachment: cassandra-2.0.txt nodetool units wrong for streamthroughput - Key: CASSANDRA-7375 URL: https://issues.apache.org/jira/browse/CASSANDRA-7375 Project: Cassandra Issue Type: Bug Components: Core Reporter: Mike Heffner Priority: Minor Labels: lhf Attachments: cassandra-2.0.txt Stream throughput is measured in megabits (Mbps) in cassandra.yaml: {code} # When unset, the default is 200 Mbps or 25 MB/s. # stream_throughput_outbound_megabits_per_sec: 200 {code} However, the nodetool command uses the unit MB/s, which implies megabytes/sec: getstreamthroughput - Print the MB/s throughput cap for streaming in the system setstreamthroughput value_in_mb - Set the MB/s throughput cap for streaming in the system, or 0 to disable throttling. $ nodetool getstreamthroughput Current stream throughput: 200 MB/s Fix references in nodetool to use Mbps. -- This message was sent by Atlassian JIRA (v6.2#6252)
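The discrepancy is just the bits-versus-bytes factor of 8 called out in the yaml comment. A trivial worked conversion:

{code}
public class ThroughputUnitsSketch
{
    public static void main(String[] args)
    {
        int megabitsPerSec = 200;                      // stream_throughput_outbound_megabits_per_sec
        double megabytesPerSec = megabitsPerSec / 8.0; // 8 bits per byte
        System.out.println(megabitsPerSec + " Mbps = " + megabytesPerSec + " MB/s"); // 200 Mbps = 25.0 MB/s
    }
}
{code}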
[jira] [Commented] (CASSANDRA-4762) Support IN clause for any clustering column
[ https://issues.apache.org/jira/browse/CASSANDRA-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119511#comment-14119511 ] Peter Mädel commented on CASSANDRA-4762: Any chance of getting this fix into 2.1.x? Support IN clause for any clustering column --- Key: CASSANDRA-4762 URL: https://issues.apache.org/jira/browse/CASSANDRA-4762 Project: Cassandra Issue Type: Improvement Components: Core Reporter: T Jake Luciani Assignee: Benjamin Lerer Labels: cql, docs Fix For: 3.0 Attachments: 4762-1.txt Given CASSANDRA-3885, it seems it should be possible to store multiple ranges for many predicates, even the inner parts of a composite column. They could be expressed as an expanded set of filter queries. example: {code} CREATE TABLE test ( name text, tdate timestamp, tdate2 timestamp, tdate3 timestamp, num double, PRIMARY KEY(name,tdate,tdate2,tdate3) ) WITH COMPACT STORAGE; SELECT * FROM test WHERE name IN ('a','b') and tdate IN ('2010-01-01','2011-01-01') and tdate2 IN ('2010-01-01','2011-01-01') and tdate3 IN ('2010-01-01','2011-01-01') {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-7880) Create a new system table schema_change_history
Michaël Figuière created CASSANDRA-7880: --- Summary: Create a new system table schema_change_history Key: CASSANDRA-7880 URL: https://issues.apache.org/jira/browse/CASSANDRA-7880 Project: Cassandra Issue Type: New Feature Reporter: Michaël Figuière Priority: Minor The current way Cassandra handles schema modifications can lead to some schema disagreements, as DDL statement execution doesn't come with any absolute guarantee. I understand that entirely seamless schema updates in such a distributed system will be challenging to reach and probably not a high priority for now. That being said, these disagreements can sometimes lead to a challenging situation for scripts or tools that need things to be in order to move on. To clarify the situation, help the user figure out what's going on, and properly log these sensitive operations, it would be interesting to add a {{schema_change_history}} table in the {{system}} keyspace. I would expect it to be local to a node and to contain the following information: * DDL statement that has been executed * User login used for the operation * IP of the client that originated the request * Date/Time of the change * Schema version before the change * Schema version after the change Under normal conditions, Cassandra shouldn't handle a massive amount of DDL statements, so this table should grow at a decent pace. Nevertheless, to bound its growth we can consider adding a TTL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-7881) SCHEMA_CHANGE Events and Responses should carry the Schema Version
Michaël Figuière created CASSANDRA-7881: --- Summary: SCHEMA_CHANGE Events and Responses should carry the Schema Version Key: CASSANDRA-7881 URL: https://issues.apache.org/jira/browse/CASSANDRA-7881 Project: Cassandra Issue Type: New Feature Reporter: Michaël Figuière Priority: Minor For similar logging and debugging purposes as exposed in CASSANDRA-7880, it would be helpful to send to the client the previous and new schema version UUIDs that were in use before and after a schema change operation, in the {{SCHEMA_CHANGE}} events and responses in protocol v4. This could then be exposed in the client APIs in order to bring much more precise awareness of the actual status of the schema on each node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8254) Query parameters (and more) are limited to 65,536 entries
[ https://issues.apache.org/jira/browse/CASSANDRA-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196856#comment-14196856 ] Michaël Figuière commented on CASSANDRA-8254: - It feels to me that to cope with such a situation it would be more elegant to allow for a single parameter that would be something like a {{List<Tuple<?>>}}. Going beyond 65k parameters feels odd to me, even in this relevant situation. Query parameters (and more) are limited to 65,536 entries - Key: CASSANDRA-8254 URL: https://issues.apache.org/jira/browse/CASSANDRA-8254 Project: Cassandra Issue Type: Bug Components: API Reporter: Nicolas Favre-Felix Parameterized queries are sent over the wire as a string followed by a list of arguments. This list is decoded in QueryOptions.Codec by CBUtil.readValueList(body), which in turn reads a 16-bit short value from the wire as the number of values to deserialize. Sending more values leads to a silent overflow, sometimes reported by the driver as a protocol error as other values are deserialized incorrectly. 64k sounds like a lot, but tables with a large number of clustering dimensions can hit this limit when fetching a few thousand CQL rows only with an IN query, e.g. {code} SELECT * FROM sensor_data WHERE a=? and (b,c,d,e,f,g,h,i) IN ((?,?,?,?,?,?,?,?), (?,?,?,?,?,?,?,?), (?,?,?,?,?,?,?,?), (?,?,?,?,?,?,?,?) ... ) {code} Here, having 8 dimensions in the clustering key plus 1 in the partitioning key restricts the read to 8,191 CQL rows. Some other parts of Cassandra still use 16-bit sizes, for example preventing users from fetching all elements of a large collection (CASSANDRA-6428). The suggestion at the time was "we'll fix it in the next iteration of the binary protocol", so I'd like to suggest switching to variable-length integers as this would solve such issues while keeping messages short. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
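On the closing suggestion: "variable-length integers" generically means something like the LEB128 scheme sketched below (7 payload bits per byte, high bit as a continuation flag), so small collections keep a 1-byte count while counts above 65,535 remain representable. This is a generic illustration, not Cassandra's actual vint codec:

{code}
import java.io.ByteArrayOutputStream;

public class VarIntSketch
{
    // Unsigned LEB128-style encoding: 7 payload bits per byte,
    // high bit set while more bytes follow.
    static byte[] encode(long value)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0)
        {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args)
    {
        System.out.println(encode(100).length);   // 1 byte
        System.out.println(encode(65536).length); // 3 bytes: still representable
    }
}
{code}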
[jira] [Updated] (CASSANDRA-8193) Multi-DC parallel snapshot repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Mårdell updated CASSANDRA-8193: - Attachment: cassandra-2.0-8193-1.txt Multi-DC parallel snapshot repair - Key: CASSANDRA-8193 URL: https://issues.apache.org/jira/browse/CASSANDRA-8193 Project: Cassandra Issue Type: Improvement Reporter: Jimmy Mårdell Assignee: Jimmy Mårdell Priority: Minor Attachments: cassandra-2.0-8193-1.txt The current behaviour of snapshot repair is to let one node at a time calculate a merkle tree. This is to ensure that only one node at a time is doing the expensive calculation. The drawback is that the merkle tree calculation takes even longer overall. In a multi-DC setup, I think it would make more sense to have one node in each DC calculate the merkle tree at the same time. This would yield a significant improvement when you have many data centers. I'm not sure how relevant this is in 2.1, but I don't see us upgrading to 2.1 any time soon. Unless there is an obvious drawback that I'm missing, I'd like to implement this in the 2.0 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
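The shape of the proposed change could look roughly like this: a minimal sketch of one-merkle-tree-per-DC coordination (names are illustrative, not the classes in the attached patch, and the datacenter lookup is assumed to come from the snitch):
{code}
import java.net.InetAddress;
import java.util.*;

abstract class DatacenterAwareCoordinatorSketch {
    // Pending validation requests, queued per datacenter.
    private final Map<String, Queue<InetAddress>> queuesByDc = new HashMap<>();

    protected abstract String datacenterOf(InetAddress endpoint); // assumed snitch lookup
    protected abstract void sendMerkleTreeRequest(InetAddress endpoint);

    public void add(InetAddress endpoint) {
        queuesByDc.computeIfAbsent(datacenterOf(endpoint), dc -> new ArrayDeque<>()).add(endpoint);
    }

    /** Kick off one merkle tree calculation in every datacenter at once. */
    public void start() {
        for (Queue<InetAddress> queue : queuesByDc.values())
            if (!queue.isEmpty())
                sendMerkleTreeRequest(queue.peek());
    }

    /** When an endpoint finishes, move on to the next node in the same DC only. */
    public void completed(InetAddress endpoint) {
        Queue<InetAddress> queue = queuesByDc.get(datacenterOf(endpoint));
        if (queue == null)
            return;
        queue.remove(endpoint);
        if (!queue.isEmpty())
            sendMerkleTreeRequest(queue.peek());
    }
}
{code}
With one queue per DC, at most one node per datacenter is computing the expensive merkle tree at any time, which is the speedup described above.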
[jira] [Commented] (CASSANDRA-8228) Log malfunctioning host on prepareForRepair
[ https://issues.apache.org/jira/browse/CASSANDRA-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207726#comment-14207726 ] Juho Mäkinen commented on CASSANDRA-8228: - The wait time was definitely not one hour but just something like a minute or less. I stopped using 2.1.1 and downgraded to 2.0.11 because repairs on 2.1.1 were so unstable on my setup due to this error :( Log malfunctioning host on prepareForRepair --- Key: CASSANDRA-8228 URL: https://issues.apache.org/jira/browse/CASSANDRA-8228 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Juho Mäkinen Priority: Trivial Labels: lhf Repair startup goes through ActiveRepairService.prepareForRepair(), which might result in a "Repair failed with error Did not get positive replies from all endpoints." error, but there's no other logging regarding this error. It seems that it would be trivial to modify prepareForRepair() to log the host address which caused the error, thus easing the debugging effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8012) cqlsh DESCRIBE KEYSPACES; returns empty after upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207860#comment-14207860 ] Rafał Furmański commented on CASSANDRA-8012: I have the exact same issue in Cassandra 2.1.1: {code} root@db1:~# cqlsh 10.210.3.221 --debug Using CQL driver: <module 'cassandra' from '/usr/share/cassandra/lib/cassandra-driver-internal-only-2.1.1.post.zip/cassandra-driver-2.1.1.post/cassandra/__init__.py'> Connected to Production Cluster at 10.210.3.221:9042. [cqlsh 5.0.1 | Cassandra 2.1.1 | CQL spec 3.2.0 | Native protocol v3] Use HELP for help. cqlsh> DESCRIBE keyspaces; <empty> cqlsh> DESCRIBE keyspace sync; Traceback (most recent call last): File "/usr/bin/cqlsh", line 861, in onecmd self.handle_statement(st, statementtext) File "/usr/bin/cqlsh", line 899, in handle_statement return custom_handler(parsed) File "/usr/bin/cqlsh", line 1265, in do_describe self.describe_keyspace(ksname) File "/usr/bin/cqlsh", line 1137, in describe_keyspace self.print_recreate_keyspace(self.get_keyspace_meta(ksname), sys.stdout) File "/usr/bin/cqlsh", line 699, in get_keyspace_meta raise KeyspaceNotFound('Keyspace %r not found.' % ksname) KeyspaceNotFound: Keyspace 'sync' not found. cqlsh> use sync; cqlsh:sync> {code} Any ideas how to fix this? Attaching cluster info: {code} root@db1:~# nodetool describecluster Cluster Information: Name: Production Cluster Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Schema versions: f8040381-4d5b-38da-9eda-5fd96474694a: [10.210.3.160, 10.210.3.224, 10.210.3.221, 10.195.15.163, 10.195.15.162] {code} cqlsh DESCRIBE KEYSPACES; returns empty after upgrade --- Key: CASSANDRA-8012 URL: https://issues.apache.org/jira/browse/CASSANDRA-8012 Project: Cassandra Issue Type: Bug Components: Tools Environment: cassandra 2.1.0 Reporter: Shawn Zhou Assignee: Tyler Hobbs Priority: Minor Labels: cqlsh Fix For: 2.1.3 After upgrading from 2.0.7 to 2.1.0, running cqlsh DESCRIBE KEYSPACES; returns an empty result; querying an individual table does return data. See below: {noformat} [root@dc1-stg-cassandra-08 cassandra]# cqlsh dc1-stg-cassandra-08.dc01.revsci.net -k as_user_segment Connected to Stage Cluster at dc1-stg-cassandra-08.dc01.revsci.net:9042. [cqlsh 5.0.1 | Cassandra 2.1.0 | CQL spec 3.2.0 | Native protocol v3] Use HELP for help. cqlsh:as_user_segment> DESCRIBE KEYSPACES; <empty> cqlsh:as_user_segment> select * from user_segments_blob where id = '8e6090087fc1a7591a99dc4cb744ac43'; id | segments --+ 8e6090087fc1a7591a99dc4cb744ac43 | 0x9416b015911d3b0e211aee227216ab1cdf0f4204 (1 rows) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8296) Can't add new node to a Cluster due to invalid gossip generation
Rafał Furmański created CASSANDRA-8296: -- Summary: Can't add new node to a Cluster due to invalid gossip generation Key: CASSANDRA-8296 URL: https://issues.apache.org/jira/browse/CASSANDRA-8296 Project: Cassandra Issue Type: Bug Environment: Debian Wheezy Cassandra 2.1.1 Reporter: Rafał Furmański Hi all! I'm unable to add a new node to an existing Cassandra cluster. I'm using GossipingPropertyFileSnitch, and after starting Cassandra I get the following errors on other nodes: {code} WARN [GossipStage:2] 2014-11-12 09:38:43,297 Gossiper.java:993 - received an invalid gossip generation for peer /10.210.3.230; local generation = 3, received generation = 1415785003 {code} The new node's IP is 10.210.3.230. I can't see this IP in {code}nodetool status{code}. Any ideas? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8193) Multi-DC parallel snapshot repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208773#comment-14208773 ] Jimmy Mårdell commented on CASSANDRA-8193: -- I think it's more of a performance improvement than a new feature. I could do the fallback, but why is it necessary? Is it a common use case to have RF=1 in a multi-DC setup and do, for instance, quorum queries across datacenters? It will be a bit more messy. The reason ParallelRequestCoordinator is generic and implements IRequestCoordinator<R> is that that's how the old RequestCoordinator was written. I didn't really see why it was generic in the first place, but I kept it. I could remove the generics entirely and always use InetAddress (there are no other usages of it). Ah right, the call to completed will always be synchronized from addTree. Missed that, thanks. Multi-DC parallel snapshot repair - Key: CASSANDRA-8193 URL: https://issues.apache.org/jira/browse/CASSANDRA-8193 Project: Cassandra Issue Type: Improvement Reporter: Jimmy Mårdell Assignee: Jimmy Mårdell Priority: Minor Fix For: 2.0.12 Attachments: cassandra-2.0-8193-1.txt The current behaviour of snapshot repair is to let one node at a time calculate a merkle tree. This is to ensure that only one node at a time is doing the expensive calculation. The drawback is that the merkle tree calculation takes even longer overall. In a multi-DC setup, I think it would make more sense to have one node in each DC calculate the merkle tree at the same time. This would yield a significant improvement when you have many data centers. I'm not sure how relevant this is in 2.1, but I don't see us upgrading to 2.1 any time soon. Unless there is an obvious drawback that I'm missing, I'd like to implement this in the 2.0 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8296) Can't add new node to a Cluster due to invalid gossip generation
[ https://issues.apache.org/jira/browse/CASSANDRA-8296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14209529#comment-14209529 ] Rafał Furmański commented on CASSANDRA-8296: Actually I repaired this by purging the gossip state as described here: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/opsGossipPurge_t.html I got the same value (3) on all other nodes in the cluster. I had some troubles with this node from the beginning. I wish I could say more, but my problem is resolved now. Can't add new node to a Cluster due to invalid gossip generation Key: CASSANDRA-8296 URL: https://issues.apache.org/jira/browse/CASSANDRA-8296 Project: Cassandra Issue Type: Bug Environment: Debian Wheezy Cassandra 2.1.1 Reporter: Rafał Furmański Assignee: Jason Brown Hi all! I'm unable to add a new node to an existing Cassandra cluster. I'm using GossipingPropertyFileSnitch, and after starting Cassandra I get the following errors on other nodes: {code} WARN [GossipStage:2] 2014-11-12 09:38:43,297 Gossiper.java:993 - received an invalid gossip generation for peer /10.210.3.230; local generation = 3, received generation = 1415785003 {code} The new node's IP is 10.210.3.230. I can't see this IP in {code}nodetool status{code}. Any ideas? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8228) Log malfunctioning host on prepareForRepair
[ https://issues.apache.org/jira/browse/CASSANDRA-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14209663#comment-14209663 ] Juho Mäkinen commented on CASSANDRA-8228: - Yeah, maybe list all those machine addresses which failed to respond. I'm not sure what other info could be logged. I'm sure I had some issue which caused this error, but because of the lack of good log messages I couldn't debug it further. If you have an idea of what else could be printed to aid debugging issues related to this, please add it :) Log malfunctioning host on prepareForRepair --- Key: CASSANDRA-8228 URL: https://issues.apache.org/jira/browse/CASSANDRA-8228 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Juho Mäkinen Priority: Trivial Labels: lhf Repair startup goes through ActiveRepairService.prepareForRepair(), which might result in a "Repair failed with error Did not get positive replies from all endpoints." error, but there's no other logging regarding this error. It seems that it would be trivial to modify prepareForRepair() to log the host address which caused the error, thus easing the debugging effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
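A minimal sketch of that suggestion (hypothetical names; not the actual ActiveRepairService code): track which endpoints have acknowledged the prepare message, and name the silent ones in the error.
{code}
import java.net.InetAddress;
import java.util.Set;
import java.util.concurrent.*;

class PrepareForRepairSketch {
    // Endpoints we are still waiting on; acks remove themselves from this set.
    private final Set<InetAddress> pending = ConcurrentHashMap.newKeySet();
    private final CountDownLatch latch;

    PrepareForRepairSketch(Set<InetAddress> endpoints) {
        pending.addAll(endpoints);
        latch = new CountDownLatch(endpoints.size());
    }

    void onPrepareAck(InetAddress from) {
        if (pending.remove(from))
            latch.countDown();
    }

    void await(long timeoutSeconds) throws InterruptedException {
        if (!latch.await(timeoutSeconds, TimeUnit.SECONDS))
            // Instead of a bare "Did not get positive replies from all endpoints",
            // name the hosts that never answered.
            throw new RuntimeException("Did not get positive replies from all endpoints; no reply from: " + pending);
    }
}
{code}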
[jira] [Created] (CASSANDRA-8306) exception in nodetool enablebinary
Rafał Furmański created CASSANDRA-8306: -- Summary: exception in nodetool enablebinary Key: CASSANDRA-8306 URL: https://issues.apache.org/jira/browse/CASSANDRA-8306 Project: Cassandra Issue Type: Bug Reporter: Rafał Furmański I was trying to add new node (db4) to existing cluster - with no luck. I can't see any errors in system.log. nodetool status shows, that node is joining into cluster (for many hours). Attaching error and cluster info: {code} root@db4:~# nodetool enablebinary error: Error starting native transport: null -- StackTrace -- java.lang.RuntimeException: Error starting native transport: null at org.apache.cassandra.service.StorageService.startNativeTransport(StorageService.java:350) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} {code} root@db4:~# nodetool describecluster Cluster Information: Name: Production Cluster Snitch: 
org.apache.cassandra.locator.DynamicEndpointSnitch Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Schema versions: b7e98bb9-717f-3f59-bac4-84bc19544e90: [10.195.15.163, 10.195.15.162, 10.195.15.167, 10.195.15.166] {code} {code} root@db4:~# nodetool status Datacenter: Ashburn === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 10.195.15.163 12.05 GB 256 ? 0a9f478c-80b5-4c15-8b2e-e27df6684c69 RAC1 UN 10.195.15.162 12.8 GB 256 ? c18d2218-ef84-4165-9c3a-05f592f512e9 RAC1 UJ 10.195.15.167 18.61 GB 256 ? 0d3999d9-1e33-4407-bbbd-10cf0a93b3ba RAC1 UN 10.195.15.166 13.67 GB 256 ? df8df3b7-da17-48de-8cf6-1c718dc2fde8 RAC1 {code} I can't even connect to cassandra using cqlsh: {code} root@db4:~# cqlsh 10.195.15.167 Connection error: ('Unable to connect to any servers', {'10.195.15.167': error
[jira] [Updated] (CASSANDRA-8243) DTCS can leave time-overlaps, limiting ability to expire entire SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Björn Hegerfors updated CASSANDRA-8243: --- Attachment: cassandra-trunk-CASSANDRA-8243-aggressiveTTLExpiry.txt DTCS can leave time-overlaps, limiting ability to expire entire SSTables Key: CASSANDRA-8243 URL: https://issues.apache.org/jira/browse/CASSANDRA-8243 Project: Cassandra Issue Type: Bug Reporter: Björn Hegerfors Assignee: Björn Hegerfors Priority: Minor Labels: compaction, performance Fix For: 2.0.12, 2.1.3 Attachments: cassandra-trunk-CASSANDRA-8243-aggressiveTTLExpiry.txt, cassandra-trunk-CASSANDRA-8243-aggressiveTTLExpiry.txt CASSANDRA-6602 (DTCS) and CASSANDRA-5228 are supposed to be a perfect match for tables where every value is written with a TTL. DTCS makes sure to keep old data separate from new data. So shortly after the TTL has passed, Cassandra should be able to throw away the whole SSTable containing a given data point. CASSANDRA-5228 deletes the very oldest SSTables, and only if they don't overlap (in terms of timestamps) with another SSTable which cannot be deleted. DTCS however, can't guarantee that SSTables won't overlap (again, in terms of timestamps). In a test that I ran, every single SSTable overlapped with its nearest neighbors by a very tiny amount. My reasoning for why this could happen is that the dumped memtables were already overlapping from the start. DTCS will never create an overlap where there is none. I surmised that this happened in my case because I sent parallel writes which must have come out of order. This was just locally, and out of order writes should be much more common non-locally. That means that the SSTable removal optimization may never get a chance to kick in! I can see two solutions: 1. Make DTCS split SSTables on time window borders. This will essentially only be done on a newly dumped memtable once every base_time_seconds. 2. Make TTL SSTable expiry more aggressive. Relax the conditions on which an SSTable can be dropped completely, of course without affecting any semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8243) DTCS can leave time-overlaps, limiting ability to expire entire SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14210057#comment-14210057 ] Björn Hegerfors commented on CASSANDRA-8243: An expired column is equivalent to a tombstone with the same timestamp in Cassandra's eyes, right? Compactions even turn them into tombstones, if they can't be immediately purged. So to simplify, we're dealing with all-tombstone SSTables. Both the old and new implementations agree that removing an SSTable can only happen if the oldest SSTable (the one with lowest minTimestamp) is all-tombstones (= has fully expired). Both implementations also agree that this oldest SSTable may not overlap (in time span) with an SSTable containing any non-tombstone data. If there is no such overlap, everything in any SSTable (with an overlapping row range, anyway) written with a timestamp less than or equal to this oldest table's maxTimestamp is guaranteed to be a tombstone. Also, since any SSTable that either of the implementations removes is an all-tombstone SSTable, the only thing that can happen is that something is resurrected. Combined with the reasoning in my previous paragraph, the only thing that could be resurrected when a tombstone for column x with timestamp t is removed is another tombstone for column x, with a lower timestamp t'! When could that matter? Only if some other SSTable makes a constructive write to column x in the interval (t', t]. But that's impossible, because that would then be an SSTable containing some non-tombstone data with a minTimestamp less than or equal to the oldest SSTable's maxTimestamp, which goes against the assumption that no such SSTable exists! There you have a proof by contradiction that the oldest SSTable can be safely removed if it is all-tombstones and doesn't overlap with any SSTable containing any non-tombstone data. If we then consider the oldest SSTable free to remove, the same rules apply to the oldest remaining SSTable and so on. This is the rule that my implementation uses. From the comments it looks like we already agree intuitively on this, but I thought a more formal proof like this might help this get committed. [~slebresne] any reason to still not submit this patch to 2.0? Oh, and I noticed that I didn't update the Javadoc, so here comes a new patch. DTCS can leave time-overlaps, limiting ability to expire entire SSTables Key: CASSANDRA-8243 URL: https://issues.apache.org/jira/browse/CASSANDRA-8243 Project: Cassandra Issue Type: Bug Reporter: Björn Hegerfors Assignee: Björn Hegerfors Priority: Minor Labels: compaction, performance Fix For: 2.0.12, 2.1.3 Attachments: cassandra-trunk-CASSANDRA-8243-aggressiveTTLExpiry.txt, cassandra-trunk-CASSANDRA-8243-aggressiveTTLExpiry.txt CASSANDRA-6602 (DTCS) and CASSANDRA-5228 are supposed to be a perfect match for tables where every value is written with a TTL. DTCS makes sure to keep old data separate from new data. So shortly after the TTL has passed, Cassandra should be able to throw away the whole SSTable containing a given data point. CASSANDRA-5228 deletes the very oldest SSTables, and only if they don't overlap (in terms of timestamps) with another SSTable which cannot be deleted. DTCS however, can't guarantee that SSTables won't overlap (again, in terms of timestamps). In a test that I ran, every single SSTable overlapped with its nearest neighbors by a very tiny amount. My reasoning for why this could happen is that the dumped memtables were already overlapping from the start. 
DTCS will never create an overlap where there is none. I surmised that this happened in my case because I sent parallel writes which must have come out of order. This was just locally, and out of order writes should be much more common non-locally. That means that the SSTable removal optimization may never get a chance to kick in! I can see two solutions: 1. Make DTCS split SSTables on time window borders. This will essentially only be done on a newly dumped memtable once every base_time_seconds. 2. Make TTL SSTable expiry more aggressive. Relax the conditions on which an SSTable can be dropped completely, of course without affecting any semantics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
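The rule proven above translates almost directly into code. A minimal sketch (illustrative types, not Cassandra's actual SSTableReader or compaction strategy API): an all-tombstone sstable can be dropped whole as long as its time span ends before the earliest timestamp of any sstable that still contains non-tombstone data.
{code}
import java.util.*;

class AggressiveExpirySketch {
    static class SSTable {
        final long minTimestamp, maxTimestamp;
        final boolean fullyExpired; // every cell is a tombstone or past its TTL
        SSTable(long min, long max, boolean expired) {
            minTimestamp = min; maxTimestamp = max; fullyExpired = expired;
        }
    }

    /** SSTables that can be dropped whole under the relaxed condition. */
    static List<SSTable> droppable(Collection<SSTable> sstables) {
        // Earliest timestamp of any sstable still holding live (non-tombstone) data.
        long minLiveTimestamp = Long.MAX_VALUE;
        for (SSTable t : sstables)
            if (!t.fullyExpired)
                minLiveTimestamp = Math.min(minLiveTimestamp, t.minTimestamp);

        List<SSTable> result = new ArrayList<>();
        for (SSTable t : sstables)
            // Dropping t can at worst resurrect an even older tombstone, never
            // live data, since nothing live was written at or before t.maxTimestamp.
            if (t.fullyExpired && t.maxTimestamp < minLiveTimestamp)
                result.add(t);
        return result;
    }
}
{code}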
[jira] [Created] (CASSANDRA-8312) Use live sstables in snapshot repair if possible
Jimmy Mårdell created CASSANDRA-8312: Summary: Use live sstables in snapshot repair if possible Key: CASSANDRA-8312 URL: https://issues.apache.org/jira/browse/CASSANDRA-8312 Project: Cassandra Issue Type: Improvement Reporter: Jimmy Mårdell Priority: Minor Snapshot repair can be much slower than parallel repairs because of the overhead of opening the SSTables in the snapshot. This is particularly true when using LCS, as you typically have many smaller SSTables then. I compared parallel and sequential repair on a small range on one of our clusters (2*3 replicas). With parallel repair, this took 22 seconds. With sequential repair (the default in 2.0), the same range took 330 seconds! That is an overhead of 330 - 22*6 = 198 seconds spent just opening SSTables (there were 1000+ sstables). Also, opening 1000 sstables for many smaller ranges surely causes lots of memory churn. The idea would be to list the sstables in the snapshot, but use the corresponding sstables in the live set if they are still available. For almost all sstables, the original one should still exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
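A sketch of the proposed lookup (hypothetical names; not the actual ColumnFamilyStore or Directories API): key the live sstables by generation, and fall back to the expensive path of opening the snapshot copy only when the original has been compacted away.
{code}
import java.util.*;

class SnapshotRepairSketch {
    interface SSTableReader { int generation(); }

    static List<SSTableReader> sstablesForSnapshot(Collection<SSTableReader> liveSet,
                                                   Collection<Integer> snapshotGenerations) {
        Map<Integer, SSTableReader> liveByGeneration = new HashMap<>();
        for (SSTableReader live : liveSet)
            liveByGeneration.put(live.generation(), live);

        List<SSTableReader> result = new ArrayList<>();
        for (int generation : snapshotGenerations) {
            SSTableReader live = liveByGeneration.get(generation);
            result.add(live != null
                       ? live                             // reuse the already-open reader
                       : openFromSnapshot(generation));   // only open what no longer exists live
        }
        return result;
    }

    static SSTableReader openFromSnapshot(int generation) {
        // Placeholder for opening the hard-linked snapshot file: the expensive path.
        throw new UnsupportedOperationException("open snapshot sstable " + generation);
    }
}
{code}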
[jira] [Updated] (CASSANDRA-8312) Use live sstables in snapshot repair if possible
[ https://issues.apache.org/jira/browse/CASSANDRA-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Mårdell updated CASSANDRA-8312: - Since Version: 2.0.11 Use live sstables in snapshot repair if possible Key: CASSANDRA-8312 URL: https://issues.apache.org/jira/browse/CASSANDRA-8312 Project: Cassandra Issue Type: Improvement Reporter: Jimmy Mårdell Priority: Minor Snapshot repair can be much slower than parallel repairs because of the overhead of opening the SSTables in the snapshot. This is particularly true when using LCS, as you typically have many smaller SSTables then. I compared parallel and sequential repair on a small range on one of our clusters (2*3 replicas). With parallel repair, this took 22 seconds. With sequential repair (the default in 2.0), the same range took 330 seconds! That is an overhead of 330 - 22*6 = 198 seconds spent just opening SSTables (there were 1000+ sstables). Also, opening 1000 sstables for many smaller ranges surely causes lots of memory churn. The idea would be to list the sstables in the snapshot, but use the corresponding sstables in the live set if they are still available. For almost all sstables, the original one should still exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8315) cassandra-env.sh doesn't handle correctly non numeric JDK versions
Michaël Figuière created CASSANDRA-8315: --- Summary: cassandra-env.sh doesn't handle correctly non numeric JDK versions Key: CASSANDRA-8315 URL: https://issues.apache.org/jira/browse/CASSANDRA-8315 Project: Cassandra Issue Type: Bug Reporter: Michaël Figuière Priority: Trivial Trying to work around some JDK bug, I've installed an Early Access release of the JDK, which led to a small, non-blocking error in {{cassandra-env.sh}}, as it expects the patch part of the JDK version to be a number, but on Oracle EA JDKs the patch number is followed by an {{-ea}} qualifier, as in: {code} $ java -version java version "1.7.0_80-ea" Java(TM) SE Runtime Environment (build 1.7.0_80-ea-b02) Java HotSpot(TM) 64-Bit Server VM (build 24.80-b07, mixed mode) {code} This led to the following error: {code} bin/../conf/cassandra-env.sh: line 102: [: 80-ea: integer expression expected {code} Obviously not a big deal, but we may want to cover this corner case properly by just ignoring the qualifier part of the version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
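The fix itself would live in {{cassandra-env.sh}}, but the intended behaviour, keeping the leading digits of the patch component and ignoring whatever follows, is easy to pin down; here it is sketched in Java to match the other examples in this digest (a hypothetical helper, not part of Cassandra):
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JvmPatchVersionSketch {
    /** Extracts the numeric patch level from "80" or "80-ea"; -1 if no leading digits. */
    static int patchLevel(String patch) {
        Matcher m = Pattern.compile("^(\\d+)").matcher(patch);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(patchLevel("80-ea")); // 80
        System.out.println(patchLevel("65"));    // 65
    }
}
{code}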
[jira] [Updated] (CASSANDRA-8315) cassandra-env.sh doesn't handle correctly non numeric JDK versions
[ https://issues.apache.org/jira/browse/CASSANDRA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michaël Figuière updated CASSANDRA-8315: Description: Trying to work around a JDK bug, I've installed an Early Access release of the JDK, which led to a small, non-blocking error in {{cassandra-env.sh}}, as it expects the patch part of the JDK version to be a number, but on Oracle EA JDKs the patch number is followed by an {{-ea}} qualifier, as in: {code} $ java -version java version "1.7.0_80-ea" Java(TM) SE Runtime Environment (build 1.7.0_80-ea-b02) Java HotSpot(TM) 64-Bit Server VM (build 24.80-b07, mixed mode) {code} This led to the following error: {code} bin/../conf/cassandra-env.sh: line 102: [: 80-ea: integer expression expected {code} Obviously not a big deal, but we may want to cover this corner case properly by just ignoring the qualifier part of the version. was: Trying to work around some JDK bug, I've installed a Early Access release of the JDK, which lead to a small, non-blocking error, in {{cassandra-env.sh}} as it expect the patch part of the JDK version to be a number, but on Oracle EA JDKs, the patch number is followed by an {{-ea}} qualifier as in: {code} $ java -version java version "1.7.0_80-ea" Java(TM) SE Runtime Environment (build 1.7.0_80-ea-b02) Java HotSpot(TM) 64-Bit Server VM (build 24.80-b07, mixed mode) {code} This lead to the following error: {code} bin/../conf/cassandra-env.sh: line 102: [: 80-ea: integer expression expected {code} Obviously not a big deal, but we may want to cover this corner case properly by just ignoring the qualifier part of the version. cassandra-env.sh doesn't handle correctly non numeric JDK versions -- Key: CASSANDRA-8315 URL: https://issues.apache.org/jira/browse/CASSANDRA-8315 Project: Cassandra Issue Type: Bug Reporter: Michaël Figuière Priority: Trivial Trying to work around a JDK bug, I've installed an Early Access release of the JDK, which led to a small, non-blocking error in {{cassandra-env.sh}}, as it expects the patch part of the JDK version to be a number, but on Oracle EA JDKs the patch number is followed by an {{-ea}} qualifier, as in: {code} $ java -version java version "1.7.0_80-ea" Java(TM) SE Runtime Environment (build 1.7.0_80-ea-b02) Java HotSpot(TM) 64-Bit Server VM (build 24.80-b07, mixed mode) {code} This led to the following error: {code} bin/../conf/cassandra-env.sh: line 102: [: 80-ea: integer expression expected {code} Obviously not a big deal, but we may want to cover this corner case properly by just ignoring the qualifier part of the version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8312) Use live sstables in snapshot repair if possible
[ https://issues.apache.org/jira/browse/CASSANDRA-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211743#comment-14211743 ] Jimmy Mårdell commented on CASSANDRA-8312: -- Ohh, that's very nice and even better. It would work very well with small ranges and LCS. But since I don't think that can be backported to 2.0, as it adds a new command, I'll check if I can implement this ticket in some simple way anyway. Use live sstables in snapshot repair if possible Key: CASSANDRA-8312 URL: https://issues.apache.org/jira/browse/CASSANDRA-8312 Project: Cassandra Issue Type: Improvement Reporter: Jimmy Mårdell Priority: Minor Snapshot repair can be much slower than parallel repairs because of the overhead of opening the SSTables in the snapshot. This is particularly true when using LCS, as you typically have many smaller SSTables then. I compared parallel and sequential repair on a small range on one of our clusters (2*3 replicas). With parallel repair, this took 22 seconds. With sequential repair (the default in 2.0), the same range took 330 seconds! That is an overhead of 330 - 22*6 = 198 seconds spent just opening SSTables (there were 1000+ sstables). Also, opening 1000 sstables for many smaller ranges surely causes lots of memory churn. The idea would be to list the sstables in the snapshot, but use the corresponding sstables in the live set if they are still available. For almost all sstables, the original one should still exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8228) Log malfunctioning host on prepareForRepair
[ https://issues.apache.org/jira/browse/CASSANDRA-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211938#comment-14211938 ] Juho Mäkinen commented on CASSANDRA-8228: - At least the attached patch looks good on visual inspection. Log malfunctioning host on prepareForRepair --- Key: CASSANDRA-8228 URL: https://issues.apache.org/jira/browse/CASSANDRA-8228 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Juho Mäkinen Assignee: Rajanarayanan Thottuvaikkatumana Priority: Trivial Labels: lhf Attachments: cassandra-trunk-8228.txt Repair startup goes through ActiveRepairService.prepareForRepair(), which might result in a "Repair failed with error Did not get positive replies from all endpoints." error, but there's no other logging regarding this error. It seems that it would be trivial to modify prepareForRepair() to log the host address which caused the error, thus easing the debugging effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8306) exception in nodetool enablebinary
[ https://issues.apache.org/jira/browse/CASSANDRA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211983#comment-14211983 ] Rafał Furmański commented on CASSANDRA-8306: [~mshuler] No, I don't have any errors related to Gossip in system.log. I successfully added one node to the cluster after resolving CASSANDRA-8292, so I'm pretty sure that this is not related. exception in nodetool enablebinary -- Key: CASSANDRA-8306 URL: https://issues.apache.org/jira/browse/CASSANDRA-8306 Project: Cassandra Issue Type: Bug Reporter: Rafał Furmański I was trying to add new node (db4) to existing cluster - with no luck. I can't see any errors in system.log. nodetool status shows, that node is joining into cluster (for many hours). Attaching error and cluster info: {code} root@db4:~# nodetool enablebinary error: Error starting native transport: null -- StackTrace -- java.lang.RuntimeException: Error starting native transport: null at org.apache.cassandra.service.StorageService.startNativeTransport(StorageService.java:350) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811) at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} {code} root@db4:~# nodetool describecluster Cluster Information: Name: Production Cluster Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Schema versions: b7e98bb9-717f-3f59-bac4-84bc19544e90: [10.195.15.163, 10.195.15.162, 10.195.15.167, 10.195.15.166] {code} {code} root@db4:~# nodetool status Datacenter: Ashburn === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- AddressLoad Tokens OwnsHost ID Rack UN 10.195.15.163 12.05 GB 256 ? 0a9f478c-80b5-4c15-8b2e-e27df6684c69 RAC1 UN 10.195.15.162 12.8 GB256
[jira] [Comment Edited] (CASSANDRA-8306) exception in nodetool enablebinary
[ https://issues.apache.org/jira/browse/CASSANDRA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211983#comment-14211983 ] Rafał Furmański edited comment on CASSANDRA-8306 at 11/14/14 7:59 AM: -- [~mshuler] No, I don't have any errors related to Gossip in system.log. I successfully added one node to the cluster after resolving CASSANDRA-8296, so I'm pretty sure that this is not related. was (Author: rfurmanski): [~mshuler] No, I don't have any errors related to Gossip in system.log. I successfully added one node to the cluster after resolving CASSANDRA-8292, so I'm pretty sure that this is not related. exception in nodetool enablebinary -- Key: CASSANDRA-8306 URL: https://issues.apache.org/jira/browse/CASSANDRA-8306 Project: Cassandra Issue Type: Bug Reporter: Rafał Furmański I was trying to add new node (db4) to existing cluster - with no luck. I can't see any errors in system.log. nodetool status shows, that node is joining into cluster (for many hours). Attaching error and cluster info: {code} root@db4:~# nodetool enablebinary error: Error starting native transport: null -- StackTrace -- java.lang.RuntimeException: Error starting native transport: null at org.apache.cassandra.service.StorageService.startNativeTransport(StorageService.java:350) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at 
sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} {code} root@db4:~# nodetool describecluster Cluster Information: Name: Production Cluster Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Schema versions: b7e98bb9-717f-3f59-bac4-84bc19544e90: [10.195.15.163, 10.195.15.162, 10.195.15.167, 10.195.15.166] {code} {code} root@db4:~# nodetool status Datacenter: Ashburn === Status=Up
[jira] [Updated] (CASSANDRA-8306) exception in nodetool enablebinary
[ https://issues.apache.org/jira/browse/CASSANDRA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafał Furmański updated CASSANDRA-8306: --- Attachment: system.log.zip exception in nodetool enablebinary -- Key: CASSANDRA-8306 URL: https://issues.apache.org/jira/browse/CASSANDRA-8306 Project: Cassandra Issue Type: Bug Reporter: Rafał Furmański Attachments: system.log.zip I was trying to add new node (db4) to existing cluster - with no luck. I can't see any errors in system.log. nodetool status shows, that node is joining into cluster (for many hours). Attaching error and cluster info: {code} root@db4:~# nodetool enablebinary error: Error starting native transport: null -- StackTrace -- java.lang.RuntimeException: Error starting native transport: null at org.apache.cassandra.service.StorageService.startNativeTransport(StorageService.java:350) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at 
java.lang.Thread.run(Thread.java:745) {code} {code} root@db4:~# nodetool describecluster Cluster Information: Name: Production Cluster Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Schema versions: b7e98bb9-717f-3f59-bac4-84bc19544e90: [10.195.15.163, 10.195.15.162, 10.195.15.167, 10.195.15.166] {code} {code} root@db4:~# nodetool status Datacenter: Ashburn === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- AddressLoad Tokens OwnsHost ID Rack UN 10.195.15.163 12.05 GB 256 ? 0a9f478c-80b5-4c15-8b2e-e27df6684c69 RAC1 UN 10.195.15.162 12.8 GB256 ? c18d2218-ef84-4165-9c3a-05f592f512e9 RAC1 UJ 10.195.15.167 18.61 GB 256 ? 0d3999d9-1e33-4407-bbbd-10cf0a93b3ba RAC1 UN 10.195.15.166 13.67 GB 256
[jira] [Commented] (CASSANDRA-8306) exception in nodetool enablebinary
[ https://issues.apache.org/jira/browse/CASSANDRA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212036#comment-14212036 ] Rafał Furmański commented on CASSANDRA-8306: Attached system.log. Node has still status Joining. Any ideas why binary transport protocol has not started and I can't even cqlsh on this node? exception in nodetool enablebinary -- Key: CASSANDRA-8306 URL: https://issues.apache.org/jira/browse/CASSANDRA-8306 Project: Cassandra Issue Type: Bug Reporter: Rafał Furmański Attachments: system.log.zip I was trying to add new node (db4) to existing cluster - with no luck. I can't see any errors in system.log. nodetool status shows, that node is joining into cluster (for many hours). Attaching error and cluster info: {code} root@db4:~# nodetool enablebinary error: Error starting native transport: null -- StackTrace -- java.lang.RuntimeException: Error starting native transport: null at org.apache.cassandra.service.StorageService.startNativeTransport(StorageService.java:350) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} {code} root@db4:~# nodetool describecluster Cluster Information: Name: Production Cluster Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Schema versions: b7e98bb9-717f-3f59-bac4-84bc19544e90: [10.195.15.163, 10.195.15.162, 10.195.15.167, 10.195.15.166] {code} {code} root@db4:~# nodetool status Datacenter: Ashburn === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- AddressLoad Tokens OwnsHost ID Rack UN 10.195.15.163 12.05 GB 256 ? 0a9f478c-80b5-4c15-8b2e-e27df6684c69 RAC1 UN 10.195.15.162 12.8 GB256 ? c18d2218
[jira] [Updated] (CASSANDRA-8193) Multi-DC parallel snapshot repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Mårdell updated CASSANDRA-8193: - Attachment: cassandra-2.0-8193-2.txt Multi-DC parallel snapshot repair - Key: CASSANDRA-8193 URL: https://issues.apache.org/jira/browse/CASSANDRA-8193 Project: Cassandra Issue Type: Improvement Reporter: Jimmy Mårdell Assignee: Jimmy Mårdell Priority: Minor Fix For: 2.0.12 Attachments: cassandra-2.0-8193-1.txt, cassandra-2.0-8193-2.txt The current behaviour of snapshot repair is to let one node at a time calculate a merkle tree. This is to ensure that only one node at a time is doing the expensive calculation. The drawback is that the merkle tree calculation takes even longer overall. In a multi-DC setup, I think it would make more sense to have one node in each DC calculate the merkle tree at the same time. This would yield a significant improvement when you have many data centers. I'm not sure how relevant this is in 2.1, but I don't see us upgrading to 2.1 any time soon. Unless there is an obvious drawback that I'm missing, I'd like to implement this in the 2.0 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8193) Multi-DC parallel snapshot repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212539#comment-14212539 ] Jimmy Mårdell commented on CASSANDRA-8193: -- New patch added. I've added an enum for specifying the degree of parallelism. This cascaded up the code path a bit. Backward compatibility should be maintained, at the expense of adding a few more forceRepair methods in StorageService. As a side note, can't we remove many of the forceRepair methods in StorageServiceMBean in 2.1? It's getting quite ugly. nodetool only uses two of them (one with a range and one without). Multi-DC parallel snapshot repair - Key: CASSANDRA-8193 URL: https://issues.apache.org/jira/browse/CASSANDRA-8193 Project: Cassandra Issue Type: Improvement Reporter: Jimmy Mårdell Assignee: Jimmy Mårdell Priority: Minor Fix For: 2.0.12 Attachments: cassandra-2.0-8193-1.txt, cassandra-2.0-8193-2.txt The current behaviour of snapshot repair is to let one node at a time calculate a merkle tree. This is to ensure that only one node at a time is doing the expensive calculation. The drawback is that the merkle tree calculation takes even longer overall. In a multi-DC setup, I think it would make more sense to have one node in each DC calculate the merkle tree at the same time. This would yield a significant improvement when you have many data centers. I'm not sure how relevant this is in 2.1, but I don't see us upgrading to 2.1 any time soon. Unless there is an obvious drawback that I'm missing, I'd like to implement this in the 2.0 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
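For context, the degree-of-parallelism enum mentioned above could look roughly like this (a sketch consistent with the comment; the exact names in the attached patch may differ):
{code}
/** How many nodes may compute merkle trees at once during a repair. */
public enum RepairParallelism
{
    SEQUENTIAL,       // snapshot repair: one node at a time
    PARALLEL,         // all replicas at once
    DATACENTER_AWARE  // one node per datacenter at a time
}
{code}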
[jira] [Updated] (CASSANDRA-8312) Use live sstables in snapshot repair if possible
[ https://issues.apache.org/jira/browse/CASSANDRA-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Mårdell updated CASSANDRA-8312: - Attachment: cassandra-2.0-8312-1.txt Use live sstables in snapshot repair if possible Key: CASSANDRA-8312 URL: https://issues.apache.org/jira/browse/CASSANDRA-8312 Project: Cassandra Issue Type: Improvement Reporter: Jimmy Mårdell Priority: Minor Attachments: cassandra-2.0-8312-1.txt Snapshot repair can be much slower than parallel repairs because of the overhead of opening the SSTables in the snapshot. This is particularly true when using LCS, as you typically have many smaller SSTables then. I compared parallel and sequential repair on a small range on one of our clusters (2*3 replicas). With parallel repair, this took 22 seconds. With sequential repair (the default in 2.0), the same range took 330 seconds! That is an overhead of 330 - 22*6 = 198 seconds spent just opening SSTables (there were 1000+ sstables). Also, opening 1000 sstables for many smaller ranges surely causes lots of memory churn. The idea would be to list the sstables in the snapshot, but use the corresponding sstables in the live set if they are still available. For almost all sstables, the original one should still exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8306) exception in nodetool enablebinary
[ https://issues.apache.org/jira/browse/CASSANDRA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14214368#comment-14214368 ] Rafał Furmański commented on CASSANDRA-8306: start_native_transport is set to true. The whole joining process looks as follows: 1. Install cassandra 2.1.1 (from datastax debian repo) 2. stop cassandra 3. rm -rf /var/lib/cassandra/* 4. Modify cassandra.yaml and cassandra-rackdc.properties 5. Start cassandra After struggling for a couple of days and several restarts of stubborn node, it finally managed to join the cluster. I don't know why though. I didn't do anything specific. Binary protocol is up and running now. That's just weird! exception in nodetool enablebinary -- Key: CASSANDRA-8306 URL: https://issues.apache.org/jira/browse/CASSANDRA-8306 Project: Cassandra Issue Type: Bug Reporter: Rafał Furmański Attachments: system.log.zip I was trying to add new node (db4) to existing cluster - with no luck. I can't see any errors in system.log. nodetool status shows, that node is joining into cluster (for many hours). Attaching error and cluster info: {code} root@db4:~# nodetool enablebinary error: Error starting native transport: null -- StackTrace -- java.lang.RuntimeException: Error starting native transport: null at org.apache.cassandra.service.StorageService.startNativeTransport(StorageService.java:350) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at 
java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} {code} root@db4:~# nodetool describecluster Cluster Information: Name: Production Cluster Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Schema versions: b7e98bb9-717f-3f59-bac4-84bc19544e90: [10.195.15.163, 10.195.15.162, 10.195.15.167, 10.195.15.166] {code} {code} root@db4
[jira] [Comment Edited] (CASSANDRA-8306) exception in nodetool enablebinary
[ https://issues.apache.org/jira/browse/CASSANDRA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14214368#comment-14214368 ] Rafał Furmański edited comment on CASSANDRA-8306 at 11/17/14 7:28 AM: -- start_native_transport is set to true. The whole process of adding new node looks as follows: 1. Install cassandra 2.1.1 (from datastax debian repo) 2. stop cassandra 3. rm -rf /var/lib/cassandra/* 4. Modify cassandra.yaml and cassandra-rackdc.properties 5. Start cassandra After struggling for a couple of days and several restarts of stubborn node, it finally managed to join the cluster. I don't know why though. I didn't do anything specific. Binary protocol is up and running now. That's just weird! was (Author: rfurmanski): start_native_transport is set to true. The whole joining process looks as follows: 1. Install cassandra 2.1.1 (from datastax debian repo) 2. stop cassandra 3. rm -rf /var/lib/cassandra/* 4. Modify cassandra.yaml and cassandra-rackdc.properties 5. Start cassandra After struggling for a couple of days and several restarts of stubborn node, it finally managed to join the cluster. I don't know why though. I didn't do anything specific. Binary protocol is up and running now. That's just weird! exception in nodetool enablebinary -- Key: CASSANDRA-8306 URL: https://issues.apache.org/jira/browse/CASSANDRA-8306 Project: Cassandra Issue Type: Bug Reporter: Rafał Furmański Attachments: system.log.zip I was trying to add new node (db4) to existing cluster - with no luck. I can't see any errors in system.log. nodetool status shows, that node is joining into cluster (for many hours). Attaching error and cluster info: {code} root@db4:~# nodetool enablebinary error: Error starting native transport: null -- StackTrace -- java.lang.RuntimeException: Error starting native transport: null at org.apache.cassandra.service.StorageService.startNativeTransport(StorageService.java:350) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670
[jira] [Commented] (CASSANDRA-8306) exception in nodetool enablebinary
[ https://issues.apache.org/jira/browse/CASSANDRA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14214436#comment-14214436 ] Rafał Furmański commented on CASSANDRA-8306: Because step 1 is creating all necessary folders like /etc/cassandra? exception in nodetool enablebinary -- Key: CASSANDRA-8306 URL: https://issues.apache.org/jira/browse/CASSANDRA-8306 Project: Cassandra Issue Type: Bug Reporter: Rafał Furmański Attachments: system.log.zip I was trying to add new node (db4) to existing cluster - with no luck. I can't see any errors in system.log. nodetool status shows, that node is joining into cluster (for many hours). Attaching error and cluster info: {code} root@db4:~# nodetool enablebinary error: Error starting native transport: null -- StackTrace -- java.lang.RuntimeException: Error starting native transport: null at org.apache.cassandra.service.StorageService.startNativeTransport(StorageService.java:350) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} {code}
root@db4:~# nodetool describecluster
Cluster Information:
	Name: Production Cluster
	Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
	Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
	Schema versions:
		b7e98bb9-717f-3f59-bac4-84bc19544e90: [10.195.15.163, 10.195.15.162, 10.195.15.167, 10.195.15.166]
{code} {code}
root@db4:~# nodetool status
Datacenter: Ashburn
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns  Host ID                               Rack
UN  10.195.15.163  12.05 GB  256     ?     0a9f478c-80b5-4c15-8b2e-e27df6684c69  RAC1
UN  10.195.15.162  12.8 GB   256     ?     c18d2218-ef84-4165-9c3a-05f592f512e9  RAC1
UJ  10.195.15.167  18.61 GB  256
[jira] [Commented] (CASSANDRA-8306) exception in nodetool enablebinary
[ https://issues.apache.org/jira/browse/CASSANDRA-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14214451#comment-14214451 ] Rafał Furmański commented on CASSANDRA-8306: Of course not. I was just following documentation. exception in nodetool enablebinary -- Key: CASSANDRA-8306 URL: https://issues.apache.org/jira/browse/CASSANDRA-8306 Project: Cassandra Issue Type: Bug Reporter: Rafał Furmański Attachments: system.log.zip I was trying to add new node (db4) to existing cluster - with no luck. I can't see any errors in system.log. nodetool status shows, that node is joining into cluster (for many hours). Attaching error and cluster info: {code} root@db4:~# nodetool enablebinary error: Error starting native transport: null -- StackTrace -- java.lang.RuntimeException: Error starting native transport: null at org.apache.cassandra.service.StorageService.startNativeTransport(StorageService.java:350) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} {code}
root@db4:~# nodetool describecluster
Cluster Information:
	Name: Production Cluster
	Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
	Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
	Schema versions:
		b7e98bb9-717f-3f59-bac4-84bc19544e90: [10.195.15.163, 10.195.15.162, 10.195.15.167, 10.195.15.166]
{code} {code}
root@db4:~# nodetool status
Datacenter: Ashburn
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns  Host ID                               Rack
UN  10.195.15.163  12.05 GB  256     ?     0a9f478c-80b5-4c15-8b2e-e27df6684c69  RAC1
UN  10.195.15.162  12.8 GB   256     ?     c18d2218-ef84-4165-9c3a-05f592f512e9  RAC1
UJ  10.195.15.167  18.61 GB  256     ?     0d3999d9
[jira] [Commented] (CASSANDRA-8193) Multi-DC parallel snapshot repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217678#comment-14217678 ] Jimmy Mårdell commented on CASSANDRA-8193: -- The only change I made to StorageServiceMBean is the addition of two new methods, so I think it should be fine? Multi-DC parallel snapshot repair - Key: CASSANDRA-8193 URL: https://issues.apache.org/jira/browse/CASSANDRA-8193 Project: Cassandra Issue Type: Improvement Reporter: Jimmy Mårdell Assignee: Jimmy Mårdell Priority: Minor Fix For: 2.0.12 Attachments: cassandra-2.0-8193-1.txt, cassandra-2.0-8193-2.txt The current behaviour of snapshot repair is to let one node at a time calculate a merkle tree. This is to ensure only one node at a time is doing the expensive calculation. The drawback is that the overall merkle tree calculation takes even longer. In a multi-DC setup, I think it would make more sense to have one node in each DC calculate the merkle tree at the same time. This would yield a significant improvement when you have many data centers. I'm not sure how relevant this is in 2.1, but I don't see us upgrading to 2.1 any time soon. Unless there is an obvious drawback that I'm missing, I'd like to implement this in the 2.0 branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
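For readers following along without the attached patch: the change discussed above amounts to two extra JMX operations on the MBean, leaving existing signatures untouched. A minimal sketch of what such additions could look like (method names and parameters here are illustrative guesses, not the actual contents of cassandra-2.0-8193-2.txt):
{code}
// Illustrative sketch only -- the real signatures are in the attached patch.
public interface StorageServiceMBean
{
    // ... existing repair operations stay unchanged ...

    // Trigger a repair where one node per data center computes its merkle
    // tree at a time, instead of one node in the whole cluster.
    int forceDcParallelRepairAsync(String keyspace, boolean isSequential,
                                   boolean primaryRange, String... columnFamilies);

    // Same, restricted to a token range.
    int forceDcParallelRepairRangeAsync(String beginToken, String endToken,
                                        String keyspace, String... columnFamilies);
}
{code}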
[jira] [Commented] (CASSANDRA-7188) Wrong class type: class org.apache.cassandra.db.Column in CounterColumn.reconcile
[ https://issues.apache.org/jira/browse/CASSANDRA-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217738#comment-14217738 ] Nicolas Lalevée commented on CASSANDRA-7188: Our prod cluster upgraded to 2.0.11. Without incident ! Thank you for the bug fix. Wrong class type: class org.apache.cassandra.db.Column in CounterColumn.reconcile - Key: CASSANDRA-7188 URL: https://issues.apache.org/jira/browse/CASSANDRA-7188 Project: Cassandra Issue Type: Bug Reporter: Nicolas Lalevée Assignee: Aleksey Yeschenko Labels: qa-resolved Fix For: 2.0.11 Attachments: 7188.txt When migrating a cluster of 6 nodes from 1.2.11 to 2.0.7, we started to see on the first migrated node this error: {noformat} ERROR [ReplicateOnWriteStage:1] 2014-05-07 11:26:59,779 CassandraDaemon.java (line 198) Exception in thread Thread[ReplicateOnWriteStage:1,5,main] java.lang.AssertionError: Wrong class type: class org.apache.cassandra.db.Column at org.apache.cassandra.db.CounterColumn.reconcile(CounterColumn.java:159) at org.apache.cassandra.db.filter.QueryFilter$1.reduce(QueryFilter.java:109) at org.apache.cassandra.db.filter.QueryFilter$1.reduce(QueryFilter.java:103) at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:112) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.filter.NamesQueryFilter.collectReducedColumns(NamesQueryFilter.java:98) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72) at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297) at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1540) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1369) at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:327) at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:55) at org.apache.cassandra.db.CounterMutation.makeReplicationMutation(CounterMutation.java:100) at org.apache.cassandra.service.StorageProxy$8$1.runMayThrow(StorageProxy.java:1085) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1916) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {noformat} We then saw on the other 5 nodes, still on 1.2.x, this error: {noformat} ERROR [MutationStage:2793] 2014-05-07 11:46:12,301 CassandraDaemon.java (line 191) Exception in thread Thread[MutationStage:2793,5,main] java.lang.AssertionError: Wrong class type: class org.apache.cassandra.db.Column at org.apache.cassandra.db.CounterColumn.reconcile(CounterColumn.java:165) at org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:378) at org.apache.cassandra.db.AtomicSortedColumns.addColumn(AtomicSortedColumns.java:166) at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:119) at 
org.apache.cassandra.db.SuperColumn.addColumn(SuperColumn.java:218) at org.apache.cassandra.db.SuperColumn.putColumn(SuperColumn.java:229) at org.apache.cassandra.db.ThreadSafeSortedColumns.addColumnInternal(ThreadSafeSortedColumns.java:108) at org.apache.cassandra.db.ThreadSafeSortedColumns.addAllWithSizeDelta(ThreadSafeSortedColumns.java:138) at org.apache.cassandra.db.AbstractColumnContainer.addAllWithSizeDelta(AbstractColumnContainer.java:99) at org.apache.cassandra.db.Memtable.resolve(Memtable.java:205) at org.apache.cassandra.db.Memtable.put(Memtable.java:168) at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:742) at org.apache.cassandra.db.Table.apply(Table.java:388) at org.apache.cassandra.db.Table.apply(Table.java:353
[jira] [Created] (CASSANDRA-8345) Client notifications should carry the entire delta of the information that changed
Michaël Figuière created CASSANDRA-8345: --- Summary: Client notifications should carry the entire delta of the information that changed Key: CASSANDRA-8345 URL: https://issues.apache.org/jira/browse/CASSANDRA-8345 Project: Cassandra Issue Type: Improvement Reporter: Michaël Figuière Currently when the schema changes, a {{SCHEMA_CHANGE}} notification is sent to the client to let it know that a modification happened in a specific table or keyspace. If the client registers for these notifications, it is likely that it actually cares about having an up-to-date version of this information, so the next step is logically for the client to query the {{system}} keyspace to retrieve the latest version of the schema for the particular element that was mentioned in the notification. The same thing happens with the {{TOPOLOGY_CHANGE}} notification, as the client will follow up with a query to retrieve the details that changed in the {{system.peers}} table. It would be interesting to send the entire delta of the information that changed within the notification. I see several advantages with this:
* This would ensure that the data sent to the client is as small as possible, as such a delta will always be smaller than the resultset that would eventually be received for a formal query on the {{system}} keyspace.
* This avoids the Cassandra node receiving plenty of queries right after it issues a notification; instead it prepares the delta once and sends it to everybody.
* This should improve the overall behaviour when dealing with very large schemas with frequent changes (typically due to an attempt at implementing multitenancy through separate keyspaces), as it has been observed that the notification and subsequent query traffic can become non-negligible in this case.
* This would eventually simplify the driver design by removing the need for an extra asynchronous operation to follow up with, although the benefit of this point will be real only once the previous versions of the protocol are far behind.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
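To make the round trip being eliminated concrete, here is a rough sketch of what a driver does today on a schema event (the surrounding types and the {{refreshMetadata}} helper are illustrative, not any particular driver's API; {{system.schema_columns}} is the pre-3.0 schema table):
{code}
// Today: a SCHEMA_CHANGE event only names what changed, so the driver must
// immediately query the system keyspace to learn *how* it changed.
void onSchemaChange(String changedKeyspace, String changedTable)
{
    ResultSet columns = session.execute(
        "SELECT * FROM system.schema_columns " +
        "WHERE keyspace_name = ? AND columnfamily_name = ?",
        changedKeyspace, changedTable);
    refreshMetadata(changedKeyspace, changedTable, columns); // assumed helper
}
// Proposed: the event payload itself carries the delta, so this follow-up
// SELECT (and the burst of such SELECTs from every connected client) goes away.
{code}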
[jira] [Commented] (CASSANDRA-8340) Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218515#comment-14218515 ] Björn Hegerfors commented on CASSANDRA-8340: Actually, it's already implemented like that. The SSTables are paired with their age in the createSSTableAndMinTimestampPairs method. Max timestamps are only used in getNow and filterOldSSTables. I used min timestamps based on the same reasoning as yours. I agree that major compaction is the way to go when switching. Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions Key: CASSANDRA-8340 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson Priority: Minor Currently we check how old the newest data (max timestamp) in an sstable is when we check if it should be compacted. If we instead switch to using min timestamp for this we have a pretty clean migration path from STCS/LCS to DTCS. My thinking is that before migrating, the user does a major compaction, which creates a huge sstable containing all data, with min timestamp very far back in time, then switching to DTCS, we will have a big sstable that we never compact (ie, min timestamp of this big sstable is before max_sstable_age_days), and all newer data will be after that, and that new data will be properly compacted WDYT [~Bj0rn] ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
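For readers without the source handy, the pairing referred to above is essentially this (condensed from the 2.1-era DateTieredCompactionStrategy; simplified rather than verbatim):
{code}
public static List<Pair<SSTableReader, Long>> createSSTableAndMinTimestampPairs(Iterable<SSTableReader> sstables)
{
    List<Pair<SSTableReader, Long>> pairs = new ArrayList<>();
    for (SSTableReader sstable : sstables)
        // the *min* timestamp, not the max, decides which time window an sstable falls into
        pairs.add(Pair.create(sstable, sstable.getMinTimestamp()));
    return pairs;
}
{code}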
[jira] [Commented] (CASSANDRA-8340) Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218593#comment-14218593 ] Björn Hegerfors commented on CASSANDRA-8340: OK, let's see. This is a big SSTable with a timestamp span of [t0, t1]. Since it came out of a major compaction, t1 is close to the current time. DTCS would never generate an SSTable that large with t1 that close to current time. But as time passes, [t0, t1] eventually becomes a timestamp span that even DTCS could have generated. Only beyond that point in time would DTCS actually consider compacting it, because it's t0 that governs when it compacts next, not t1. This is because t0 is so old and so far away from the min timestamp of any other SSTable. I'm certain of this. I haven't got a formula for this (I wish to make one), but I think that the major compacted SSTable may even have to double its age before next compaction will happen, so if the min timestamp was older than max_sstable_age_days when switching strategies, the max timestamp will be older than that before any compaction was ever considered. In other words, your scenario is not in any way a particular reason to change the max_sstable_age_days behavior. There may still be other reasons. Did you get that? I had a hard time figuring out a sensible way to formulate my reasoning here. Rewrote this 3 times :P Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions Key: CASSANDRA-8340 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson Priority: Minor Currently we check how old the newest data (max timestamp) in an sstable is when we check if it should be compacted. If we instead switch to using min timestamp for this we have a pretty clean migration path from STCS/LCS to DTCS. My thinking is that before migrating, the user does a major compaction, which creates a huge sstable containing all data, with min timestamp very far back in time, then switching to DTCS, we will have a big sstable that we never compact (ie, min timestamp of this big sstable is before max_sstable_age_days), and all newer data will be after that, and that new data will be properly compacted WDYT [~Bj0rn] ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
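To put rough, assumed numbers on the reasoning above: with a base_time_seconds of one hour and min_threshold of 4, window sizes grow as 1h, 4h, 16h, 64h, and so on. If the major-compacted SSTable's t0 is, say, 40 days old at the time of the switch, DTCS buckets the table by t0, and it can only be compacted together with post-switch SSTables once a single window spans from t0 up to near the present. With windows growing by a factor of 4, that only happens once window sizes reach the order of the SSTable's entire age, i.e. roughly once the SSTable has doubled in age.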
[jira] [Commented] (CASSANDRA-8340) Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219354#comment-14219354 ] Björn Hegerfors commented on CASSANDRA-8340: No drawback, really. It doesn't make a big difference. Whatever is easiest to reason about would be best. It's true that in your repair example, it would have some effect, but only when the repair SSTables are not older than max_sstable_age_days while the big one is. I would imagine that repair would be likely to bring in a bunch of files that are older than max_sstable_age_days, which will stay scattered anyway. I suppose using min timestamp would align more with what the rest of the strategy uses to determine age. In fact, something that would work even more consistently with the strategy would be to specify a maximum window size, perhaps in terms of the initial window size. We have:
* up to min_threshold windows of size 1, followed by
* up to min_threshold windows of size min_threshold, followed by
* up to min_threshold windows of size min_threshold^2, followed by
* up to min_threshold windows of size min_threshold^3, followed by
* etc.
And then we can simply stop generating more windows after some point. The simplest, yet perhaps least intuitive, option would be max_window_exponent. If we set max_window_exponent=n, then we would stop after windows of size min_threshold^n. Example: max_window_exponent=3, min_threshold=4. The last few windows would be 64*base_time_seconds in size; no 256 window is ever created. Other option alternatives are max_window or max_window_seconds. WDYT [~krummas]? Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions Key: CASSANDRA-8340 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson Priority: Minor Currently we check how old the newest data (max timestamp) in an sstable is when we check if it should be compacted. If we instead switch to using min timestamp for this we have a pretty clean migration path from STCS/LCS to DTCS. My thinking is that before migrating, the user does a major compaction, which creates a huge sstable containing all data, with min timestamp very far back in time, then switching to DTCS, we will have a big sstable that we never compact (ie, min timestamp of this big sstable is before max_sstable_age_days), and all newer data will be after that, and that new data will be properly compacted WDYT [~Bj0rn] ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8340) Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219354#comment-14219354 ] Björn Hegerfors edited comment on CASSANDRA-8340 at 11/20/14 1:16 PM: -- No drawback, really. It doesn't make a big difference. Whatever is easiest to reason about would be best. It's true that in your repair example, it would have some effect, but only when the repair SSTables are not older than max_sstable_age_days while the big one is. I would imagine that repair would be equally likely to bring in a bunch of files that are older than max_sstable_age_days, which will stay scattered (uncompacted) anyway. I suppose using min timestamp would align more with what the rest of the strategy uses to determine age. In fact, something that would work even more consistently with the strategy would be to specify a maximum window size, perhaps in terms of the initial window size. We have:
* up to min_threshold windows of size 1, followed by
* up to min_threshold windows of size min_threshold, followed by
* up to min_threshold windows of size min_threshold^2, followed by
* up to min_threshold windows of size min_threshold^3, followed by
* etc.
And then we can simply stop generating more windows after some point. The simplest, yet perhaps least intuitive, option would be max_window_exponent. If we set max_window_exponent=n, then we would stop after windows of size min_threshold^n. Example: max_window_exponent=3, min_threshold=4. The last few windows would be 64*base_time_seconds in size; no 256 window is ever created. Other option alternatives are max_window or max_window_seconds. WDYT [~krummas]? was (Author: bj0rn): No drawback, really. It doesn't make a big difference. Whatever is easiest to reason about would be best. It's true that in your repair example, it would have some effect, but only when the repair SSTables are not older than max_sstable_age_days while the big one is. I would imagine that repair would be likely to bring in a bunch of files that are older than max_sstable_age_days, which will stay scattered anyway. I suppose using min timestamp would align more with what the rest of the strategy uses to determine age. In fact, something that would work even more consistently with the strategy would be to specify a maximum window size, perhaps in terms of the initial window size. We have:
* up to min_threshold windows of size 1, followed by
* up to min_threshold windows of size min_threshold, followed by
* up to min_threshold windows of size min_threshold^2, followed by
* up to min_threshold windows of size min_threshold^3, followed by
* etc.
And then we can simply stop generating more windows after some point. The simplest, yet perhaps least intuitive, option would be max_window_exponent. If we set max_window_exponent=n, then we would stop after windows of size min_threshold^n. Example: max_window_exponent=3, min_threshold=4. The last few windows would be 64*base_time_seconds in size; no 256 window is ever created. Other option alternatives are max_window or max_window_seconds. WDYT [~krummas]? Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions Key: CASSANDRA-8340 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson Priority: Minor Currently we check how old the newest data (max timestamp) in an sstable is when we check if it should be compacted.
If we instead switch to using min timestamp for this we have a pretty clean migration path from STCS/LCS to DTCS. My thinking is that before migrating, the user does a major compaction, which creates a huge sstable containing all data, with min timestamp very far back in time, then switching to DTCS, we will have a big sstable that we never compact (ie, min timestamp of this big sstable is before max_sstable_age_days), and all newer data will be after that, and that new data will be properly compacted WDYT [~Bj0rn] ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
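A toy illustration of the capped window progression proposed in the comment above (the {{max_window_exponent}} option does not exist; the name is taken from the suggestion itself):
{code}
// Window i (0-based, newest first) normally has size min_threshold^(i / min_threshold)
// in units of base_time_seconds; the proposal simply clamps the exponent.
static long windowSize(int windowIndex, int minThreshold, int maxWindowExponent)
{
    int exponent = Math.min(windowIndex / minThreshold, maxWindowExponent);
    return (long) Math.pow(minThreshold, exponent);
}
// With minThreshold = 4 and maxWindowExponent = 3, sizes run
// 1,1,1,1, 4,4,4,4, 16,16,16,16, 64,64,64,64, 64, 64, ... -- no 256 window ever appears.
{code}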
[jira] [Commented] (CASSANDRA-8192) AssertionError in Memory.java
[ https://issues.apache.org/jira/browse/CASSANDRA-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220674#comment-14220674 ] Andreas Ländle commented on CASSANDRA-8192: --- Maybe this helps - I can reproduce this call stack on a 64-Bit machine (8 GB of Ram) running JetBrains upsource. Also the heap of RAM available to Cassandra should be big enough. C:\Tools\Upsource\internal\java\windows-amd64\jre\bin\java.exe, -ea, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=..\..\logs\cassandra, -Dfile.encoding=UTF-8, -Xbootclasspath/a:lib/jamm/jamm-0.2.6.jar, -javaagent:lib/jamm/jamm-0.2.6.jar, -d64, -Xmx3000m, -XX:MaxPermSize=128m, -jar, launcher\lib\app-wrapper\app-wrapper.jar, Apache Cassandra, AppStarter, com.jetbrains.cassandra.service.CassandraServiceMain] (at path: C:\Tools\Upsource\apps\cassandra, system properties: {launcher.app.home=C:\Tools\Upsource\apps\cassandra, launcher.app.logs.dir=C:\Tools\Upsource\logs\cassandra}) 18:05:01.043 [SSTableBatchOpen:4] ERROR o.a.c.service.CassandraDaemon - Exception in thread Thread[SSTableBatchOpen:4,5,main] java.lang.AssertionError: null at org.apache.cassandra.io.util.Memory.size(Memory.java:307) ~[cassandra-all-2.1.1.jar:2.1.1] at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:135) ~[cassandra-all-2.1.1.jar:2.1.1] at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:83) ~[cassandra-all-2.1.1.jar:2.1.1] at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:50) ~[cassandra-all-2.1.1.jar:2.1.1] at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:48) ~[cassandra-all-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:766) ~[cassandra-all-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:725) ~[cassandra-all-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:402) ~[cassandra-all-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:302) ~[cassandra-all-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:438) ~[cassandra-all-2.1.1.jar:2.1.1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_60] at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_60] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_60] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_60] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60] Please let me know if I can provide additional information that may help you. AssertionError in Memory.java - Key: CASSANDRA-8192 URL: https://issues.apache.org/jira/browse/CASSANDRA-8192 Project: Cassandra Issue Type: Bug Components: Core Environment: Windows-7-32 bit, 3GB RAM, Java 1.7.0_67 Reporter: Andreas Schnitzerling Assignee: Joshua McKenzie Attachments: cassandra.bat, cassandra.yaml, system.log Since update of 1 of 12 nodes from 2.1.0-rel to 2.1.1-rel Exception during start up. 
{panel:title=system.log} ERROR [SSTableBatchOpen:1] 2014-10-27 09:44:00,079 CassandraDaemon.java:153 - Exception in thread Thread[SSTableBatchOpen:1,5,main] java.lang.AssertionError: null at org.apache.cassandra.io.util.Memory.size(Memory.java:307) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:135) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:83) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:50) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:48) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:766) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:725) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:402) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:302) ~[apache-cassandra-2.1.1.jar:2.1.1
[jira] [Created] (CASSANDRA-8353) Prepared statement doesn't revalidate after table schema changes
Michał Jaszczyk created CASSANDRA-8353: -- Summary: Prepared statement doesn't revalidate after table schema changes Key: CASSANDRA-8353 URL: https://issues.apache.org/jira/browse/CASSANDRA-8353 Project: Cassandra Issue Type: Bug Environment: Cassandra 2.1.2 Reporter: Michał Jaszczyk Having a simple table:
{code}
CREATE TABLE test1 (
  key TEXT,
  value TEXT,
  PRIMARY KEY (key)
);
{code}
I prepare the following statement:
{code}
SELECT * FROM test1;
{code}
I run queries based on the statement, which return the expected results. Then I update the schema definition like this:
{code}
ALTER TABLE test1 ADD value2 TEXT;
{code}
I populate the value2 values and use the same statement again. The results returned by the same query don't include value2. I'm sure it is not cached in the driver/application, because I was starting a new process after changing the schema. It looks to me like a bug. Please correct me if it works like this on purpose. I'm using the Ruby CQL driver but I believe it is not related. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
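A minimal repro sketch of the reported behaviour, assuming the DataStax Java driver 2.x and a keyspace named {{test_ks}} (the reporter used the Ruby driver, but the relevant prepared-statement caching is server-side, so the driver choice shouldn't matter):
{code}
import com.datastax.driver.core.*;

public class StalePreparedStatement
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test_ks"); // assumed keyspace
        PreparedStatement select = session.prepare("SELECT * FROM test1");

        session.execute("ALTER TABLE test1 ADD value2 TEXT");
        session.execute("UPDATE test1 SET value2 = 'x' WHERE key = 'k'");

        // Reported bug: the row comes back without the value2 column, and the
        // same happens even when prepare() is done again from a fresh process.
        Row row = session.execute(select.bind()).one();
        System.out.println(row.getColumnDefinitions());

        cluster.close();
    }
}
{code}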
[jira] [Created] (CASSANDRA-8356) Slice query on super column camily with counters don't get all the data
Nicolas Lalevée created CASSANDRA-8356: -- Summary: Slice query on super column camily with counters don't get all the data Key: CASSANDRA-8356 URL: https://issues.apache.org/jira/browse/CASSANDRA-8356 Project: Cassandra Issue Type: Bug Reporter: Nicolas Lalevée We've finally been able to upgrade our cluster to 2.0.11, after CASSANDRA-7188 was fixed. But now slice queries on a super column family with counters don't return all the expected data. We first thought, because of all the trouble we had, that we had lost data, but there is a way to actually get the data, so nothing is lost; it is just that Cassandra seems to incorrectly skip it. See the following CQL log:
{noformat}
cqlsh:Theme> desc table theme_view;

CREATE TABLE theme_view (
  key bigint,
  column1 varint,
  column2 text,
  value counter,
  PRIMARY KEY ((key), column1, column2)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=1.00 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

cqlsh:Theme> select * from theme_view where key = 99421 limit 10;

 key   | column1 | column2    | value
-------+---------+------------+-------
 99421 |     -12 | 2011-03-25 |    59
 99421 |     -12 | 2011-03-26 |     5
 99421 |     -12 | 2011-03-27 |     2
 99421 |     -12 | 2011-03-28 |    40
 99421 |     -12 | 2011-03-29 |    14
 99421 |     -12 | 2011-03-30 |    17
 99421 |     -12 | 2011-03-31 |     5
 99421 |     -12 | 2011-04-01 |    37
 99421 |     -12 | 2011-04-02 |     7
 99421 |     -12 | 2011-04-03 |     4

(10 rows)

cqlsh:Theme> select * from theme_view where key = 99421 and column1 = -12 limit 10;

 key   | column1 | column2    | value
-------+---------+------------+-------
 99421 |     -12 | 2011-03-25 |    59
 99421 |     -12 | 2014-05-06 |    15
 99421 |     -12 | 2014-06-06 |     7
 99421 |     -12 | 2014-06-10 |    22
 99421 |     -12 | 2014-06-11 |    34
 99421 |     -12 | 2014-06-12 |    35
 99421 |     -12 | 2014-06-13 |    26
 99421 |     -12 | 2014-06-14 |    16
 99421 |     -12 | 2014-06-15 |    24
 99421 |     -12 | 2014-06-16 |    25

(10 rows)
{noformat}
As you can see, the second query should return data from 2012, but it does not. Via Thrift, we have the exact same bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8356) Slice query on a super column family with counters don't get all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Lalevée updated CASSANDRA-8356: --- Summary: Slice query on a super column family with counters don't get all the data (was: Slice query on super column camily with counters don't get all the data) Slice query on a super column family with counters don't get all the data - Key: CASSANDRA-8356 URL: https://issues.apache.org/jira/browse/CASSANDRA-8356 Project: Cassandra Issue Type: Bug Reporter: Nicolas Lalevée We've finally been able to upgrade our cluster to 2.0.11, after CASSANDRA-7188 was fixed. But now slice queries on a super column family with counters don't return all the expected data. We first thought, because of all the trouble we had, that we had lost data, but there is a way to actually get the data, so nothing is lost; it is just that Cassandra seems to incorrectly skip it. See the following CQL log:
{noformat}
cqlsh:Theme> desc table theme_view;

CREATE TABLE theme_view (
  key bigint,
  column1 varint,
  column2 text,
  value counter,
  PRIMARY KEY ((key), column1, column2)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=1.00 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

cqlsh:Theme> select * from theme_view where key = 99421 limit 10;

 key   | column1 | column2    | value
-------+---------+------------+-------
 99421 |     -12 | 2011-03-25 |    59
 99421 |     -12 | 2011-03-26 |     5
 99421 |     -12 | 2011-03-27 |     2
 99421 |     -12 | 2011-03-28 |    40
 99421 |     -12 | 2011-03-29 |    14
 99421 |     -12 | 2011-03-30 |    17
 99421 |     -12 | 2011-03-31 |     5
 99421 |     -12 | 2011-04-01 |    37
 99421 |     -12 | 2011-04-02 |     7
 99421 |     -12 | 2011-04-03 |     4

(10 rows)

cqlsh:Theme> select * from theme_view where key = 99421 and column1 = -12 limit 10;

 key   | column1 | column2    | value
-------+---------+------------+-------
 99421 |     -12 | 2011-03-25 |    59
 99421 |     -12 | 2014-05-06 |    15
 99421 |     -12 | 2014-06-06 |     7
 99421 |     -12 | 2014-06-10 |    22
 99421 |     -12 | 2014-06-11 |    34
 99421 |     -12 | 2014-06-12 |    35
 99421 |     -12 | 2014-06-13 |    26
 99421 |     -12 | 2014-06-14 |    16
 99421 |     -12 | 2014-06-15 |    24
 99421 |     -12 | 2014-06-16 |    25

(10 rows)
{noformat}
As you can see, the second query should return data from 2012, but it does not. Via Thrift, we have the exact same bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8356) Slice query on a super column family with counters doesn't get all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Lalevée updated CASSANDRA-8356: --- Summary: Slice query on a super column family with counters doesn't get all the data (was: Slice query on a super column family with counters don't get all the data) Slice query on a super column family with counters doesn't get all the data --- Key: CASSANDRA-8356 URL: https://issues.apache.org/jira/browse/CASSANDRA-8356 Project: Cassandra Issue Type: Bug Reporter: Nicolas Lalevée We've finally been able to upgrade our cluster to 2.0.11, after CASSANDRA-7188 was fixed. But now slice queries on a super column family with counters don't return all the expected data. We first thought, because of all the trouble we had, that we had lost data, but there is a way to actually get the data, so nothing is lost; it is just that Cassandra seems to incorrectly skip it. See the following CQL log:
{noformat}
cqlsh:Theme> desc table theme_view;

CREATE TABLE theme_view (
  key bigint,
  column1 varint,
  column2 text,
  value counter,
  PRIMARY KEY ((key), column1, column2)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=1.00 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

cqlsh:Theme> select * from theme_view where key = 99421 limit 10;

 key   | column1 | column2    | value
-------+---------+------------+-------
 99421 |     -12 | 2011-03-25 |    59
 99421 |     -12 | 2011-03-26 |     5
 99421 |     -12 | 2011-03-27 |     2
 99421 |     -12 | 2011-03-28 |    40
 99421 |     -12 | 2011-03-29 |    14
 99421 |     -12 | 2011-03-30 |    17
 99421 |     -12 | 2011-03-31 |     5
 99421 |     -12 | 2011-04-01 |    37
 99421 |     -12 | 2011-04-02 |     7
 99421 |     -12 | 2011-04-03 |     4

(10 rows)

cqlsh:Theme> select * from theme_view where key = 99421 and column1 = -12 limit 10;

 key   | column1 | column2    | value
-------+---------+------------+-------
 99421 |     -12 | 2011-03-25 |    59
 99421 |     -12 | 2014-05-06 |    15
 99421 |     -12 | 2014-06-06 |     7
 99421 |     -12 | 2014-06-10 |    22
 99421 |     -12 | 2014-06-11 |    34
 99421 |     -12 | 2014-06-12 |    35
 99421 |     -12 | 2014-06-13 |    26
 99421 |     -12 | 2014-06-14 |    16
 99421 |     -12 | 2014-06-15 |    24
 99421 |     -12 | 2014-06-16 |    25

(10 rows)
{noformat}
As you can see, the second query should return data from 2012, but it does not. Via Thrift, we have the exact same bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8357) ArrayOutOfBounds in cassandra-stress with inverted exponential distribution
Jens Preußner created CASSANDRA-8357: Summary: ArrayOutOfBounds in cassandra-stress with inverted exponential distribution Key: CASSANDRA-8357 URL: https://issues.apache.org/jira/browse/CASSANDRA-8357 Project: Cassandra Issue Type: Bug Components: Tools Environment: 6-node cassandra cluster (2.1.1) on debian. Reporter: Jens Preußner Fix For: 2.1.1 When using the CQLstress example from GitHub (https://github.com/apache/cassandra/blob/trunk/tools/cqlstress-example.yaml) with an inverted exponential distribution in the insert-partitions field, generated threads fail with Exception in thread Thread-20 java.lang.ArrayIndexOutOfBoundsException: 20 at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:307) See the gist https://gist.github.com/jenzopr/9edde53122554729c852 for the typetest.yaml I used. The call was: cassandra-stress user profile=typetest.yaml ops\(insert=1\) -node $NODES -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8357) ArrayOutOfBounds in cassandra-stress with inverted exponential distribution
[ https://issues.apache.org/jira/browse/CASSANDRA-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jens Preußner updated CASSANDRA-8357: - Fix Version/s: (was: 2.1.1) ArrayOutOfBounds in cassandra-stress with inverted exponential distribution --- Key: CASSANDRA-8357 URL: https://issues.apache.org/jira/browse/CASSANDRA-8357 Project: Cassandra Issue Type: Bug Components: Tools Environment: 6-node cassandra cluster (2.1.1) on debian. Reporter: Jens Preußner When using the CQLstress example from GitHub (https://github.com/apache/cassandra/blob/trunk/tools/cqlstress-example.yaml) with an inverted exponential distribution in the insert-partitions field, generated threads fail with Exception in thread Thread-20 java.lang.ArrayIndexOutOfBoundsException: 20 at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:307) See the gist https://gist.github.com/jenzopr/9edde53122554729c852 for the typetest.yaml I used. The call was: cassandra-stress user profile=typetest.yaml ops\(insert=1\) -node $NODES -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8359) Make DTCS consider removing SSTables much more frequently
Björn Hegerfors created CASSANDRA-8359: -- Summary: Make DTCS consider removing SSTables much more frequently Key: CASSANDRA-8359 URL: https://issues.apache.org/jira/browse/CASSANDRA-8359 Project: Cassandra Issue Type: Improvement Reporter: Björn Hegerfors Priority: Minor When I run DTCS on a table where every value has a TTL (always the same TTL), SSTables are completely expired, but still stay on disk for much longer than they need to. I've applied CASSANDRA-8243, but it doesn't make an apparent difference (probably because the subject SSTables are purged via compaction anyway, if not by directly dropping them). Disk size graphs show clearly that tombstones are only removed when the oldest SSTable participates in compaction. In the long run, size on disk continually grows bigger. This should not have to happen. It should easily be able to stay constant, thanks to DTCS separating the expired data from the rest. I think checks for whether SSTables can be dropped should happen much more frequently. This is something that probably only needs to be tweaked for DTCS, but perhaps there's a more general place to put this. Anyway, my thinking is that DTCS should, on every call to getNextBackgroundTask, check which SSTables can be dropped. It would be something like a call to CompactionController.getFullyExpiredSSTables with all non-compacting SSTables sent in as compacting and all other SSTables sent in as overlapping. The returned SSTables, if any, are then added to whichever set of SSTables that DTCS decides to compact. Then before the compaction happens, Cassandra is going to make another call to CompactionController.getFullyExpiredSSTables, where it will see that it can just drop them. This approach has a bit of redundancy in that it needs to call CompactionController.getFullyExpiredSSTables twice. To avoid that, the code path for deciding SSTables to drop would have to be changed. (Side tracking a little here: I'm also thinking that tombstone compactions could be considered more often in DTCS. Maybe even some kind of multi-SSTable tombstone compaction involving the oldest couple of SSTables...) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
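The proposal in code form, roughly (a fragment, not a full patch; the {{getFullyExpiredSSTables}} call matches the 2.1-era {{CompactionController}} static method, while the surrounding variable names and wiring are assumed for the sketch):
{code}
// Inside DateTieredCompactionStrategy.getNextBackgroundTask(), before picking
// the usual compaction candidates:
Set<SSTableReader> fullyExpired = CompactionController.getFullyExpiredSSTables(
        cfs,
        nonCompacting,  // all non-compacting sstables, passed as the "compacting" set
        compacting,     // all other sstables, passed as the overlapping set
        gcBefore);
if (!fullyExpired.isEmpty())
    candidates.addAll(fullyExpired); // fold droppable tables into whatever DTCS compacts next
{code}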
[jira] [Created] (CASSANDRA-8360) In DTCS, always compact SSTables in the same time window, even if they are fewer than min_threshold
Björn Hegerfors created CASSANDRA-8360: -- Summary: In DTCS, always compact SSTables in the same time window, even if they are fewer than min_threshold Key: CASSANDRA-8360 URL: https://issues.apache.org/jira/browse/CASSANDRA-8360 Project: Cassandra Issue Type: Improvement Reporter: Björn Hegerfors Priority: Minor DTCS uses min_threshold to decide how many time windows of the same size that need to accumulate before merging into a larger window. The age of an SSTable is determined as its min timestamp, and it always falls into exactly one of the time windows. If multiple SSTables fall into the same window, DTCS considers compacting them, but if they are fewer than min_threshold, it decides not to do it. When do more than 1 but fewer than min_threshold SSTables end up in the same time window (except for the current window), you might ask? In the current state, DTCS can spill some extra SSTables into bigger windows when the previous window wasn't fully compacted, which happens all the time when the latest window stops being the current one. Also, repairs and hints can put new SSTables in old windows. I think, and [~jjordan] agreed in a comment on CASSANDRA-6602, that DTCS should ignore min_threshold and compact tables in the same windows regardless of how few they are. I guess max_threshold should still be respected. [~jjordan] suggested that this should apply to all windows but the current window, where all the new SSTables end up. That could make sense. I'm not clear on whether compacting many SSTables at once is more cost efficient or not, when it comes to the very newest and smallest SSTables. Maybe compacting as soon as 2 SSTables are seen is fine if the initial window size is small enough? I guess the opposite could be the case too; that the very newest SSTables should be compacted very many at a time? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
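A toy sketch of the rule suggested above (the method name and the {{isCurrentWindow}} helper are illustrative, not existing DTCS code):
{code}
// Compact any non-current window holding 2+ sstables; only max_threshold caps the count.
static List<SSTableReader> pickCompactionBucket(List<List<SSTableReader>> buckets,
                                                int minThreshold, int maxThreshold)
{
    for (List<SSTableReader> bucket : buckets)
    {
        // the newest window, where fresh flushes land, keeps the usual min_threshold rule
        int required = isCurrentWindow(bucket) ? minThreshold : 2;
        if (bucket.size() >= required)
            return bucket.subList(0, Math.min(bucket.size(), maxThreshold));
    }
    return Collections.emptyList();
}
{code}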
[jira] [Created] (CASSANDRA-8361) Make DTCS split SSTables to perfectly fit time windows
Björn Hegerfors created CASSANDRA-8361: -- Summary: Make DTCS split SSTables to perfectly fit time windows Key: CASSANDRA-8361 URL: https://issues.apache.org/jira/browse/CASSANDRA-8361 Project: Cassandra Issue Type: Improvement Reporter: Björn Hegerfors Priority: Minor The time windows that DTCS uses are what the strategy tries to align SSTables to, in order to get the right structure, for best performance. I added the ticket CASSANDRA-8360, taking SSTables one step closer to aligning with these windows in a 1:1 manner. The idea in this ticket is to perfectly align SSTables with the DTCS time windows, by splitting SSTables that cross window borders. This can lead to certain benefits, perhaps mostly in consistency and predictability terms, where it will be very well defined where every value is stored that is old enough to have stabilized. Read queries can be aligned with windows in order to guarantee a single disk seek (although then the client needs to know the right window placements). Basically, SSTables can be made to align perfectly on day borders, for example. Right now, there would be an SSTable that almost represents a day, but not perfectly. So some data is still in another SSTable. It could also be a useful property for tombstone expiration and repairs. Practically all splits would happen only in the latest time windows with the newest and smallest SSTables. After those are split, DTCS would never compact SSTables across window borders. I have a hard time seeing when this could cause an expensive operation except for when switching from another compaction strategy (or even from current DTCS), and after a major compaction. In fact major compaction for DTCS should put data perfectly in windows rather than everything in one SSTable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
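A toy sketch of the border arithmetic this splitting would need (illustrative only, assuming microsecond timestamps and a fixed window size per tier):
{code}
// The window containing a timestamp starts at the nearest lower multiple of the
// window size; an sstable would be split wherever its [min, max] span crosses a border.
static long windowStart(long timestampMicros, long windowSizeMicros)
{
    return (timestampMicros / windowSizeMicros) * windowSizeMicros;
}

static boolean needsSplit(long minTimestamp, long maxTimestamp, long windowSizeMicros)
{
    return windowStart(minTimestamp, windowSizeMicros) != windowStart(maxTimestamp, windowSizeMicros);
}
{code}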
[jira] [Commented] (CASSANDRA-8356) Slice query on a super column family with counters doesn't get all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222950#comment-14222950 ] Nicolas Lalevée commented on CASSANDRA-8356: I got the snapshot data from a node on my local machine, and I tried to load it up in a local Cassandra 2.0.11 node. The node opened the files correctly. But querying against it is impossible; I hit the following error: {noformat} ERROR 11:28:45,693 Exception in thread Thread[ReadStage:2,5,main] java.lang.RuntimeException: java.lang.IllegalArgumentException at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1981) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:267) at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:587) at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:596) at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:61) at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:1) at org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:436) at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:141) at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:113) at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:202) at org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.delete(AbstractThreadUnsafeSortedColumns.java:54) at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:155) at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:168) at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140) at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:144) at org.apache.cassandra.utils.MergeIterator$ManyToOne.init(MergeIterator.java:87) at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:46) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:120) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72) at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297) at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:56) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1547) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1376) at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:333) at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65) at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1413) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1977) ... 3 more {noformat} This reminded me of an error we had on our test cluster, when we tested the upgrade to 2.0.x: CASSANDRA-6733. So here, I ran upgradesstables on our production cluster, and now the slice queries return all the expected data. So everything is back to normal (and I am very pleased by the lower CPU activity with 2.0.x for the same load).
I looked at the logs in prod again; I still don't see any such Buffer.limit errors. I don't know what was going wrong. As for CASSANDRA-6733, I have a snapshot of the data from before the upgradesstables run (unfortunately I don't have a snapshot pre-upgrade, but some sstables are still in the old format). If someone wants the data to analyse it, contact me, nlalevee at scoop.it. Slice query on a super column family with counters doesn't get all the data --- Key: CASSANDRA-8356 URL: https://issues.apache.org/jira/browse/CASSANDRA-8356 Project: Cassandra Issue Type: Bug Reporter: Nicolas Lalevée Assignee: Aleksey Yeschenko Fix For: 2.0.12 We've finally been able to upgrade our cluster to 2.0.11, after CASSANDRA-7188 being fixed. But now slice queries on a super column family with counters don't return all the expected data. We first thought, because of all the trouble we had, that we lost data, but there's a way to actually
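For reference, the fix described above is a standard nodetool invocation; a minimal sketch, with placeholder keyspace and table names:
{noformat}
# Rewrites sstables to the current on-disk format; by default only sstables
# that are not already on the current version are rewritten.
# myks and mytable are placeholders.
nodetool upgradesstables myks mytable
{noformat}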
[jira] [Comment Edited] (CASSANDRA-8356) Slice query on a super column family with counters doesn't get all the data
[ https://issues.apache.org/jira/browse/CASSANDRA-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222950#comment-14222950 ] Nicolas Lalevée edited comment on CASSANDRA-8356 at 11/24/14 12:50 PM: --- I got the snapshot data from a node on my local machine, and I tried to load it up in a local cassandra node 2.0.11. The node did the opening of the files correctly. But querying against it is impossible, I hit the following error: {noformat} ERROR 11:28:45,693 Exception in thread Thread[ReadStage:2,5,main] java.lang.RuntimeException: java.lang.IllegalArgumentException at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1981) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:267) at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:587) at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:596) at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:61) at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:1) at org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:436) at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:141) at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:113) at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:202) at org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.delete(AbstractThreadUnsafeSortedColumns.java:54) at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:155) at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:168) at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140) at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:144) at org.apache.cassandra.utils.MergeIterator$ManyToOne.init(MergeIterator.java:87) at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:46) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:120) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72) at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297) at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:56) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1547) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1376) at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:333) at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65) at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1413) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1977) ... 3 more {noformat} This reminded me of an error we had on our test cluster, when we tested the upgrade to 2.0.x : CASSANDRA-6733 So here, I ran an upgradesstable on our production cluster, and now the slice queries return all the expected data. 
So everything is back to normal (and I am very pleased by the lower cpu activity with 2.0.x for the same load). I looked at the logs in prod again; I still don't see any such Buffer.limit errors. I don't know what was going wrong. As for CASSANDRA-6733, I have a snapshot of the data from before the upgradesstables run (unfortunately I don't have a snapshot pre-upgrade, but some sstables are still in the old format). If someone wants the data to analyse it, contact me, nlalevee at scoop.it. was (Author: hibou): I got the snapshot data from a node on my local machine, and I tried to load it up in a local cassandra node 2.0.11. The node did the opening of the files correctly. But querying against it is impossible, I hit the following error: {noformat} ERROR 11:28:45,693 Exception in thread Thread[ReadStage:2,5,main] java.lang.RuntimeException: java.lang.IllegalArgumentException at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1981) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java
[jira] [Commented] (CASSANDRA-8192) AssertionError in Memory.java
[ https://issues.apache.org/jira/browse/CASSANDRA-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223070#comment-14223070 ] Andreas Ländle commented on CASSANDRA-8192: --- At least I'm using a 64-bit JVM JDK 7u60. For now I tested running cassandra with a 4GB heap size (instead of 3GB before), and at least until now I couldn't reproduce the error. AssertionError in Memory.java - Key: CASSANDRA-8192 URL: https://issues.apache.org/jira/browse/CASSANDRA-8192 Project: Cassandra Issue Type: Bug Components: Core Environment: Windows-7-32 bit, 3GB RAM, Java 1.7.0_67 Reporter: Andreas Schnitzerling Assignee: Joshua McKenzie Attachments: cassandra.bat, cassandra.yaml, system.log Since the update of 1 of 12 nodes from 2.1.0-rel to 2.1.1-rel, an exception occurs during start-up. {panel:title=system.log} ERROR [SSTableBatchOpen:1] 2014-10-27 09:44:00,079 CassandraDaemon.java:153 - Exception in thread Thread[SSTableBatchOpen:1,5,main] java.lang.AssertionError: null at org.apache.cassandra.io.util.Memory.size(Memory.java:307) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:135) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:83) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:50) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:48) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:766) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:725) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:402) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:302) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:438) ~[apache-cassandra-2.1.1.jar:2.1.1] at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_55] at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_55] at java.lang.Thread.run(Unknown Source) [na:1.7.0_55] {panel} In the attached log you can also still see CASSANDRA-8069 and CASSANDRA-6283. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8192) AssertionError in Memory.java
[ https://issues.apache.org/jira/browse/CASSANDRA-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224383#comment-14224383 ] Andreas Ländle commented on CASSANDRA-8192: --- Hi Joshua, I know installed JVM JDK 7u71 (for sure the 64-Bit version) and I could reproduce the callstack by just running bin\cassandra.bat. Absolutely the same error. [Apache Cassandra Error] 11:45:08.300 [SSTableBatchOpen:4] ERROR o.a.c.service.CassandraDaemon - Exception in thread Thread[SSTableBatchOpen:4,5,main] [Apache Cassandra Error] java.lang.AssertionError: null [Apache Cassandra Error]at org.apache.cassandra.io.util.Memory.size(Memory.java:307) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:135) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:83) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:50) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:48) ~[...] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:766) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:725) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:402) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:302) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:438) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_71] [Apache Cassandra Error]at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_71] [Apache Cassandra Error]at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_71] [Apache Cassandra Error]at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_71] [Apache Cassandra Error]at java.lang.Thread.run(Unknown Source) [na:1.7.0_71] AssertionError in Memory.java - Key: CASSANDRA-8192 URL: https://issues.apache.org/jira/browse/CASSANDRA-8192 Project: Cassandra Issue Type: Bug Components: Core Environment: Windows-7-32 bit, 3GB RAM, Java 1.7.0_67 Reporter: Andreas Schnitzerling Assignee: Joshua McKenzie Attachments: cassandra.bat, cassandra.yaml, system.log Since update of 1 of 12 nodes from 2.1.0-rel to 2.1.1-rel Exception during start up. 
{panel:title=system.log} ERROR [SSTableBatchOpen:1] 2014-10-27 09:44:00,079 CassandraDaemon.java:153 - Exception in thread Thread[SSTableBatchOpen:1,5,main] java.lang.AssertionError: null at org.apache.cassandra.io.util.Memory.size(Memory.java:307) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:135) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:83) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:50) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:48) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:766) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:725) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:402) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:302) ~[apache-cassandra-2.1.1.jar:2.1.1] at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:438) ~[apache-cassandra-2.1.1.jar:2.1.1] at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_55] at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_55
[jira] [Comment Edited] (CASSANDRA-8192) AssertionError in Memory.java
[ https://issues.apache.org/jira/browse/CASSANDRA-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224383#comment-14224383 ] Andreas Ländle edited comment on CASSANDRA-8192 at 11/25/14 10:53 AM: -- Hi Joshua, I kow installed JVM JDK 7u71 (since I'm facing the error again; for sure the 64-Bit version) and I could reproduce the callstack by just running bin\cassandra.bat. Absolutely the same error - directly at cassandra startup, no further operation is needed. [Apache Cassandra Error] 11:45:08.300 [SSTableBatchOpen:4] ERROR o.a.c.service.CassandraDaemon - Exception in thread Thread[SSTableBatchOpen:4,5,main] [Apache Cassandra Error] java.lang.AssertionError: null [Apache Cassandra Error]at org.apache.cassandra.io.util.Memory.size(Memory.java:307) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:135) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:83) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:50) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:48) ~[...] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:766) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:725) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:402) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:302) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:438) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_71] [Apache Cassandra Error]at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_71] [Apache Cassandra Error]at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_71] [Apache Cassandra Error]at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_71] [Apache Cassandra Error]at java.lang.Thread.run(Unknown Source) [na:1.7.0_71] was (Author: alsoloplan): Hi Joshua, I know installed JVM JDK 7u71 (for sure the 64-Bit version) and I could reproduce the callstack by just running bin\cassandra.bat. Absolutely the same error. 
[Apache Cassandra Error] 11:45:08.300 [SSTableBatchOpen:4] ERROR o.a.c.service.CassandraDaemon - Exception in thread Thread[SSTableBatchOpen:4,5,main] [Apache Cassandra Error] java.lang.AssertionError: null [Apache Cassandra Error]at org.apache.cassandra.io.util.Memory.size(Memory.java:307) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:135) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:83) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:50) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:48) ~[...] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:766) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:725) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:402) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:302) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:438) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_71] [Apache Cassandra Error]at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_71] [Apache
[jira] [Commented] (CASSANDRA-8371) DateTieredCompactionStrategy is always compacting
[ https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224390#comment-14224390 ] Björn Hegerfors commented on CASSANDRA-8371: Could you try with a lower base_time_seconds? I have a feeling that I set the default too high at 1 hour. That's a setting that I didn't actually benchmark in my testing, and I just set it rather arbitrarily. You also need to make sure the timestamp_resolution is set correctly. You should think of base_time_seconds as DTCS's equivalent of min_sstable_size in STCS. min_sstable_size is 50 MB by default, so you probably want to set base_time_seconds to whatever time it takes you to write 50 MB, on average. I suspect that will be a lot less than 1 hour. You could also try STCS with min_sstable_size set to the amount that you write in an hour, to see if that starts compacting equally much. If that's the cause here, I think that we should consider lowering the default value of base_time_seconds. Having it too low is better than too high. Does anyone have an estimate of a common (on the high end) MB/s write throughput for time series? DateTieredCompactionStrategy is always compacting -- Key: CASSANDRA-8371 URL: https://issues.apache.org/jira/browse/CASSANDRA-8371 Project: Cassandra Issue Type: Bug Components: Core Reporter: mck Assignee: Björn Hegerfors Labels: compaction, performance Attachments: java_gc_counts_rate-month.png, read-latency.png, sstables.png, vg2_iad-month.png Running 2.0.11 and having switched a table to [DTCS|https://issues.apache.org/jira/browse/CASSANDRA-6602] we've seen that disk IO and gc count increase, along with the number of reads happening in the compaction hump of cfhistograms. Data, and generally performance, looks good, but compactions are always happening, and pending compactions are building up. The schema for this is {code}CREATE TABLE search ( loginid text, searchid timeuuid, description text, searchkey text, searchurl text, PRIMARY KEY ((loginid), searchid) );{code} We're sitting on about 82G (per replica) across 6 nodes in 4 DCs. CQL executed against this keyspace, and traffic patterns, can be seen in slides 7+8 of https://prezi.com/b9-aj6p2esft Attached are sstables-per-read and read-latency graphs from cfhistograms, and screenshots of our munin graphs as we have gone from STCS, to LCS (week ~44), to DTCS (week ~46). These screenshots are also found in the prezi on slides 9-11. [~pmcfadin], [~Bj0rn], Can this be a consequence of occasional deleted rows, as is described under (3) in the description of CASSANDRA-6602 ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
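A minimal sketch of the tuning suggested above, assuming a table that receives roughly 50 MB of writes every 10 minutes (so base_time_seconds=600); the table name comes from the report's schema, the numbers are assumptions, not recommendations:
{code}
ALTER TABLE search WITH compaction = {
    'class': 'DateTieredCompactionStrategy',
    'base_time_seconds': '600',
    'timestamp_resolution': 'MICROSECONDS'
};
{code}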
[jira] [Comment Edited] (CASSANDRA-8192) AssertionError in Memory.java
[ https://issues.apache.org/jira/browse/CASSANDRA-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224383#comment-14224383 ] Andreas Ländle edited comment on CASSANDRA-8192 at 11/25/14 11:26 AM: -- Hi Joshua, I now installed JVM JDK 7u71 (since I'm facing the error again; for sure the 64-Bit version) and I could reproduce the callstack by just running bin\cassandra.bat. Absolutely the same error - directly at cassandra startup, no further operation is needed. [Apache Cassandra Error] 11:45:08.300 [SSTableBatchOpen:4] ERROR o.a.c.service.CassandraDaemon - Exception in thread Thread[SSTableBatchOpen:4,5,main] [Apache Cassandra Error] java.lang.AssertionError: null [Apache Cassandra Error]at org.apache.cassandra.io.util.Memory.size(Memory.java:307) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:135) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:83) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:50) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:48) ~[...] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:766) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:725) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:402) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:302) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:438) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_71] [Apache Cassandra Error]at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_71] [Apache Cassandra Error]at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_71] [Apache Cassandra Error]at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_71] [Apache Cassandra Error]at java.lang.Thread.run(Unknown Source) [na:1.7.0_71] was (Author: alsoloplan): Hi Joshua, I kow installed JVM JDK 7u71 (since I'm facing the error again; for sure the 64-Bit version) and I could reproduce the callstack by just running bin\cassandra.bat. Absolutely the same error - directly at cassandra startup, no further operation is needed. 
[Apache Cassandra Error] 11:45:08.300 [SSTableBatchOpen:4] ERROR o.a.c.service.CassandraDaemon - Exception in thread Thread[SSTableBatchOpen:4,5,main] [Apache Cassandra Error] java.lang.AssertionError: null [Apache Cassandra Error]at org.apache.cassandra.io.util.Memory.size(Memory.java:307) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:135) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:83) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:50) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:48) ~[...] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:766) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:725) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:402) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:302) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:438) ~[cassandra-all-2.1.1.jar:2.1.1] [Apache Cassandra Error]at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_71] [Apache Cassandra
[jira] [Commented] (CASSANDRA-8360) In DTCS, always compact SSTables in the same time window, even if they are fewer than min_threshold
[ https://issues.apache.org/jira/browse/CASSANDRA-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226463#comment-14226463 ] Björn Hegerfors commented on CASSANDRA-8360: OK, sounds fair. That essentially means that we want to treat the incoming window specially. A question worth asking is what we want the incoming window for. Currently it is to keep the last unit of base_time_seconds compacted at all times. While it respects min_threshold, a value written early in the window will essentially be constantly recompacted once every (min_threshold - 1) subsequent sstable flushes. I'm fully aware that this might be a bad idea, or rather I wasn't sure if it was the right thing to do. Really, it's completely inspired by STCS's min_sstable_size which seems to do the same thing, i.e. not respect the logarithmic complexity tree-like merging on small enough SSTables. (Reminds me a bit of insertion sort being fastest on small enough arrays). So base_time_seconds has the same purpose. A problem is that it might be harder to set a good default on time than on size. Setting min_sstable_size in STCS to 0 has a near-equivalent in DTCS: setting base_time_seconds to 1. The windows will be powers of min_threshold seconds (up to min_threshold windows of each size), starting at 1 second. Even with this setting, data that is an hour old will be in near-hour large windows. The only meaningful difference is that SSTables 2 seconds and 10 seconds old will not be in the same window. What I mean by this is that setting base_time_seconds to 1 is perfectly reasonable, it's just the same as setting min_sstable_size to 0 or 1 in STCS. I just want to make it clear that base_time_seconds is not really something that you should set to 1 hour (3600) just because you want SSTables older than 1 hour to be in nice 1-hour chunks. If you set it to 900 with min_threshold=4, SSTables older than 1 hour will still be in perfect 1 hour chunks (because after up to 4 900-second chunks comes a 4*900=3600-second chunk). So I guess respecting min_threshold in the 'incoming window' is just as right as respecting min_threshold when compacting SSTables smaller than min_sstable_size in STCS. Which I believe it does. So there's my roundabout way of coming to the same conclusion as you, [~jbellis] :). I just have this feeling that the meaning of base_time_seconds isn't well understood. In DTCS, always compact SSTables in the same time window, even if they are fewer than min_threshold --- Key: CASSANDRA-8360 URL: https://issues.apache.org/jira/browse/CASSANDRA-8360 Project: Cassandra Issue Type: Improvement Reporter: Björn Hegerfors Priority: Minor DTCS uses min_threshold to decide how many time windows of the same size need to accumulate before merging into a larger window. The age of an SSTable is determined as its min timestamp, and it always falls into exactly one of the time windows. If multiple SSTables fall into the same window, DTCS considers compacting them, but if they are fewer than min_threshold, it decides not to do it. When do more than 1 but fewer than min_threshold SSTables end up in the same time window (except for the current window), you might ask? In the current state, DTCS can spill some extra SSTables into bigger windows when the previous window wasn't fully compacted, which happens all the time when the latest window stops being the current one. Also, repairs and hints can put new SSTables in old windows. 
I think, and [~jjordan] agreed in a comment on CASSANDRA-6602, that DTCS should ignore min_threshold and compact tables in the same windows regardless of how few they are. I guess max_threshold should still be respected. [~jjordan] suggested that this should apply to all windows but the current window, where all the new SSTables end up. That could make sense. I'm not clear on whether compacting many SSTables at once is more cost efficient or not, when it comes to the very newest and smallest SSTables. Maybe compacting as soon as 2 SSTables are seen is fine if the initial window size is small enough? I guess the opposite could be the case too; that the very newest SSTables should be compacted very many at a time? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
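A toy illustration of the window layout described above (this is not the DTCS code, just arithmetic on its two parameters): with base_time_seconds=900 and min_threshold=4, each tier is four times wider than the previous one, so anything past the first hour already sits in windows of 3600 seconds or more.
{code}
public class DtcsWindowSketch
{
    public static void main(String[] args)
    {
        long base = 900;       // base_time_seconds
        int minThreshold = 4;  // compaction min_threshold
        long windowSize = base;
        for (int tier = 0; tier <= 4; tier++)
        {
            System.out.println("tier " + tier + ": windows of " + windowSize + "s"
                               + " (up to " + minThreshold + " of them before merging)");
            windowSize *= minThreshold; // the next tier is min_threshold times wider
        }
    }
}
{code}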
[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table
[ https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229701#comment-14229701 ] Piotr Kołaczkowski commented on CASSANDRA-7688: --- It would be nice to also know the average partition size in the given table, both in bytes and in number of CQL rows. This would be useful to set an appropriate fetch.size. Additionally, the current split generation API does not allow setting the split size in terms of data size in bytes or number of CQL rows, but only by number of partitions. Number of partitions doesn't make a nice default, as partitions can vary greatly in size and are extremely use-case dependent. So please, don't just copy the current describe_splits_ex functionality to the new driver, but *improve this*. We really don't need the driver / Cassandra to do the splitting for us. Instead we need to know: 1. estimate of total amount of data in the table in bytes 2. estimate of total number of CQL rows in the table 3. estimate of total number of partitions in the table We're interested both in totals (whole cluster; logical sizes; i.e. without replicas), and split by token-ranges by node (physical; including replicas). Add data sizing to a system table - Key: CASSANDRA-7688 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688 Project: Cassandra Issue Type: New Feature Reporter: Jeremiah Jordan Fix For: 2.1.3 Currently you can't implement something similar to describe_splits_ex purely from a native protocol driver. https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily getting ownership information to a client in the java-driver. But you still need the data sizing part to get splits of a given size. We should add the sizing information to a system table so that native clients can get to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
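As a sketch of what is being asked for, a per-token-range estimates table could look roughly like this (table and column names are illustrative assumptions, not a committed schema):
{code}
CREATE TABLE system.size_estimates (
    keyspace_name text,
    table_name text,
    range_start text,           -- token range owned by this node
    range_end text,
    mean_partition_size bigint, -- estimated bytes per partition
    partitions_count bigint,    -- estimated partitions in the range
    PRIMARY KEY ((keyspace_name), table_name, range_start, range_end)
);

-- A client can read the per-range estimates, multiply mean_partition_size by
-- partitions_count, and coalesce adjacent ranges into splits of a target byte size:
SELECT range_start, range_end, mean_partition_size, partitions_count
FROM system.size_estimates
WHERE keyspace_name = 'myks' AND table_name = 'mytable';
{code}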
[jira] [Comment Edited] (CASSANDRA-7688) Add data sizing to a system table
[ https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229701#comment-14229701 ] Piotr Kołaczkowski edited comment on CASSANDRA-7688 at 12/1/14 12:01 PM: - It would be nice to know also the average partition size in the given table, both in bytes and in number of CQL rows. This would be useful to set appropriate fetch.size. Additionally, current split generation API does not allow to set split size in terms of data size in bytes or number of CQL rows, but only by number of partitions. Number of partitions doesn't make a nice default, as partitions can vary greatly in size and are extremely use-case dependent. So please, don't just copy current describe_splits_ex functionality to the new driver, but *improve this*. We really don't need the driver / Cassandra to do the splitting for us. Instead we need to know: 1. estimate of total amount of data in the table in bytes 2. estimate of total number of CQL rows in the table 3. estimate of total number of partitions in the table We're interested both in totals (whole cluster; logical sizes; i.e. without replicas), and split by token-ranges by node (physical; incuding replicas). Note that this information is useful not just for Spark/Hadoop split generation, but also things like e.g. SparkSQL optimizer so it knows how much data will it have to process. The next step would be providing column data histograms to guide predicate selectivity. was (Author: pkolaczk): It would be nice to know also the average partition size in the given table, both in bytes and in number of CQL rows. This would be useful to set appropriate fetch.size. Additionally, current split generation API does not allow to set split size in terms of data size in bytes or number of CQL rows, but only by number of partitions. Number of partitions doesn't make a nice default, as partitions can vary greatly in size and are extremely use-case dependent. So please, don't just copy current describe_splits_ex functionality to the new driver, but *improve this*. We really don't need the driver / Cassandra to do the splitting for us. Instead we need to know: 1. estimate of total amount of data in the table in bytes 2. estimate of total number of CQL rows in the table 3. estimate of total number of partitions in the table We're interested both in totals (whole cluster; logical sizes; i.e. without replicas), and split by token-ranges by node (physical; incuding replicas). Add data sizing to a system table - Key: CASSANDRA-7688 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688 Project: Cassandra Issue Type: New Feature Reporter: Jeremiah Jordan Fix For: 2.1.3 Currently you can't implement something similar to describe_splits_ex purely from the a native protocol driver. https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily getting ownership information to a client in the java-driver. But you still need the data sizing part to get splits of a given size. We should add the sizing information to a system table so that native clients can get to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7688) Add data sizing to a system table
[ https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229701#comment-14229701 ] Piotr Kołaczkowski edited comment on CASSANDRA-7688 at 12/1/14 12:03 PM: - It would be nice to know also the average partition size in the given table, both in bytes and in number of CQL rows. This would be useful to set appropriate fetch.size. Additionally, current split generation API does not allow to set split size in terms of data size in bytes or number of CQL rows, but only by number of partitions. Number of partitions doesn't make a nice default, as partitions can vary greatly in size and are extremely use-case dependent. So please, don't just copy current describe_splits_ex functionality to the new driver, but *improve this*. We really don't need the driver / Cassandra to do the splitting for us. Instead we need to know: 1. estimate of total amount of data in the table in bytes 2. estimate of total number of CQL rows in the table 3. estimate of total number of partitions in the table We're interested both in totals (whole cluster; logical sizes; i.e. without replicas), and split by token-ranges by node (physical; incuding replicas). Note that this information is useful not just for Spark/Hadoop split generation, but also things like e.g. SparkSQL optimizer so it knows how much data will it have to process or to set appropriate fetch sizes when getting data, etc. The next step would be providing column data histograms to guide predicate selectivity. was (Author: pkolaczk): It would be nice to know also the average partition size in the given table, both in bytes and in number of CQL rows. This would be useful to set appropriate fetch.size. Additionally, current split generation API does not allow to set split size in terms of data size in bytes or number of CQL rows, but only by number of partitions. Number of partitions doesn't make a nice default, as partitions can vary greatly in size and are extremely use-case dependent. So please, don't just copy current describe_splits_ex functionality to the new driver, but *improve this*. We really don't need the driver / Cassandra to do the splitting for us. Instead we need to know: 1. estimate of total amount of data in the table in bytes 2. estimate of total number of CQL rows in the table 3. estimate of total number of partitions in the table We're interested both in totals (whole cluster; logical sizes; i.e. without replicas), and split by token-ranges by node (physical; incuding replicas). Note that this information is useful not just for Spark/Hadoop split generation, but also things like e.g. SparkSQL optimizer so it knows how much data will it have to process. The next step would be providing column data histograms to guide predicate selectivity. Add data sizing to a system table - Key: CASSANDRA-7688 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688 Project: Cassandra Issue Type: New Feature Reporter: Jeremiah Jordan Fix For: 2.1.3 Currently you can't implement something similar to describe_splits_ex purely from the a native protocol driver. https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily getting ownership information to a client in the java-driver. But you still need the data sizing part to get splits of a given size. We should add the sizing information to a system table so that native clients can get to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7827) Work around for output name restriction when using MultipleOutputs with CqlBulkOutputFormat
[ https://issues.apache.org/jira/browse/CASSANDRA-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229809#comment-14229809 ] Piotr Kołaczkowski commented on CASSANDRA-7827: --- +1 Work around for output name restriction when using MultipleOutputs with CqlBulkOutputFormat --- Key: CASSANDRA-7827 URL: https://issues.apache.org/jira/browse/CASSANDRA-7827 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Paul Pak Assignee: Paul Pak Priority: Minor Labels: cql3, hadoop Attachments: trunk-7827-v1.txt When using MultipleOutputs with CqlBulkOutputFormat, the column family names to output to are restricted to only alphanumeric characters due to the logic found in MultipleOutputs.checkNamedOutputName(). This will provide a way to alias any column family name to a MultipleOutputs compatible output name, so that column family names won't be artificially restricted when using MultipleOutputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229812#comment-14229812 ] Piotr Kołaczkowski commented on CASSANDRA-2388: --- +1 ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.6 Reporter: Eldon Stegall Assignee: Paulo Motta Priority: Minor Labels: hadoop, inputformat Fix For: 2.0.12 Attachments: 0002_On_TException_try_next_split.patch, 1.2-CASSANDRA-2388.patch, 2.0-CASSANDRA-2388-v2.patch, 2.0-CASSANDRA-2388.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table
[ https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229828#comment-14229828 ] Piotr Kołaczkowski commented on CASSANDRA-7688: --- We only need estimates, not exact values. Factor 1.5x error is considered an awesome estimate, factor 3x is still fairly good. Also note that Spark/Hadoop does many token range scans. Maybe collecting some statistics on the fly, during the scans (or during the compaction) would be viable? And running a full compaction to get statistics more accurate - why not? You need to do it anyway to get top speed when scanning data in Spark, because a full table scan is doing kind-of implicit compaction anyway, isn't it? Also, one more thing - it would be good to have those values per column (sorry for making it even harder, I know it is not an easy task). At least to know that a column is responsible for xx% of data in the table - knowing such thing would make a huge difference when estimating data size, because we're not always fetching all columns and they may vary in size a lot (e.g. collections!). Some sampling on insert would probably be enough. Add data sizing to a system table - Key: CASSANDRA-7688 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688 Project: Cassandra Issue Type: New Feature Reporter: Jeremiah Jordan Fix For: 2.1.3 Currently you can't implement something similar to describe_splits_ex purely from the a native protocol driver. https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily getting ownership information to a client in the java-driver. But you still need the data sizing part to get splits of a given size. We should add the sizing information to a system table so that native clients can get to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8312) Use live sstables in snapshot repair if possible
[ https://issues.apache.org/jira/browse/CASSANDRA-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229956#comment-14229956 ] Jimmy Mårdell commented on CASSANDRA-8312: -- Ping on this. I'm okay if you think this patch is unnecessary due to CASSANDRA-7024, but I'd still be very happy with some feedback on whether this approach is correct. [~krummas]? Use live sstables in snapshot repair if possible Key: CASSANDRA-8312 URL: https://issues.apache.org/jira/browse/CASSANDRA-8312 Project: Cassandra Issue Type: Improvement Reporter: Jimmy Mårdell Assignee: Jimmy Mårdell Priority: Minor Attachments: cassandra-2.0-8312-1.txt Snapshot repair can be very much slower than parallel repairs because of the overhead of opening the SSTables in the snapshot. This is particularly true when using LCS, as you typically have many smaller SSTables then. I compared parallel and sequential repair on a small range on one of our clusters (2*3 replicas). With parallel repair, this took 22 seconds. With sequential repair (default in 2.0), the same range took 330 seconds! This is an overhead of 330 - 22*6 = 198 seconds (the 22*6 = 132 seconds being the repair work itself, done once per replica), just opening SSTables (there were 1000+ sstables). Also, opening 1000 sstables for many smaller ranges surely causes lots of memory churning. The idea would be to list the sstables in the snapshot, but use the corresponding sstables in the live set if they are still available. For almost all sstables, the original one should still exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table
[ https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230051#comment-14230051 ] Piotr Kołaczkowski commented on CASSANDRA-7688: --- Fair enough. Just saying describe_splits is pretty bad for the reason it is not possible to set some reasonable default for split size. Some users were already pointing that out in our issue tracker. Add data sizing to a system table - Key: CASSANDRA-7688 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688 Project: Cassandra Issue Type: New Feature Reporter: Jeremiah Jordan Fix For: 2.1.3 Currently you can't implement something similar to describe_splits_ex purely from the a native protocol driver. https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily getting ownership information to a client in the java-driver. But you still need the data sizing part to get splits of a given size. We should add the sizing information to a system table so that native clients can get to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8371) DateTieredCompactionStrategy is always compacting
[ https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230304#comment-14230304 ] Björn Hegerfors commented on CASSANDRA-8371: [~jbellis] How about adding max_sstable_age_seconds and preferring it if both are set (or give an error if both are set), without deprecating _days? [~michaelsembwever] I didn't answer before, but since you don't write more than 50 MB per hour, I don't think that base_time_seconds is the problem. I don't really have any ideas about what could cause this increased IO. I suppose logging would help. DTCS logs exactly the same things as STCS, but maybe some additional timestamp information would be useful to see as well. In CASSANDRA-6602 I attached TimestampViewer.java which takes all the *data.db files in a data folder and outputs some relevant timestamp metadata (overlaps, for example). I find it useful to look at its output sometimes on our DTCS clusters. I've also generated some images from its output, which illustrates very well what DTCS sees. When I get time, I could clean it up to make it work more generally, if anyone is interested. It's written in Haskell, using the Diagrams library. DateTieredCompactionStrategy is always compacting -- Key: CASSANDRA-8371 URL: https://issues.apache.org/jira/browse/CASSANDRA-8371 Project: Cassandra Issue Type: Bug Components: Core Reporter: mck Assignee: Björn Hegerfors Labels: compaction, performance Attachments: java_gc_counts_rate-month.png, read-latency-recommenders-adview.png, read-latency.png, sstables-recommenders-adviews.png, sstables.png, vg2_iad-month.png Running 2.0.11 and having switched a table to [DTCS|https://issues.apache.org/jira/browse/CASSANDRA-6602] we've seen that disk IO and gc count increase, along with the number of reads happening in the compaction hump of cfhistograms. Data, and generally performance, looks good, but compactions are always happening, and pending compactions are building up. The schema for this is {code}CREATE TABLE search ( loginid text, searchid timeuuid, description text, searchkey text, searchurl text, PRIMARY KEY ((loginid), searchid) );{code} We're sitting on about 82G (per replica) across 6 nodes in 4 DCs. CQL executed against this keyspace, and traffic patterns, can be seen in slides 7+8 of https://prezi.com/b9-aj6p2esft/ Attached are sstables-per-read and read-latency graphs from cfhistograms, and screenshots of our munin graphs as we have gone from STCS, to LCS (week ~44), to DTCS (week ~46). These screenshots are also found in the prezi on slides 9-11. [~pmcfadin], [~Bj0rn], Can this be a consequence of occasional deleted rows, as is described under (3) in the description of CASSANDRA-6602 ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8414) Avoid loops over array backed iterators that call iter.remove()
[ https://issues.apache.org/jira/browse/CASSANDRA-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Mårdell updated CASSANDRA-8414: - Attachment: cassandra-2.0-8414-1.txt Avoid loops over array backed iterators that call iter.remove() --- Key: CASSANDRA-8414 URL: https://issues.apache.org/jira/browse/CASSANDRA-8414 Project: Cassandra Issue Type: Bug Components: Core Reporter: Richard Low Labels: performance Fix For: 2.1.3 Attachments: cassandra-2.0-8414-1.txt I noticed from sampling that sometimes compaction spends almost all of its time in iter.remove() in ColumnFamilyStore.removeDeletedStandard. It turns out that the cf object is using ArrayBackedSortedColumns, so deletes are from an ArrayList. If the majority of your columns are GCable tombstones then this is O(n^2). The data structure should be changed or a copy made to avoid this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-4986) Allow finer control of ALLOW FILTERING behavior
[ https://issues.apache.org/jira/browse/CASSANDRA-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236972#comment-14236972 ] Michał Michalski commented on CASSANDRA-4986: - From what I understand: LIMIT defines the maximum number of rows we want to return. If there are rows matching your query, they're guaranteed to be returned (up to the LIMIT), but it may take a long time to find them all depending on the dataset size. You will get a correct result, but there's no guarantee on the execution time. MAX defines the maximum number of rows we want to iterate over (even if none of them matches your query). Even if there are rows matching your query, they might not be returned if it requires C* to iterate over too many (> MAX) rows to find them. This guarantees that the execution time of your query will not be worse than what it takes to iterate over MAX rows, but you might get an inaccurate result (assuming the more useful implementation, see point 1 in the description). Allow finer control of ALLOW FILTERING behavior --- Key: CASSANDRA-4986 URL: https://issues.apache.org/jira/browse/CASSANDRA-4986 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Priority: Minor Fix For: 3.0 CASSANDRA-4915 added {{ALLOW FILTERING}} to warn people when they do potentially inefficient queries. However, as discussed in the former issue it would be interesting to allow controlling that mode more precisely by allowing something like: {noformat} ... ALLOW FILTERING MAX 500 {noformat} whose behavior would be that the query would be short-circuited if it filters (i.e. reads but discards from the ResultSet) more than 500 CQL3 rows. There are however 2 details I'm not totally clear on: # what to do exactly when we reach the max filtering allowed. Do we return what we have so far? Then we need to have a way to say in the result set that the query was short-circuited. Or do we just throw a TooManyFiltered exception (simpler but maybe a little bit less useful)? # what about deleted records? Should we count them as 'filtered'? Imho the logical thing is to not count them as filtered, since after all we filter them out in the normal path (i.e. even when ALLOW FILTERING is not used). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
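Restating the distinction with the proposed syntax from the description (MAX is not valid CQL today; the table and predicate below are made-up examples):
{code}
-- LIMIT: returns up to 500 matching rows, however many rows must be scanned.
SELECT * FROM users WHERE age = 33 LIMIT 500 ALLOW FILTERING;

-- MAX (proposed): scans at most 500 rows; matching rows beyond that point
-- are not found, so the result may be incomplete but the cost is bounded.
SELECT * FROM users WHERE age = 33 ALLOW FILTERING MAX 500;
{code}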
[jira] [Created] (CASSANDRA-8439) Consider using a symbol that won't have to be escaped around UDF's body
Michaël Figuière created CASSANDRA-8439: --- Summary: Consider using a symbol that won't have to be escaped around UDF's body Key: CASSANDRA-8439 URL: https://issues.apache.org/jira/browse/CASSANDRA-8439 Project: Cassandra Issue Type: Improvement Reporter: Michaël Figuière Right now the CQL grammar defines the UDF's body as a {{STRING_LITERAL}}. I understand that this is convenient in the grammar and avoid creating a special type just for these bodies. But the problem is that a quote is a fairly common symbol in the programming languages that will be used inside this body, which means that the developer will need to escape all these quotes in the UDF's body. That will be fairly annoying, not quite readable and tedious to maintain (à la {{\}} in Java Regexes...). Could we consider using curly braces or parentheses as delimiters? Though, I don't realize how hard it would be to use such asymmetric delimiters in the ANTLR grammar while still allowing them to be used within the body of the UDF. Another symmetric delimiter could be considered instead otherwise. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8439) Consider using a symbol that won't have to be escaped around UDF's body
[ https://issues.apache.org/jira/browse/CASSANDRA-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michaël Figuière updated CASSANDRA-8439: Description: Right now the CQL grammar defines the UDF's body as a {{STRING_LITERAL}}. I understand that this is convenient in the grammar and avoid creating a special type just for these bodies. But the problem is that a quote is a fairly common symbol in the programming languages that will be used inside this body, which means that the developer will need to escape all these quotes in the UDF's body. That will be fairly annoying, not quite readable and tedious to maintain (à la backslash in Java Regexes...). Could we consider using curly braces or parentheses as delimiters? Though, I don't realize how hard it would be to use such asymmetric delimiters in the ANTLR grammar while still allowing them to be used within the body of the UDF. Another symmetric delimiter could be considered instead otherwise. was: Right now the CQL grammar defines the UDF's body as a {{STRING_LITERAL}}. I understand that this is convenient in the grammar and avoid creating a special type just for these bodies. But the problem is that a quote is a fairly common symbol in the programming languages that will be used inside this body, which means that the developer will need to escape all these quotes in the UDF's body. That will be fairly annoying, not quite readable and tedious to maintain (à la {{\\}} in Java Regexes...). Could we consider using curly braces or parentheses as delimiters? Though, I don't realize how hard it would be to use such asymmetric delimiters in the ANTLR grammar while still allowing them to be used within the body of the UDF. Another symmetric delimiter could be considered instead otherwise. Consider using a symbol that won't have to be escaped around UDF's body --- Key: CASSANDRA-8439 URL: https://issues.apache.org/jira/browse/CASSANDRA-8439 Project: Cassandra Issue Type: Improvement Reporter: Michaël Figuière Right now the CQL grammar defines the UDF's body as a {{STRING_LITERAL}}. I understand that this is convenient in the grammar and avoid creating a special type just for these bodies. But the problem is that a quote is a fairly common symbol in the programming languages that will be used inside this body, which means that the developer will need to escape all these quotes in the UDF's body. That will be fairly annoying, not quite readable and tedious to maintain (à la backslash in Java Regexes...). Could we consider using curly braces or parentheses as delimiters? Though, I don't realize how hard it would be to use such asymmetric delimiters in the ANTLR grammar while still allowing them to be used within the body of the UDF. Another symmetric delimiter could be considered instead otherwise. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8439) Consider using a symbol that won't have to be escaped around UDF's body
[ https://issues.apache.org/jira/browse/CASSANDRA-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michaël Figuière updated CASSANDRA-8439: Description: Right now the CQL grammar defines the UDF's body as a {{STRING_LITERAL}}. I understand that this is convenient in the grammar and avoid creating a special type just for these bodies. But the problem is that a quote is a fairly common symbol in the programming languages that will be used inside this body, which means that the developer will need to escape all these quotes in the UDF's body. That will be fairly annoying, not quite readable and tedious to maintain (à la {{\\}} in Java Regexes...). Could we consider using curly braces or parentheses as delimiters? Though, I don't realize how hard it would be to use such asymmetric delimiters in the ANTLR grammar while still allowing them to be used within the body of the UDF. Another symmetric delimiter could be considered instead otherwise. was: Right now the CQL grammar defines the UDF's body as a {{STRING_LITERAL}}. I understand that this is convenient in the grammar and avoid creating a special type just for these bodies. But the problem is that a quote is a fairly common symbol in the programming languages that will be used inside this body, which means that the developer will need to escape all these quotes in the UDF's body. That will be fairly annoying, not quite readable and tedious to maintain (à la {{\}} in Java Regexes...). Could we consider using curly braces or parentheses as delimiters? Though, I don't realize how hard it would be to use such asymmetric delimiters in the ANTLR grammar while still allowing them to be used within the body of the UDF. Another symmetric delimiter could be considered instead otherwise. Consider using a symbol that won't have to be escaped around UDF's body --- Key: CASSANDRA-8439 URL: https://issues.apache.org/jira/browse/CASSANDRA-8439 Project: Cassandra Issue Type: Improvement Reporter: Michaël Figuière Right now the CQL grammar defines the UDF's body as a {{STRING_LITERAL}}. I understand that this is convenient in the grammar and avoid creating a special type just for these bodies. But the problem is that a quote is a fairly common symbol in the programming languages that will be used inside this body, which means that the developer will need to escape all these quotes in the UDF's body. That will be fairly annoying, not quite readable and tedious to maintain (à la {{\\}} in Java Regexes...). Could we consider using curly braces or parentheses as delimiters? Though, I don't realize how hard it would be to use such asymmetric delimiters in the ANTLR grammar while still allowing them to be used within the body of the UDF. Another symmetric delimiter could be considered instead otherwise. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8439) Consider using a symbol that won't have to be escaped around UDF's body
[ https://issues.apache.org/jira/browse/CASSANDRA-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238402#comment-14238402 ] Michaël Figuière commented on CASSANDRA-8439: - Sorry, I missed it. Thanks! Consider using a symbol that won't have to be escaped around UDF's body --- Key: CASSANDRA-8439 URL: https://issues.apache.org/jira/browse/CASSANDRA-8439 Project: Cassandra Issue Type: Improvement Reporter: Michaël Figuière Right now the CQL grammar defines the UDF's body as a {{STRING_LITERAL}}. I understand that this is convenient in the grammar and avoids creating a special type just for these bodies. But the problem is that a quote is a fairly common symbol in the programming languages that will be used inside this body, which means that the developer will need to escape all these quotes in the UDF's body. That would be fairly annoying, hard to read, and tedious to maintain (à la backslash in Java Regexes...). Could we consider using curly braces or parentheses as delimiters? That said, I don't know how hard it would be to support such asymmetric delimiters in the ANTLR grammar while still allowing them to be used within the body of the UDF. Otherwise, another symmetric delimiter could be considered instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
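To make the escaping problem concrete, here is a hedged CQL sketch (the function name and body are invented, and the syntax is approximated from the UDF work in progress at the time): with the body as a {{STRING_LITERAL}}, every single quote in the Java source must be doubled, whereas a symmetric delimiter such as PostgreSQL-style dollar-quoting would let the body be pasted verbatim.
{code}
-- Body as a STRING_LITERAL: the apostrophe in "won't" must be doubled
CREATE FUNCTION quoted(input text) RETURNS text LANGUAGE java
  AS 'return "won''t be pretty: " + input;';

-- Hypothetical symmetric delimiter (PostgreSQL-style dollar-quoting):
-- the same body survives unescaped, quotes and all
CREATE FUNCTION quoted(input text) RETURNS text LANGUAGE java
  AS $$ return "won't be pretty: " + input; $$;
{code}
A symmetric delimiter also sidesteps the ANTLR concern raised above: the lexer only has to scan for the closing token, rather than balance nested opening and closing braces inside the body.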
[jira] [Commented] (CASSANDRA-8312) Use live sstables in snapshot repair if possible
[ https://issues.apache.org/jira/browse/CASSANDRA-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239488#comment-14239488 ] Jimmy Mårdell commented on CASSANDRA-8312: -- Were the references you refer to added in 2.1? I notice that 2.1 calls SSTableReader.releaseReferences(sstables); on the snapshot sstables, which (before my patch) wasn't the case on 2.0. If so, it should be enough to just remove the line sstable.acquireReference(); in getSnapshotSSTableReader on the 2.1/trunk branches. Use live sstables in snapshot repair if possible Key: CASSANDRA-8312 URL: https://issues.apache.org/jira/browse/CASSANDRA-8312 Project: Cassandra Issue Type: Improvement Reporter: Jimmy Mårdell Assignee: Jimmy Mårdell Priority: Minor Fix For: 2.0.12, 3.0, 2.1.3 Attachments: cassandra-2.0-8312-1.txt Snapshot repair can be much slower than parallel repair because of the overhead of opening the SSTables in the snapshot. This is particularly true when using LCS, as you typically have many smaller SSTables then. I compared parallel and sequential repair on a small range on one of our clusters (2*3 replicas). With parallel repair, this took 22 seconds. With sequential repair (the default in 2.0), the same range took 330 seconds! That is an overhead of 330 - 22*6 = 198 seconds (the 22*6 accounting for the validations presumably running one replica at a time), spent just opening SSTables (there were 1000+ sstables). Also, opening 1000 sstables for many smaller ranges surely causes lots of memory churn. The idea would be to list the sstables in the snapshot, but use the corresponding sstables in the live set if they are still available. For almost all sstables, the original one should still exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
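As a side note for readers following the description's proposal, here is a minimal self-contained Java sketch of the idea (the types and names are illustrative stand-ins, not the actual Cassandra API): resolve each sstable referenced by the snapshot against the live set first, taking a reference so it cannot be released mid-repair, and only fall back to opening the snapshot copy when the live reader is gone.
{code}
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative stand-ins for SSTableReader and its reference counting.
public class SnapshotResolve {
    static class Reader {
        final int generation;
        final AtomicInteger refs = new AtomicInteger(1); // creator holds one reference
        Reader(int generation) { this.generation = generation; }
        // Try to take a reference; fails once the reader is fully released.
        boolean acquire() {
            int r;
            do {
                r = refs.get();
                if (r <= 0) return false;
            } while (!refs.compareAndSet(r, r + 1));
            return true;
        }
        void release() { refs.decrementAndGet(); }
    }

    // Prefer the already-open live reader over re-opening from the snapshot.
    static List<Reader> readersForSnapshot(Collection<Integer> snapshotGens,
                                           Map<Integer, Reader> live) {
        List<Reader> out = new ArrayList<>();
        for (int gen : snapshotGens) {
            Reader r = live.get(gen);
            if (r != null && r.acquire())
                out.add(r);               // reuse: no re-open or index-load cost
            else
                out.add(new Reader(gen)); // stands in for opening the snapshot file
        }
        return out;
    }
}
{code}
The invariant being negotiated in the comment above is exactly this pairing: every successful acquire must be matched by a single release, on whichever branch (2.0 or 2.1) performs it.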
[jira] [Commented] (CASSANDRA-8414) Avoid loops over array backed iterators that call iter.remove()
[ https://issues.apache.org/jira/browse/CASSANDRA-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239555#comment-14239555 ] Jimmy Mårdell commented on CASSANDRA-8414: -- Good point. I've attached a new patch with code that uses removed.nextSetBit and Collections.copy. It should now be easy to port the change to 2.1. Avoid loops over array backed iterators that call iter.remove() --- Key: CASSANDRA-8414 URL: https://issues.apache.org/jira/browse/CASSANDRA-8414 Project: Cassandra Issue Type: Bug Components: Core Reporter: Richard Low Assignee: Jimmy Mårdell Labels: performance Fix For: 2.1.3 Attachments: cassandra-2.0-8414-1.txt, cassandra-2.0-8414-2.txt I noticed from sampling that compaction sometimes spends almost all of its time in iter.remove() in ColumnFamilyStore.removeDeletedStandard. It turns out that the cf object is using ArrayBackedSortedColumns, so deletes are from an ArrayList. If the majority of your columns are GCable tombstones, this is O(n^2). The data structure should be changed or a copy made to avoid this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
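To illustrate the approach, here is a standalone Java sketch under assumed names (the actual patch reportedly uses removed.nextSetBit together with Collections.copy; this version shifts survivors with an explicit loop): mark doomed indices in a BitSet during the scan, then compact the ArrayList in a single pass instead of calling iter.remove() per tombstone.
{code}
import java.util.*;

public class CompactRemove {
    // Remove all elements whose indices are set in 'removed' in O(n),
    // instead of O(n^2) for repeated ArrayList.remove().
    static <T> void removeMarked(List<T> columns, BitSet removed) {
        int dst = removed.nextSetBit(0);          // first hole; nothing to do if none
        if (dst < 0) return;
        for (int src = dst + 1; src < columns.size(); src++)
            if (!removed.get(src))
                columns.set(dst++, columns.get(src)); // shift survivor left
        columns.subList(dst, columns.size()).clear(); // trim the dead tail in one call
    }

    public static void main(String[] args) {
        List<String> cols = new ArrayList<>(Arrays.asList("a", "tomb1", "b", "tomb2", "c"));
        BitSet removed = new BitSet();
        removed.set(1);
        removed.set(3);
        removeMarked(cols, removed);
        System.out.println(cols); // [a, b, c]
    }
}
{code}
Removing k of n elements this way costs O(n) regardless of k, versus O(n*k) for the remove-in-a-loop pattern the sampling caught.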