Re: Cassandra 5.0 Beta1 - vector searching results
Memtable off heap memory used: 0
Memtable switch count: 16
Speculative retries: 0
Local read count: 0
Local read latency: NaN ms
Local write count: 2893108
Local write latency: NaN ms
Local read/write ratio: 0.0
Pending flushes: 0
Percent repaired: 100.0
Bytes repaired: 9.066GiB
Bytes unrepaired: 0B
Bytes pending repair: 0B
Bloom filter false positives: 7245
Bloom filter false ratio: 0.00286
Bloom filter space used: 87264
Bloom filter off heap memory used: 87216
Index summary off heap memory used: 34624
Compression metadata off heap memory used: 4753072
Compacted partition minimum bytes: 2760
Compacted partition maximum bytes: 4866323
Compacted partition mean bytes: 154523
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Droppable tombstone ratio: 0.0

nodetool tablehistograms doc.embeddings_googleflant5large

doc/embeddings_googleflant5large histograms
Percentile   Read Latency   Write Latency   SSTables   Partition Size   Cell Count
             (micros)       (micros)                   (bytes)
50%          0.00           0.00            0.00       105778           124
75%          0.00           0.00            0.00       182785           215
95%          0.00           0.00            0.00       379022           446
98%          0.00           0.00            0.00       545791           642
99%          0.00           0.00            0.00       654949           924
Min          0.00           0.00            0.00       2760             4
Max          0.00           0.00            0.00       4866323          5722

Running a query such as:

select uuid,offset,type,textdata from doc.embeddings_googleflant5large
order by embeddings ANN OF [768 dimension vector] limit 20;

Works fine - typically less than 5 seconds to return. Subsequent queries are even faster. If I'm actively adding data to the table, the searches can sometimes time out (using cqlsh).
If I add something to the where clause, the performance drops significantly:

select uuid,offset,type,textdata from doc.embeddings_googleflant5large
where offset=1 order by embeddings ANN OF [] limit 20;

That query will time out when running in cqlsh, even with no data being added to the table.

We've been running a Weaviate database side-by-side with Cassandra 4, and would love to drop Weaviate if we can do all the vector searches inside of Cassandra.

What else can I try? Anything to increase performance?

Thanks all!

-Joe

-- Jonathan Ellis co-founder, http://www.datastax.com @spyced
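One thing worth checking before anything else (a sketch based on how storage-attached indexes interact with ANN queries in Cassandra 5.0 generally, not something stated in this thread - the index name below is illustrative, so verify against the 5.0 docs): a column used in the WHERE clause of an ANN query needs its own SAI index, otherwise the restriction cannot be served efficiently alongside the vector index:

```sql
-- Hypothetical index; name is illustrative. With an SAI index on
-- "offset", the restricted ANN query can intersect the filter with
-- the vector index instead of post-filtering a much larger scan.
CREATE INDEX IF NOT EXISTS embeddings_offset_sai
    ON doc.embeddings_googleflant5large (offset)
    USING 'sai';
```

If the index already exists, the next things to look at would be timeout settings in cqlsh and whether compaction has settled after the bulk load.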
Re: DataStax Accelerate CFP
Happy new year, everyone! Reminder, the Accelerate CFP closes in just over two weeks: https://www.datastax.com/blog/2019/11/datastax-accelerate-20-call-papers-now-open On Mon, Nov 11, 2019 at 1:28 PM Jonathan Ellis wrote: > This spring DataStax kicked off Accelerate, a new conference carrying on > the spirit and tradition of the seven Cassandra Summits that we sponsored > and organized in the past. We had a great set of talks (check out > ungated, full session videos here > <https://www.youtube.com/playlist?list=PLm-EPIkBI3YpJbuKUGDlZVNHzT0umcBSl>) > and even more great conversations with an attendance of about a thousand. > > Next year's Accelerate <https://www.datastax.com/accelerate> will be May > 11-13 in San Diego, and the call for papers is now open! We'd love to hear > about your successes, custom extensions, and lessons learned with Apache > Cassandra or DataStax. The direct link to submitting a proposal is here > <https://sessionize.com/datastax-accelerate-san-diego/>, and more > background with suggestions for first-time speakers is here > <https://www.datastax.com/blog/2019/11/datastax-accelerate-20-call-papers-now-open> > . > > The CFP closes Jan 22. Hope to hear from you soon! > > -- > Jonathan Ellis > co-founder, http://www.datastax.com > @spyced > -- Jonathan Ellis co-founder, http://www.datastax.com @spyced
[Announce] DataStax Support for Apache Cassandra, New Tools
Hi all,

Today DataStax is pleased to announce Luna <https://www.datastax.com/services/datastax-luna>: support for Apache Cassandra versions 2.1, 2.2, 3.0, and 3.11. The short version is that with Luna, we’re making our expertise available to Apache Cassandra users as a subscription-based support plan with public pricing that you can buy directly through our website. The full announcement is here <https://www.datastax.com/press-release/introducing-datastax-luna-enterprise-support-apache-cassandra>.

Additionally, as part of our ongoing commitment to Cassandra, we’re also announcing the availability of DataStax Bulk Loader <https://downloads.datastax.com/#bulk-loader> and DataStax Apache Kafka Connector <https://downloads.datastax.com/#akc> as free downloads, making loading and unloading data from Cassandra faster and easier. Details of this release are here <https://www.datastax.com/blog/2019/12/tools-for-apache-cassandra>.

-- Jonathan Ellis co-founder, http://www.datastax.com @spyced
DataStax Accelerate CFP
This spring DataStax kicked off Accelerate, a new conference carrying on the spirit and tradition of the seven Cassandra Summits that we sponsored and organized in the past. We had a great set of talks (check out ungated, full session videos here <https://www.youtube.com/playlist?list=PLm-EPIkBI3YpJbuKUGDlZVNHzT0umcBSl>) and even more great conversations with an attendance of about a thousand. Next year's Accelerate <https://www.datastax.com/accelerate> will be May 11-13 in San Diego, and the call for papers is now open! We'd love to hear about your successes, custom extensions, and lessons learned with Apache Cassandra or DataStax. The direct link to submitting a proposal is here <https://sessionize.com/datastax-accelerate-san-diego/>, and more background with suggestions for first-time speakers is here <https://www.datastax.com/blog/2019/11/datastax-accelerate-20-call-papers-now-open> . The CFP closes Jan 22. Hope to hear from you soon! -- Jonathan Ellis co-founder, http://www.datastax.com @spyced
Re: Released an ACID-compliant transaction library on top of Cassandra
Which was followed up by https://www.researchgate.net/profile/Akon_Dey/publication/282156834_Scalable_Distributed_Transactions_across_Heterogeneous_Stores/links/56058b9608ae5e8e3f32b98d.pdf On Tue, Oct 16, 2018 at 1:02 PM Jonathan Ellis wrote: > It looks like it's based on this: > http://www.vldb.org/pvldb/vol6/p1434-dey.pdf > > On Tue, Oct 16, 2018 at 11:37 AM Ariel Weisberg wrote: > >> Hi, >> >> Yes this does sound great. Does this rely on Cassandra's internal SERIAL >> consistency and CAS functionality or is that implemented at a higher level? >> >> Regards, >> Ariel >> >> On Tue, Oct 16, 2018, at 12:31 PM, Jeff Jirsa wrote: >> > This is great! >> > >> > -- >> > Jeff Jirsa >> > >> > >> > > On Oct 16, 2018, at 5:47 PM, Hiroyuki Yamada >> wrote: >> > > >> > > Hi all, >> > > >> > > # Sorry, I accidentally emailed the following to dev@, so re-sending >> to here. >> > > >> > > We have been working on ACID-compliant transaction library on top of >> > > Cassandra called Scalar DB, >> > > and are pleased to announce the release of v.1.0 RC version in open >> source. >> > > >> > > https://github.com/scalar-labs/scalardb/ >> > > >> > > Scalar DB is a library that provides a distributed storage abstraction >> > > and client-coordinated distributed transaction on the storage, >> > > and makes non-ACID distributed database/storage ACID-compliant. >> > > And Cassandra is the first supported database implementation. >> > > >> > > It's been internally tested intensively and is jepsen-passed. >> > > (see jepsen directory for more detail) >> > > If you are looking for ACID transaction capability on top of >> cassandra, >> > > Please take a look and give us a feedback or contribution. 
>> > > Best regards,
>> > > Hiroyuki Yamada
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
-- Jonathan Ellis co-founder, http://www.datastax.com @spyced
Re: Released an ACID-compliant transaction library on top of Cassandra
It looks like it's based on this: http://www.vldb.org/pvldb/vol6/p1434-dey.pdf On Tue, Oct 16, 2018 at 11:37 AM Ariel Weisberg wrote: > Hi, > > Yes this does sound great. Does this rely on Cassandra's internal SERIAL > consistency and CAS functionality or is that implemented at a higher level? > > Regards, > Ariel > > On Tue, Oct 16, 2018, at 12:31 PM, Jeff Jirsa wrote: > > This is great! > > > > -- > > Jeff Jirsa > > > > > > > On Oct 16, 2018, at 5:47 PM, Hiroyuki Yamada > wrote: > > > > > > Hi all, > > > > > > # Sorry, I accidentally emailed the following to dev@, so re-sending > to here. > > > > > > We have been working on ACID-compliant transaction library on top of > > > Cassandra called Scalar DB, > > > and are pleased to announce the release of v.1.0 RC version in open > source. > > > > > > https://github.com/scalar-labs/scalardb/ > > > > > > Scalar DB is a library that provides a distributed storage abstraction > > > and client-coordinated distributed transaction on the storage, > > > and makes non-ACID distributed database/storage ACID-compliant. > > > And Cassandra is the first supported database implementation. > > > > > > It's been internally tested intensively and is jepsen-passed. > > > (see jepsen directory for more detail) > > > If you are looking for ACID transaction capability on top of cassandra, > > > Please take a look and give us a feedback or contribution. > > > > > > Best regards, > > > Hiroyuki Yamada > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Reminder: don't listen on public addresses
MongoDB has been in the news for hackers deleting unsecured databases and demanding money to return the data. Now copycats are starting to look at other targets, like the thousands of unsecured Cassandra databases. Preventing this is very simple: don't allow Cassandra to listen on public interfaces. Of course additional security measures are useful as defense in depth, but the bottom line is that if the bad guys can't connect to your cluster, they can't harm it. -- Jonathan Ellis co-founder, http://www.datastax.com @spyced
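In cassandra.yaml terms, this means binding both the internode and client listeners to a private interface (the addresses below are illustrative; the point is that neither setting should be a publicly routable address, and 0.0.0.0 is exactly what to avoid):

```yaml
# cassandra.yaml -- bind only to the private network (illustrative addresses)
listen_address: 10.0.1.5   # internode traffic; never a public IP
rpc_address: 10.0.1.5      # client traffic; never 0.0.0.0 on a public host
```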
Re: Disabling all caching in Cassandra
[Moving to users list] The most important thing will be to reduce your JVM heap size. Cassandra will automatically reduce pool sizes as you do that. Disabling the key cache and row cache will help you get it even smaller. On Tue, Jun 21, 2016 at 5:21 AM, Sumit Anvekar <sumit.anve...@gmail.com> wrote:

Hello, We are using Cassandra 3.0.7, and of late we see that 90% of memory is occupied even though the hard drive is hardly used. We have a cluster of 5 nodes with 15 GB memory, 4 cores, and 200 GB SSD each.

We tried all kinds of configurations through both YAML and table-based properties, but none seem to help. Memory usage constantly increases, almost in direct ratio to the data.

What we are trying to do is use as little memory as possible; we are okay with reduced read performance. The application is write-intensive. To do this, our idea was to disable all caches possible, to avoid keeping anything unnecessary in memory.

Find attached our yaml and table configuration:

CREATE KEYSPACE if not exists test_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

CREATE TABLE if not exists test_ks.test_cf (id bigint PRIMARY KEY, key_str text, value1 int, value2 int, update_ts bigint) WITH bloom_filter_fp_chance = 1 AND comment = '' AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'} AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND dclocal_read_repair_chance = 0.1 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 10240 AND memtable_flush_period_in_ms = 360 AND min_index_interval = 10240 AND read_repair_chance = 0.0 AND speculative_retry = '99PERCENTILE' AND caching = {'keys': 'NONE', 'rows_per_partition': 'NONE'};

Has anyone tried such a configuration before?
Please let us know if disabling the cache would help us in this situation, and if yes, how we can disable caching completely.

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
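Ellis's heap advice translates to the standard knobs in conf/cassandra-env.sh; a minimal sketch, assuming the stock env-file mechanism (the values are illustrative for a 15 GB machine, not a recommendation):

```shell
# conf/cassandra-env.sh -- cap the JVM heap explicitly instead of letting
# Cassandra auto-size it from system memory (illustrative values)
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"
```

If both variables are left unset, Cassandra computes defaults from the machine's RAM, which on a 15 GB node is considerably larger than this.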
Announcement: Thrift API deprecation
Thrift has been officially frozen for almost two years and unofficially for longer. Meanwhile, maintaining Thrift support through changes like 8099 has been a substantial investment. Thus, we are officially deprecating Thrift now and removing support in 4.0, i.e. Nov 2016 if tick-tock goes as planned. (I note that some users have been unable to completely migrate away from Thrift because CQL doesn’t quite provide feature parity. The last such outstanding issue is mixing static and dynamic Thrift “columns” in a single table. We have an issue open to address this [1] and should have it committed for 3.4. In the meantime, I thought it best to give people more notice rather than less.) [1] https://issues.apache.org/jira/browse/CASSANDRA-10857 -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Cassandra users survey
Thanks for all the responses! The results (minus suggestions and emails) are available here: https://docs.google.com/spreadsheets/d/1FegCArZgj2DNAjNkcXi1n2Y1Kfvf6cdZedkMPYQdvC0/edit?usp=sharing I've included charts on separate sheets for each question, but unfortunately I couldn't figure out how to help Google make sense of any of the data where the form allowed multiple or free-form responses. Some things that jump out at me: - 3/4 of responses use only CQL. - 3% have more than 1000 tables in the schema. On an absolute scale this is low but still more than I expected. - 60% are deployed across more than one datacenter - I should have broken down the node count responses into more detail; roughly 50% each in 1-10 and 10-100. I should also include an "are you in production?" question next time. - More responses of both "less than 32 GB ram/node" and "128 GB or more" than I expected. - Including the "both" responses, a majority of users are deploying SSD now. On Wed, Sep 30, 2015 at 1:18 PM, Jonathan Ellis <jbel...@gmail.com> wrote: > With 3.0 approaching, the Apache Cassandra team would appreciate your > feedback as we work on the project roadmap for future releases. > > I've put together a brief survey here: > https://docs.google.com/forms/d/1TEG0umQAmiH3RXjNYdzNrKoBCl1x7zurMroMzAFeG2Y/viewform?usp=send_form > > Please take a few minutes to fill it out! > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder, http://www.datastax.com > @spyced > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Cassandra users survey
I think what would be most useful would be to pick your largest cluster, and answer based on that. If you have multiple applications in the cluster, then the sum; otherwise, just one. On Thu, Oct 1, 2015 at 9:50 PM, Jim Ancona <j...@anconafamily.com> wrote: > Hi Jonathan, > > The survey asks about "your application." We have multiple applications > using Cassandra. Are you looking for information about each application > separately, or the sum of all of them? > > Jim > > On Wed, Sep 30, 2015 at 2:18 PM, Jonathan Ellis <jbel...@gmail.com> wrote: > >> With 3.0 approaching, the Apache Cassandra team would appreciate your >> feedback as we work on the project roadmap for future releases. >> >> I've put together a brief survey here: >> https://docs.google.com/forms/d/1TEG0umQAmiH3RXjNYdzNrKoBCl1x7zurMroMzAFeG2Y/viewform?usp=send_form >> >> Please take a few minutes to fill it out! >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder, http://www.datastax.com >> @spyced >> >> > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Cassandra users survey
With 3.0 approaching, the Apache Cassandra team would appreciate your feedback as we work on the project roadmap for future releases. I've put together a brief survey here: https://docs.google.com/forms/d/1TEG0umQAmiH3RXjNYdzNrKoBCl1x7zurMroMzAFeG2Y/viewform?usp=send_form Please take a few minutes to fill it out! -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Cassandra 2.2, 3.0, and beyond
3.1 is EOL as soon as 3.3 (the next bug fix release) comes out. On Thu, Jun 11, 2015 at 4:10 AM, Stefan Podkowinski stefan.podkowin...@1und1.de wrote: We are also extending our backwards compatibility policy to cover all 3.x releases: you will be able to upgrade seamlessly from 3.1 to 3.7, for instance, including cross-version repair. What will be the EOL policy for releases after 3.0? Given your example, will 3.1 still see bugfixes at this point when I decide to upgrade to 3.7? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Cassandra 2.2, 3.0, and beyond
As soon as 8099 is done. On Thu, Jun 11, 2015 at 11:53 AM, Pierre Devops pierredev...@gmail.com wrote: Hi, 3.x beta release date ? 2015-06-11 16:21 GMT+02:00 Jonathan Ellis jbel...@gmail.com: 3.1 is EOL as soon as 3.3 (the next bug fix release) comes out. On Thu, Jun 11, 2015 at 4:10 AM, Stefan Podkowinski stefan.podkowin...@1und1.de wrote: We are also extending our backwards compatibility policy to cover all 3.x releases: you will be able to upgrade seamlessly from 3.1 to 3.7, for instance, including cross-version repair. What will be the EOL policy for releases after 3.0? Given your example, will 3.1 still see bugfixes at this point when I decide to upgrade to 3.7? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Cassandra 2.2, 3.0, and beyond
We've started using the docs-impacting label https://issues.apache.org/jira/issues/?jql=labels%20%3D%20docs-impacting%20AND%20project%20%3D%20CASSANDRA to make it easier for the technical writers to keep up, but otherwise we're not planning any major changes. On Thu, Jun 11, 2015 at 4:50 AM, Daniel Compton daniel.compton.li...@gmail.com wrote: Hi Jonathan Does documentation fit into the new monthly releases and definition of done as well, or is that part of another process? I didn't see any mention of it in the docs, though I may have missed it. On Thu, 11 Jun 2015 at 9:10 pm Stefan Podkowinski stefan.podkowin...@1und1.de wrote: We are also extending our backwards compatibility policy to cover all 3.x releases: you will be able to upgrade seamlessly from 3.1 to 3.7, for instance, including cross-version repair. What will be the EOL policy for releases after 3.0? Given your example, will 3.1 still see bugfixes at this point when I decide to upgrade to 3.7? -- -- Daniel -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Cassandra 2.2, 3.0, and beyond
As you know, we've split our post-2.1 release into two pieces, with 2.2 to be released in July (rc1 out Monday <http://cassandra.apache.org/download/>) and 3.0 in September.

2.2 will include Windows support, commitlog compression <https://issues.apache.org/jira/browse/CASSANDRA-6809>, JSON support <https://issues.apache.org/jira/browse/CASSANDRA-7970>, role-based authorization <http://www.datastax.com/dev/blog/role-based-access-control-in-cassandra>, bootstrap-aware leveled compaction <https://issues.apache.org/jira/browse/CASSANDRA-7460>, and user-defined functions <http://christopher-batey.blogspot.com/2015/05/cassandra-aggregates-min-max-avg-group.html>. 3.0 will include a major storage engine rewrite <https://issues.apache.org/jira/browse/CASSANDRA-8099> and materialized views <https://issues.apache.org/jira/browse/CASSANDRA-6477>.

We're splitting things up this way because we don't want to block the features that are already complete while waiting for 8099 (the new storage engine). Releasing them now as 2.2 reduces the risk for users (2.2 has a lot in common with 2.1) and allows us to stabilize them independently of the upheaval from 8099.

After 3.0, we'll take this even further: we will release 3.x versions monthly. Even releases will include both bugfixes and new features; odd releases will be bugfix-only. You may have heard this referred to as tick-tock releases, after Intel's policy of changing process and architecture independently <http://www.intel.com/content/www/us/en/silicon-innovations/intel-tick-tock-model-general.html>.

The primary goal is to improve release quality. Our current major dot-zero releases require another five or six months to become stable enough for production. This is directly related to how we pile features in for 9 to 12 months and release all at once. The interactions between the new features are complex and not always obvious. 2.1 was no exception, despite DataStax hiring a full-time test engineering team specifically for Apache Cassandra.

We need to try something different. Tick-tock releases will dramatically reduce the number of features in each version, which will necessarily improve our ability to quickly track down any regressions. And pausing every other month to focus on bug fixes will help ensure that we don't accumulate issues faster than we can fix them.

Tick-tock will also prevent situations like the one we are in now, with 8099 delaying everything else. Users will get to test new features almost immediately.

To get there, we are investing significant effort in making trunk always releasable, with the goal that each release, or at least each odd-numbered bugfix release, should be usable in production. We’ve extended our continuous integration server to make it easy for contributors to run tests against feature branches <http://www.datastax.com/dev/blog/cassandra-testing-improvements-for-developer-convenience-and-confidence> before merging to trunk, and we’re working on more test infrastructure <https://docs.google.com/document/d/1Seku0vPwChbnH3uYYxon0UO-b6LDtSqluZiH--sWWi0> and procedures <https://docs.google.com/document/d/1ptr47UQ56N80jqL_O6AlE67b0STyn_cVp2k5DTv-OMc> to improve release quality. You can see how this is coming along in our May retrospective <https://docs.google.com/document/d/1GtuYRocdr9luNdwmm8wE84uC5Wr6TvewFbQtqoAFVeU/edit>.

We are also extending our backwards compatibility policy to cover all 3.x releases: you will be able to upgrade seamlessly from 3.1 to 3.7, for instance, including cross-version repair. We will not introduce any extra upgrade requirements or remove deprecated features until 4.0, no sooner than a year after 3.0.

Under normal conditions, we will not release 3.x.y stability releases for x > 0. That is, we will have a traditional 3.0.y stability series, but the odd-numbered bugfix-only releases will fill that role for the tick-tock series -- recognizing that occasionally we will need to be flexible enough to release an emergency fix in the case of a critical bug or security vulnerability.

We do recognize that it will take some time for tick-tock releases to deliver production-level stability, which is why we will continue to deliver 2.2.y and 3.0.y bugfix releases. (But if we do demonstrate that tick-tock can deliver the stability we want, there will be no need for a 4.0.y bugfix series, only 4.x tick-tock.)

After 2.2.0 is released, 2.0 will reach end-of-life as planned. After 3.0.0 is released, 2.1 will also reach end of life. This is earlier than expected, but 2.2 will be very close to as stable as 2.1, and users will be well served by upgrading. We will maintain the 2.2 stability series until 4.0 is released, and 3.0 for six months after that.

Thanks for reading this far, and I look forward to hearing how 2.2rc1 works for you!

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
HSHA Thrift server corruption in Cassandra 2.0.0 - 2.0.5
The hsha (half-synchronous, half-asynchronous) Thrift server was rewritten on top of Disruptor for Cassandra 2.0 [1] to unlock substantial performance benefits over the old hsha. Unfortunately, the rewrite introduced a bug [2] that can cause incorrect data to be sent from the coordinator to replicas. I apologize that it took so long for us to realize what was causing the compaction errors reported as far back as November. Who is affected: anyone running the hsha server in a 2.0.x release for x < 6. Who is NOT affected: anyone using the native protocol or the default sync Thrift server. 2.0.6 has a fix and is expected to be released Monday; you can grab the pre-release build from [3], or apply the patch from [4] yourself.

[1] https://issues.apache.org/jira/browse/CASSANDRA-5582
[2] https://issues.apache.org/jira/browse/CASSANDRA-6285
[3] http://people.apache.org/~slebresne/
[4] https://issues.apache.org/jira/secure/attachment/12632583/CASSANDRA-6285-disruptor-heap.patch

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
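For clusters that cannot upgrade immediately, one mitigation consistent with the "NOT affected" list above (a sketch, not part of the original announcement) is to switch back to the unaffected default server in cassandra.yaml and restart each node:

```yaml
# cassandra.yaml -- fall back to the default sync Thrift server, which
# does not have the Disruptor-based hsha corruption bug
rpc_server_type: sync
```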
Re: GCInspector GC for ConcurrentMarkSweep running every 15 seconds
Sounds like you have CMSInitiatingOccupancyFraction set close to 60. You can raise that and/or figure out how to use less heap. On Mon, Feb 17, 2014 at 5:06 PM, John Pyeatt john.pye...@singlewire.com wrote:

I have a 6 node cluster running on AWS. We are using m1.large instances with heap size set to 3G. 5 of the 6 nodes seem quite healthy. The 6th one, however, is running GCInspector GC for ConcurrentMarkSweep every 15 seconds or so. There is nothing going on on this box: no repairs and almost no user activity. But the CPU is almost continuously at 50% or more. The only messages in the log are:

INFO 2014-02-17 22:58:53,429 [ScheduledTasks:1] GCInspector GC for ConcurrentMarkSweep: 213 ms for 1 collections, 1964940024 used; max is 3200253952
INFO 2014-02-17 22:59:07,431 [ScheduledTasks:1] GCInspector GC for ConcurrentMarkSweep: 250 ms for 1 collections, 1983269488 used; max is 3200253952
INFO 2014-02-17 22:59:21,522 [ScheduledTasks:1] GCInspector GC for ConcurrentMarkSweep: 280 ms for 1 collections, 1998214480 used; max is 3200253952
INFO 2014-02-17 22:59:36,527 [ScheduledTasks:1] GCInspector GC for ConcurrentMarkSweep: 305 ms for 1 collections, 2013065592 used; max is 3200253952
INFO 2014-02-17 22:59:50,529 [ScheduledTasks:1] GCInspector GC for ConcurrentMarkSweep: 334 ms for 1 collections, 2028069232 used; max is 3200253952

We don't see any of these messages on the other nodes in the cluster. We are seeing similar behaviour in both our production and QA clusters. Production is running Cassandra 1.2.9 and QA is running 1.2.13. Here are some of the Cassandra settings that I would think might be relevant:

flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
in_memory_compaction_limit_in_mb: 64

Does anyone have any ideas why we are seeing this so selectively on one box? Any cures???
-- John Pyeatt Singlewire Software, LLC www.singlewire.com -- 608.661.1184 john.pye...@singlewire.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
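Ellis's suggestion corresponds to the CMS flags appended to JVM_OPTS in conf/cassandra-env.sh; a hedged sketch (the threshold value is illustrative - the point is to raise it from whatever is currently configured):

```shell
# conf/cassandra-env.sh -- start CMS cycles later, and only at the
# configured threshold rather than the JVM's own heuristics
JVM_OPTS="${JVM_OPTS:-}"   # (already populated earlier in the real file)
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
```

Without UseCMSInitiatingOccupancyOnly, HotSpot treats the fraction only as a hint for the first cycle and adapts afterwards, which can reproduce exactly this kind of back-to-back collection pattern.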
Re: Introducing farsandra: A different way to integration test with c*
Nice work, Ed. Personally, I do find it more productive to write system tests in Python (dtest builds on ccm to provide a number of utilities that cut down on the boilerplate [1]), but I can understand that others will feel differently, and more testing can only improve Cassandra. Thanks! [1] https://github.com/riptano/cassandra-dtest On Wed, Jan 22, 2014 at 7:06 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

The repo: https://github.com/edwardcapriolo/farsandra

The code:

Farsandra fs = new Farsandra();
fs.withVersion("2.0.4");
fs.withCleanInstanceOnStart(true);
fs.withInstanceName("1");
fs.withCreateConfigurationFiles(true);
fs.withHost("localhost");
fs.withSeeds(Arrays.asList("localhost"));
fs.start();

The story: For a while I have been developing applications that use Apache Cassandra as their data store. Personally I am more of an end-to-end test person than a mock test person. For years I have relied heavily on Hector's embedded Cassandra to bring up Cassandra in a sane way inside a Java project. The concept of Farsandra is to keep Cassandra close (in end-to-end tests, not mocked away) but keep your classpath closer (running Cassandra embedded should be seamless and not mess with your client classpath). Recently there has been much fragmentation with Hector, Astyanax, CQL, and multiple Cassandra releases. Bringing up an embedded test is much harder than it needs to be. Cassandra's core methods (get, put, slice) over Thrift have been wire-compatible from version 0.7 to current. However, the Java libraries for Thrift and things like Guava differ across Cassandra versions. This creates a large number of issues when trying to use your favourite client with one or more versions of Cassandra (sometimes a Thrift mismatch kills the entire integration and you can't test anything). Farsandra is much like https://github.com/pcmanus/ccm in that it launches Cassandra instances remotely inside a sub-process.
Farsandra is written in Java, not Python, making it easier to use in Java development. I will not claim that Farsandra solves all problems; in fact it has its own challenges (building yaml configurations across versions, fetching binary Cassandra from the internet), but it opens up new opportunities to develop complicated multi-node testing scenarios that are impossible with embedded Cassandra code, which is not re-entrant! Have fun. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Upgrading Cassandra from 1.2.11 to 2.0
You can't just drop in Apache Cassandra over DSE, since DSE adds custom replication strategies like this one. On Thu, Nov 21, 2013 at 9:38 AM, Santosh Shet santosh.s...@vista-one-solutions.com wrote:

Hi, We are facing a problem while upgrading Cassandra, as shipped in DSE 3.2, from version 1.2.11 to 2.0. Below is the error log we are getting while starting Cassandra.

java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Unable to find replication strategy class 'org.apache.cassandra.locator.EverywhereStrategy'
    at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:274)
    at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:289)
    at org.apache.cassandra.db.DefsTables.loadFromKeyspace(DefsTables.java:130)
    at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:508)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:237)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Unable to find replication strategy class 'org.apache.cassandra.locator.EverywhereStrategy'
    at org.apache.cassandra.utils.FBUtilities.classForName(FBUtilities.java:469)
    at org.apache.cassandra.locator.AbstractReplicationStrategy.getClass(AbstractReplicationStrategy.java:290)
    at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:266)
    ... 6 more

Thanks, Santosh Shet Software Engineer | VistaOne Solutions Direct India : +91 80 30273829 | Mobile India : +91 8105720582 Skype : santushet

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Wiki popularity
We've started counting visits to the wiki pages so we can use that information to prioritize which pages to improve. Here's what that looks like, for the past ~24h: 1,431 wiki.apache.org/cassandra/GettingStarted 366 wiki.apache.org/cassandra/FAQ 284 wiki.apache.org/cassandra/Operations 238 wiki.apache.org/cassandra/FrontPage 209 wiki.apache.org/cassandra/HadoopSupport 209 wiki.apache.org/cassandra/NodeTool 206 wiki.apache.org/cassandra/DebianPackaging 168 wiki.apache.org/cassandra/CassandraCli 159 wiki.apache.org/cassandra/ArchitectureOverview 149 wiki.apache.org/cassandra/ClientOptions 135 wiki.apache.org/cassandra/DataModel 117 wiki.apache.org/cassandra/API 90 wiki.apache.org/cassandra/CassandraLimitations 85 wiki.apache.org/cassandra/SecondaryIndexes 74 wiki.apache.org/cassandra/StorageConfiguration 71 wiki.apache.org/cassandra/MemtableSSTable 66 wiki.apache.org/cassandra/Administration%20Tools 61 wiki.apache.org/cassandra/RunningCassandra (GettingStarted is by far the most viewed, which is not surprising since it's linked from the cassandra.apache.org front page.) If you'd like to help improve any of these, and aren't already on the wiki contributors whitelist, please contact me. We had to add the whitelist to stop spam. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Failure detection false positives and vnodes
We've been working on tracking down the causes of nodes in the cluster incorrectly marking other, healthy nodes down. We've identified three scenarios. The first two deal with the Gossip thread blocking while processing a state change, preventing subsequent heartbeats from being processed: 1. Write activity + cluster membership changes (CASSANDRA-6297). The Gossip stage would block while flushing system.peers, which could get backed up behind flushes of user tables. By default, there is one flush thread per configured data directory. (Thus, increasing memtable_flush_writers in cassandra.yaml can be an effective workaround, especially if you are on SSDs where the increased contention will be low.) 2. Cluster membership changes with many keyspaces configured (CASSANDRA-6244). Computing the ranges to be transferred between nodes is linear with respect to the number of keyspaces (since that is where replication options are configured). I suspect that enabling vnodes will exacerbate this as well. We're still analyzing the third: 3. Large (hundreds to thousands of nodes) clusters with vnodes enabled show FD false positives even without cluster membership changes (CASSANDRA-6127). Fixes for (1) and (2) are committed and will be in 1.2.12 and 2.0.3. We can reproduce (3) and hope to have a resolution soon. In the meantime, caution is advised when deploying vnode-enabled clusters, since other pressures on the system could make this a problem with smaller clusters as well. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
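The memtable_flush_writers workaround mentioned in (1) is a one-line cassandra.yaml change. A sketch, with an illustrative value rather than a recommendation from this post:

```yaml
# cassandra.yaml -- raise flush parallelism so a small system flush
# (e.g. system.peers during membership changes) is less likely to queue
# behind large user-table flushes. The value 4 is illustrative; tune it
# to your disk count/type (higher is typically safe on SSDs).
memtable_flush_writers: 4
```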
Re: CQL Thrift
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows On Fri, Aug 30, 2013 at 12:53 PM, Peter Lin wool...@gmail.com wrote: my bias perspective, I find the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: Hi, If i a create a table with CQL3 as create table user(user_id text PRIMARY KEY, first_name text, last_name text, emailid text); and create index as: create index on user(first_name); then inserted some data as: insert into user(user_id,first_name,last_name,emailId) values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in'); Then if update same column family using Cassandra-cli as: update column family user with key_validation_class='UTF8Type' and column_metadata=[{column_name:last_name, validation_class:'UTF8Type', index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type', index_type:KEYS}]; Now if i connect via cqlsh and explore user table, i can see column first_name,last_name are not part of table structure anymore. Here is the output: CREATE TABLE user ( key text PRIMARY KEY ) WITH bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0.00 AND gc_grace_seconds=864000 AND read_repair_chance=0.10 AND replicate_on_write='true' AND populate_io_cache_on_flush='false' AND compaction={'class': 'SizeTieredCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'}; cqlsh:cql3usage select * from user; user_id - @mevivs I understand that, CQL3 and thrift interoperability is an issue. But this looks to me a very basic scenario. Any suggestions? Or If anybody can explain a reason behind this? 
-Vivek -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Dynamic Columns Question Cassandra 1.2.5, Datastax Java Driver 1.0
This is becoming something of a FAQ, so I wrote a more in-depth answer: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows On Thu, Jun 6, 2013 at 8:02 AM, Joe Greenawalt joe.greenaw...@gmail.com wrote: Hi, I'm having some problems figuring out how to append a dynamic column to a column family using the datastax java driver 1.0 and CQL3 on Cassandra 1.2.5. Below is what I'm trying: cqlsh:simplex create table user (firstname text primary key, lastname text); cqlsh:simplex insert into user (firstname, lastname) values ('joe','shmoe'); cqlsh:simplex select * from user; firstname | lastname ---+-- joe |shmoe cqlsh:simplex insert into user (firstname, lastname, middlename) values ('joe','shmoe','lester'); Bad Request: Unknown identifier middlename cqlsh:simplex insert into user (firstname, lastname, middlename) values ('john','shmoe','lester'); Bad Request: Unknown identifier middlename I'm assuming you can do this based on previous thrift-based clients like pycassa, and also by reading this: The Cassandra data model is a dynamic schema, column-oriented data model. This means that, unlike a relational database, you do not need to model all of the columns required by your application up front, as each row is not required to have the same set of columns. Columns and their metadata can be added by your application as they are needed without incurring downtime to your application. here: http://www.datastax.com/docs/1.2/ddl/index Is it a limitation of CQL3 and its connection vs. thrift? Or more likely am I just doing something wrong? Thanks, Joe -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Cassandra performance decreases drastically with increase in data size.
Sounds like you're spending all your time in GC, which you can verify by checking what GCInspector and StatusLogger say in the log. The fix is to increase your heap size or upgrade to 1.2: http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 On Wed, May 29, 2013 at 11:32 PM, srmore comom...@gmail.com wrote: Hello, I am observing that my performance decreases drastically as my data size grows. I have a 3 node cluster with 64 GB of ram and my data size is around 400GB on all the nodes. I also see that when I restart Cassandra the performance goes back to normal and then again starts decreasing after some time. Some hunting landed me on this page http://wiki.apache.org/cassandra/LargeDataSetConsiderations which talks about large data sets and explains that it might be because I am going through multiple layers of OS cache, but does not tell me how to tune it. So, my question is: are there any optimizations that I can do to handle these large datasets? And why does my performance go back to normal when I restart Cassandra? Thanks! -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
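Checking for GC pressure amounts to grepping the system log for those two loggers. A minimal sketch; the sample log lines and the /tmp path are fabricated so the example is self-contained, and the real log usually lives somewhere like /var/log/cassandra/system.log depending on your install:

```shell
# Write two fabricated log lines for illustration; in practice, point
# grep at your actual system.log instead of this sample file.
printf '%s\n' \
  ' INFO GCInspector GC for ConcurrentMarkSweep: 2215 ms for 1 collections' \
  ' INFO StatusLogger Pool Name                    Active   Pending' \
  > /tmp/sample_system.log

# Long or frequent ConcurrentMarkSweep pauses reported here indicate
# the node is spending its time in GC rather than serving requests.
grep -E 'GCInspector|StatusLogger' /tmp/sample_system.log
```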
Re: Using CQL to insert a column to a row dynamically
On Mon, May 27, 2013 at 9:28 AM, Matthew Hillsborough matthew.hillsboro...@gmail.com wrote: I am trying to understand some fundamentals in Cassandra, I was under the impression that one of the advantages a developer can take in designing a data model is by dynamically adding columns to a row identified by a key. That means I can model my data so that if it makes sense, a key can be something such as a user_id from a relational database, and I can for example, create arbitrary amounts of columns that relate to that user. Fundamentally? No. Experience has shown that having schema to say email column is text, and birth date column is a timestamp is very useful as projects and teams grow. That said, if you really don't know what kinds of attributes might apply (generally because they are user-generated) you can use a Map. Wouldn't this type of model make more sense to just stuff into a relational database? There's nothing wrong with the relational model per se (subject to the usual explanation about needing to denormalize to scale). Cassandra is about making applications scale, not throwing the SQL baby out with the bathwater for the sake of being different. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Commit Log Magic
Sstables must be sorted by token, or we can't compact efficiently. Since writes usually do not arrive in token order, we stage them first in a memtable. (cc user@) On Thu, May 23, 2013 at 8:44 AM, Ansar Rafique ansa...@hotmail.com wrote: Hi Jonathan, I am Ansar Rafique and I asked you a few questions 2 weeks ago about Cassandra implementation. I was watching your presentation where you suggested the page below. http://nosql.mypopescu.com/post/27684111441/cassandra-and-solid-state-drives I have a question and I have tried to find the answer but didn't really get a satisfactory response yet. My question is why Cassandra uses a commit log for durability instead of writing directly to an SSTable. Cassandra achieves high write throughput because it stores data first in a memtable and then flushes it to disk. Sounds good, but remember Cassandra also writes to the commit log for durability. I made sure, and it's written that writes to the memtable and commit log are synchronous, which means it will write first to the commit log and wait until that completes before writing to the memtable, or vice versa. Writing a transaction to the commit log requires an I/O operation, which means for each insert we need an I/O :( for writing data to the commit log, and later more I/Os to flush the data again to disk. Isn't writing to the commit log an overhead? Isn't it better to write data directly to disk instead of the commit log? Remember, I/O operations are expensive and a reduction in I/Os means an improvement in performance. If we look at an RDBMS, it stores data in a commit log as well as on disk. Fair enough, but if we don't insert data into a commit log, its performance should be the same as Cassandra's, because it performs I/O to insert data on disk and Cassandra also performs I/O to insert data into the commit log. Is the commit log less expensive? I haven't really understood the magic :) Would you elaborate? Thank you in advance for your time. Looking forward to hearing from you.
Regards, Ansar Rafique -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
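The trade-off in the answer above can be sketched in a few lines of Python (a toy model, not Cassandra's actual classes or file formats): durability costs one cheap sequential append per write, while the token-order sort needed for efficiently compactable sstables happens in memory and is paid once per flush, not once per write.

```python
import json

class TinyStore:
    """Toy sketch of the write path described above: every write is
    (1) appended sequentially to a commit log for durability and
    (2) inserted into an in-memory memtable; the memtable is later
    flushed to an sstable in sorted order, so the expensive sort
    happens in memory rather than as random disk writes."""

    def __init__(self, log_path):
        self.log = open(log_path, "a")   # sequential append only, no seeks
        self.memtable = {}
        self.sstables = []

    def write(self, key, value):
        # One sequential I/O for durability (cheap: no random seek).
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        self.memtable[key] = value       # in-memory, no disk I/O

    def flush_memtable(self):
        # Amortized: many writes become one sorted, immutable sstable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

store = TinyStore("/tmp/tiny_commitlog.txt")
for k in ["c", "a", "b"]:                # writes arrive out of order...
    store.write(k, k.upper())
store.flush_memtable()
print(store.sstables[0])                 # ...but flush out sorted
# [('a', 'A'), ('b', 'B'), ('c', 'C')]
```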
Re: CQL3 question
Using GET or LIST from the cli will do what you want. It's a bad idea to have One Big Partition, since partitions by nature are not spread across multiple machines. In general you'll want to keep partitions under ~1M cells or ~100K CQL3 rows. On Sun, May 12, 2013 at 12:53 AM, Sam Mandes eng.salaman...@gmail.com wrote: Hello Jonathan, I read your blog post http://www.datastax.com/dev/blog/cql3-for-cassandra-experts and enjoyed it so much. I am new to the NoSQL world; I came from the SQL world. I noticed that Cassandra is pushing CQL3 more; it's even recommended to use CQL3 for new projects instead of the Thrift API. I believe Cassandra is going to drop Thrift one day. I know that one can use compact storage to allow backward compatibility. And I know that CQL3 uses the new binary protocol instead of Thrift now. I believe they both use the same storage engine. (I still do not understand why they are incompatible!) Thus, I was wondering if there is a possible way that I can view the tables created with CQL3 in a lower-level view like I used with Thrift? I mean, can I view the tables as simply CFs, as how rows are exactly stored, just something to expose the internal representation? I've another question: when using compact storage and creating a table with a composite primary key, Cassandra uses a single row with multiple columns, but if I've lots of items and the columns limit is 20 billion, how can this be avoided? I do not understand how CQL3 unpacking helps in this situation? Sorry for any inconvenience :) Thanks a lot, Sam -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Ten talks you shouldn’t miss at the Cassandra Summit
The Cassandra Summit is just over a month away! I wrote up my thoughts on the talks I'm most excited for here: http://www.datastax.com/dev/blog/ten-talks-you-shouldnt-miss-at-the-cassandra-summit Don't forget to register with the code SFSummit25 for a 25% discount: http://datastax.regsvc.com/E2 (Want to go, but your company won't pay? Let me know off-list and I'll see what I can do.) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Backup and restore between different node-sized clusters
You want to use sstableloader when the cluster sizes are different; it will stream things to the right places in the new one. On Wed, May 8, 2013 at 6:03 PM, Ron Siemens rsiem...@greatergood.com wrote: I have a 3-node cluster in production and a single-node development cluster. I tested snapshotting a column family from the 3-node production cluster, grouping the files together, and restoring onto my single node development system. That worked fine. Can I go the other direction? It's not easy for me to test in that direction: I'll get the chance at some point but would like to hear if you've done this. If I just put the snapshot from the single node cluster on one of the nodes from the 3-node cluster, and do a JMX loadNewSSTables on that node, will the data load correctly into the 3-nodes? Or is something more complex involved? FYI, I'm following the instructions below, but only doing per column family backup and restore. http://www.datastax.com/docs/1.2/operations/backup_restore Thanks, Ron -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: index_interval
index_interval won't be going away, but you won't need to change it as often in 2.0: https://issues.apache.org/jira/browse/CASSANDRA-5521 On Mon, May 6, 2013 at 12:27 PM, Hiller, Dean dean.hil...@nrel.gov wrote: I heard a rumor that index_interval is going away? What is the replacement for this? (we have been having to play with this setting a lot lately as too big and it gets slow yet too small and cassandra uses way too much RAM…we are still trying to find the right balance with this setting). Thanks, Dean -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Cassandra NYC event followup
The videos from the NYC* Big Data Tech Day are all up. I blogged about my favorites here: http://www.datastax.com/dev/blog/my-top-five-talks-from-nyc-big-data-tech-day Good to meet the NYC community again! -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: ordered partitioner
Not in general, no. There are places, like indexing, that need to use a local partitioner rather than the global one. Which uses of the DK constructor looked erroneous to you? On Mon, Apr 22, 2013 at 10:54 AM, Desimpel, Ignace ignace.desim...@nuance.com wrote: Hi, I was trying to implement my own ordered partitioner and got into problems. The current DecoratedKey is using ByteBufferUtil.compareUnsigned for comparing the key. I was thinking of having a signed comparison, so I thought of making my own DecoratedKey, Token and Partitioner. That way I would have complete control… So I made a partitioner with a function decorateKey(…) returning MyDecoratedKey instead of DecoratedKey. But when making my own MyDecoratedKey, the database gets into trouble when adding a keyspace, due to the fact that some code in Cassandra is using the ‘new DecoratedKey(…)’ statement and is not using the partitioner function decorateKey(…). Would it be logical to always call the partitioner function decorateKey, such that the creation of one's own partitioner and key decoration is possible? Ignace Desimpel -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Cassandra Summit 2013
Hi all, Last year's Summit saw fantastic talks [1] and over 800 attendees. The feedback was enthusiastic; the most commonly requested improvement was to extend it to two days. We're pleased to deliver just that for 2013! This year's Cassandra Summit will be at Fort Mason in San Francisco, California from June 11th - 12th, with 45+ sessions covering Cassandra use cases, development tips and tricks, war stories, how-tos, and more. The popular meet the experts room will also return. Engineers and committers from companies such as Spotify, eBay, Netflix, Comcast, BlueMountain Capital, and DataStax will be there excited to share their Cassandra experiences. The schedule of talks is about 90% final. To view it and register, visit http://www.datastax.com/company/news-and-events/events/cassandrasummit2013 and use the code SFSummit25 for 25% off. See you there! [1] http://www.datastax.com/company/news-and-events/events/cassandrasummit2012/presentations -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Vnodes - HUNDRED of MapReduce jobs
I still don't see the hole in the following reasoning: - Input splits are 64k rows by default. At this size, map processing time dominates job creation. - Therefore, if job creation time dominates, you have a toy data set (< 64K * 256 vnodes = 16 MB). Adding complexity to our inputformat to improve performance for this niche does not sound like a good idea to me. On Thu, Mar 28, 2013 at 8:40 AM, cem cayiro...@gmail.com wrote: Hi Alicia, The Cassandra input format creates as many mappers as vnodes. It is a known issue. You need to lower the number of vnodes :( I have a simple solution for that and am ready to write a patch. Should I create a ticket about that? I don't know the procedure for that. Regards, Cem On Thu, Mar 28, 2013 at 2:30 PM, Alicia Leong lccali...@gmail.com wrote: Hi All, I have 3 nodes of Cassandra 1.2.3 and edited the cassandra.yaml for vnodes. When I execute a M/R job, the console shows HUNDREDS of Map tasks. May I know, is this normal since it is vnodes? If yes, this has slowed the M/R job to finish/complete. Thanks -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Vnodes - HUNDRED of MapReduce jobs
My point is that if you have over 16MB of data per node, you're going to get thousands of map tasks (that is: hundreds per node) with or without vnodes. On Fri, Mar 29, 2013 at 9:42 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Every map reduce task typically has a minimum Xmx of 256MB memory. See mapred.child.java.opts... So if you have a 10 node cluster with 256 vnodes... You will need to spawn 2,560 map tasks to complete a job. And a 10 node hadoop cluster with 5 map slots a node... You have 50 map slots. Wouldn't it be better if the input format spawned 10 map tasks instead of 2,560? On Fri, Mar 29, 2013 at 10:28 AM, Jonathan Ellis jbel...@gmail.com wrote: I still don't see the hole in the following reasoning: - Input splits are 64k rows by default. At this size, map processing time dominates job creation. - Therefore, if job creation time dominates, you have a toy data set (< 64K * 256 vnodes = 16 MB). Adding complexity to our inputformat to improve performance for this niche does not sound like a good idea to me. On Thu, Mar 28, 2013 at 8:40 AM, cem cayiro...@gmail.com wrote: Hi Alicia, The Cassandra input format creates as many mappers as vnodes. It is a known issue. You need to lower the number of vnodes :( I have a simple solution for that and am ready to write a patch. Should I create a ticket about that? I don't know the procedure for that. Regards, Cem On Thu, Mar 28, 2013 at 2:30 PM, Alicia Leong lccali...@gmail.com wrote: Hi All, I have 3 nodes of Cassandra 1.2.3 and edited the cassandra.yaml for vnodes. When I execute a M/R job, the console shows HUNDREDS of Map tasks. May I know, is this normal since it is vnodes? If yes, this has slowed the M/R job to finish/complete. Thanks -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
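The arithmetic both sides are arguing from, worked through (the 64K-row split size is the default cited in the thread; the 1-billion-row figure is an illustrative assumption):

```python
# With the default split size of 64K rows, map-task count is driven by
# data volume, not vnode count, once each vnode holds more than one
# split's worth of rows.
SPLIT_ROWS = 64 * 1024
vnodes_per_node, nodes = 256, 10

# Tiny data set: less than one split per vnode -> one near-empty task
# per vnode, so 2,560 tasks dominated by job-creation overhead.
tiny_tasks = vnodes_per_node * nodes
print(tiny_tasks)   # 2560

# Realistic data set (illustrative: 1 billion rows) -> the task count
# is set by the data, with or without vnodes.
rows = 1_000_000_000
tasks = -(-rows // SPLIT_ROWS)   # ceiling division
print(tasks)        # 15259
```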
Re: virtual nodes + map reduce = too many mappers
Wouldn't you have more than 256 splits anyway, given a normal amount of data? (Default split size is 64k rows.) On Fri, Feb 15, 2013 at 7:01 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Seems like the hadoop Input format should combine the splits that are on the same node into the same map task, like Hadoop's CombinedInputFormat can. I am not sure who recommends vnodes as the default, because this is now the second problem (that I know of) of this class where vnodes has extra overhead, https://issues.apache.org/jira/browse/CASSANDRA-5161 This seems to be the standard operating practice in c* now, enable things in the default configuration like new partitioners and newer features like vnodes, even though they are not heavily tested in the wild or well understood, then deal with fallout. On Fri, Feb 15, 2013 at 11:52 AM, cem cayiro...@gmail.com wrote: Hi All, I have just started to use virtual nodes. I set the number of nodes to 256 as recommended. The problem that I have is when I run a mapreduce job it creates node * 256 mappers. It creates node * 256 splits. this effects the performance since the range queries have a lot of overhead. Any suggestion to improve the performance? It seems like I need to lower the number of virtual nodes. Best Regards, Cem -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: [RELEASE] Apache Cassandra 1.2 released
I'm presenting a webinar on what's new in 1.2 this Wednesday: http://learn.datastax.com/WebinarWhatsNewin1.2_Registration.html See you there! On Wed, Jan 2, 2013 at 9:00 AM, Sylvain Lebresne sylv...@datastax.com wrote: The Cassandra team wishes you a very happy new year 2013, and is very pleased to announce the release of Apache Cassandra version 1.2.0. Cassandra 1.2.0 is a new major release for the Apache Cassandra distributed database. This version adds numerous improvements[1,2] including (but not restricted to): - Virtual nodes[4] - The final version of CQL3 (featuring many improvements) - Atomic batches[5] - Request tracing[6] - Numerous performance improvements[7] - A new binary protocol for CQL3[8] - Improved configuration options[9] - And much more... Please make sure to carefully read the release notes[2] before upgrading. Both source and binary distributions of Cassandra 1.2.0 can be downloaded at: http://cassandra.apache.org/download/ Or you can use the debian package available from the project APT repository[3] (you will need to use the 12x series). The Cassandra Team [1]: http://goo.gl/JmKp3 (CHANGES.txt) [2]: http://goo.gl/47bFz (NEWS.txt) [3]: http://wiki.apache.org/cassandra/DebianPackaging [4]: http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 [5]: http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2 [6]: http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2 [7]: http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 [8]: http://www.datastax.com/dev/blog/binary-protocol [9]: http://www.datastax.com/dev/blog/configuration-changes-in-cassandra-1-2 -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Read during digest mismatch
Correct. Which is one reason there is a separate setting for cross-datacenter read repair, by the way. On Thu, Nov 8, 2012 at 4:43 PM, sankalp kohli kohlisank...@gmail.com wrote: Hi, Let's say I am reading with consistency TWO and my replication is 3. The read is eligible for global read repair. It will send a request to get data from one node and a digest request to two. If there is a digest mismatch, what I am reading from the code looks like it will get the data from all three nodes and do a resolve of the data before returning to the client. Is that correct, or am I reading the code wrong? Also, if this is correct, it looks like if the third node is in another DC, the read will slow down even when the consistency was TWO? Thanks, Sankalp -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
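The read path described in the question can be sketched roughly as follows (a toy model, not the real StorageProxy API; the function and field names are invented, and MD5 is borrowed since that is what Cassandra has historically used for digests):

```python
import hashlib

def digest(value):
    # Replicas return a hash of the data instead of the data itself.
    return hashlib.md5(value.encode()).hexdigest()

def read(replicas, consistency):
    """Sketch of a digest read: full data from one replica, digests
    from the rest of the contacted set; on mismatch, do a second
    round trip for full data and resolve by newest timestamp before
    answering the client -- which is why a mismatch slows the read."""
    contacted = replicas[:consistency]
    data_value = contacted[0]["value"]
    digests = [digest(r["value"]) for r in contacted[1:]]
    if all(d == digest(data_value) for d in digests):
        return data_value
    # Digest mismatch: fetch full data and pick the freshest write.
    winner = max(contacted, key=lambda r: r["ts"])
    return winner["value"]

replicas = [
    {"value": "old", "ts": 1},
    {"value": "new", "ts": 2},   # this replica has the fresher write
    {"value": "old", "ts": 1},
]
print(read(replicas, consistency=2))   # new
```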
Re: Hinted Handoff runs every ten minutes
How many hint sstables are there? What does sstable2json show? On Thu, Nov 8, 2012 at 3:23 PM, Mike Heffner m...@librato.com wrote: Is there a ticket open for this for 1.1.6? We also noticed this after upgrading from 1.1.3 to 1.1.6. Every node runs a 0 row hinted handoff every 10 minutes. N-1 nodes hint to the same node, while that node hints to another node. On Tue, Oct 30, 2012 at 1:35 PM, Vegard Berget p...@fantasista.no wrote: Hi, I have the exact same problem with 1.1.6. HintsColumnFamily consists of one row (Rowkey 00, nothing more). The problem started after upgrading from 1.1.4 to 1.1.6. Every ten minutes HintedHandoffManager starts and finishes after sending 0 rows. .vegard, - Original Message - From: user@cassandra.apache.org To: user@cassandra.apache.org Cc: Sent: Mon, 29 Oct 2012 23:45:30 +0100 Subject: Re: Hinted Handoff runs every ten minutes Dne 29.10.2012 23:24, Stephen Pierce napsal(a): I'm running 1.1.5; the bug says it's fixed in 1.0.9/1.1.0. How can I check to see why it keeps running HintedHandoff? you have tombstone is system.HintsColumnFamily use list command in cassandra-cli to check -- Mike Heffner m...@librato.com Librato, Inc. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Fw: Fwd: Compound primary key: Insert after delete
Mixing the two isn't really recommended because of just this kind of difficulty, but if you must, I would develop against 1.2 since it will actually validate that the CT encoding you've done manually is valid. 1.1 will just fail silently. On Mon, Oct 22, 2012 at 6:57 AM, Vivek Mishra vivek.mis...@yahoo.com wrote: Hi, I am building support for Composite/Compund keys in Kundera and currently getting into number of problems for my POC to access it via Thrift. I am planning to use thrift API for insert/update/delete and for query i will go by CQL way. Issues: CompositeTypeRunner.java (see attached): Simple program to perform CRUD, it is not inserting against the deleted row key and also thrift API is returning column name as Empty string. OtherCompositeTypeRunner.java (see attached): Program to demonstrate issue with compound primary key as boolean. Column family creation via CQL is working fine, But insert via thrift is giving issue with Unconfigured column family though it is there! This is what i have tried with cassandra 1.1.6 as well. Please have a look and share, if i am doing anything wrong? i did ask same on user group but no luck. -Vivek - Forwarded Message - From: Vivek Mishra mishra.v...@gmail.com To: vivek.mis...@yahoo.com Sent: Monday, October 22, 2012 5:17 PM Subject: Fwd: Compound primary key: Insert after delete -- Forwarded message -- From: Vivek Mishra mishra.v...@gmail.com Date: Mon, Oct 22, 2012 at 1:08 PM Subject: Re: Compound primary key: Insert after delete To: user@cassandra.apache.org Well. Last 2 lines of code are deleting 1 record and inserting 2 records, first one is the deleted one and a new record. Output from command line: [default@unknown] use bigdata; Authenticated to keyspace: bigdata [default@bigdata] list test1; Using default limit of 100 Using default column limit of 100 --- RowKey: 2 = (column=3:address, value=4, timestamp=1350884575938) --- RowKey: 1 2 Rows Returned. 
-Vivek On Mon, Oct 22, 2012 at 1:01 PM, aaron morton aa...@thelastpickle.com wrote: How is it not working ? Can you replicate the problem withe the CLI ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 22/10/2012, at 7:17 PM, Vivek Mishra mishra.v...@gmail.com wrote: code attached. Somehow it is not working with 1.1.5. -Vivek On Mon, Oct 22, 2012 at 5:20 AM, aaron morton aa...@thelastpickle.com wrote: Yes AFAIK. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/10/2012, at 12:15 AM, Vivek Mishra mishra.v...@gmail.com wrote: Hi, Is it possible to reuse same compound primary key after delete? I guess it works fine for non composite keys. -Vivek CompositeTypeRunner.java -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: potential data loss in Cassandra 1.1.0 .. 1.1.4
On Thu, Oct 18, 2012 at 7:30 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi Jonathan. We are currently running the datastax AMI on amazon. Cassandra is in version 1.1.2. I guess that the datastax repo (deb http://debian.datastax.com/community stable main) will be updated directly to 1.1.6 ? Yes. Could you ask your team to add this specific warning in your documentation, like here: http://www.datastax.com/docs/1.1/install/expand_ami (we used to update to the last stable release before expanding) or here: http://www.datastax.com/docs/1.1/install/upgrading or in any other place where this could be useful ? Good idea, I'll get that noted. Thanks! -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
potential data loss in Cassandra 1.1.0 .. 1.1.4
I wanted to call out a particularly important bug for those who aren't in the habit of reading CHANGES. Summary: the bug was fixed in 1.1.5, with a follow-on fix for 1.1.6 that only affects users of 1.1.0 .. 1.1.4. Thus, if you upgraded from 1.0.x or earlier directly to 1.1.5, you're okay as far as this is concerned. But if you used an earlier 1.1 release, you should upgrade to 1.1.6. Explanation: A rewrite of the commitlog code for 1.1.0 used Java's nanotime API to generate commitlog segment IDs. This could cause data loss in the event of a power failure, since we assume commitlog IDs are strictly increasing in our replay logic. Simplified, the replay logic looks like this: 1. Take the most recent flush time X for each columnfamily 2. Replay all activity in the commitlog that occurred after X The problem is that nanotime gets effectively a new random seed after a reboot. If the new seed is substantially below the old one, any new commitlog segments will never be after the pre-reboot flush timestamps. Subsequently, restarting Cassandra will not replay any unflushed updates. We fixed the nanotime problem in 1.1.5 (CASSANDRA-4601). But we didn't realize the implications for replay timestamps until later (CASSANDRA-4782). To fix these retroactively, 1.1.6 sets the flush time of pre-1.1.6 sstables to zero. Thus, the first startup of 1.1.6 will result in replaying the entire commitlog, including data that may have already been flushed. Replaying already-flushed data a second time is harmless -- except for counters. So, to avoid replaying flushed counter data, we recommend performing a drain when shutting down the pre-1.1.6 C* prior to upgrading. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
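The failure mode is easy to see in a toy model of the replay rule above (the field names and ID values are invented for illustration; real segment IDs came from nanotime):

```python
def replay(commitlog, flushed_through):
    """Sketch of the replay rule described above: skip any mutation
    whose segment ID is at or below the last-flushed ID, replay the
    rest. This is only safe if IDs are strictly increasing."""
    return [m for m in commitlog if m["segment_id"] > flushed_through]

# Before the reboot, segments were numbered from a high nanotime seed,
# so the flush watermark is a large ID.
flushed_through = 9_000_000

# After a power failure, nanotime effectively reseeds; if the new seed
# is lower, new (unflushed!) mutations get smaller IDs...
post_reboot_log = [
    {"segment_id": 1_000, "mutation": "unflushed write A"},
    {"segment_id": 2_000, "mutation": "unflushed write B"},
]

# ...and replay silently skips them: data loss.
print(replay(post_reboot_log, flushed_through))   # []

# The 1.1.6 retroactive fix amounts to zeroing the watermark, which
# replays everything (harmless when re-applied, except for counters).
print(len(replay(post_reboot_log, 0)))            # 2
```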
Re: Any way to put a hard limit on memory cap for Cassandra ?
There are three places that Cassandra will use non-heap memory:

One is JVM overhead like permgen. This is a normal part of running Java-based services and will be very stable and predictable.

Another is the off-heap row cache. By default no row caching is done; you have to explicitly enable it per-columnfamily. You can also control the maximum cache size in cassandra.yaml.

Finally, Cassandra mmaps all its data files by default. This is a frequent source of misunderstanding, because mmapping doesn't mean the memory is used in the normal sense, just that the files are mapped into Cassandra's address space so they can be read most efficiently. See http://wiki.apache.org/cassandra/FAQ#mmap for more details.

Note that only the JVM memory itself (heap + overhead) is locked by JNA. Disabling JNA will only expose you to a very bad experience should the OS decide to swap out part of the JVM. Best practice, of course, is to disable swap entirely, but JNA is there as a fallback because many people do not do this correctly.

Directing followups to the Cassandra user mailing list.

On Wed, Oct 3, 2012 at 3:33 AM, Thomas Yu <t...@ruckuswireless.com> wrote:
> Hi Jonathan, I've tried to find information on how to put a hard limit on the real memory used by the Cassandra process, and would appreciate any pointers from you on this front. I'm using Cassandra 1.0.11, and had been using the -Xms and -Xmx JVM options to try to limit the heap to 750M. However, I find that the actual usage of the Cassandra process is around 1G, and I understand that is related to JNA and locked memory (likely rooted in the PermGen) in mmap. What I really want to understand is whether there's any way to put a hard limit on the real memory usage of Cassandra. Do I have to disable JNA in order to achieve that?
> Or otherwise, can I fairly assume that the PermGen will be stable enough that it won't exceed the 250M I observed in my application by much? What about later releases of Cassandra (e.g. 1.1 or 1.2)? Is there any option to help on this front? Thanks in advance for any pointers that you can provide to help me understand this issue. Best Regards, -Thomas

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
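[Editorial illustration] The mmap point above is easy to demonstrate in isolation: mapping a file makes it part of the process address space (which inflates "memory used" in tools like top), but the mapped buffer is direct, i.e. not on the Java heap. A small standalone sketch, unrelated to Cassandra's own code:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Demonstrates why mmap'd data files inflate apparent memory use: the file
// is mapped into the address space, but the bytes live in the OS page cache,
// not on the Java heap.
public class MmapDemo {
    // Maps the file read-only and returns the mapped size; throws if the
    // buffer were heap-allocated (it never is for FileChannel.map).
    static long mapSize(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            if (!buf.isDirect()) throw new AssertionError("mapped buffer should be off-heap");
            return buf.capacity();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".db");
        Files.write(tmp, new byte[1024 * 1024]); // a 1 MiB stand-in "data file"
        System.out.println(mapSize(tmp)); // 1048576 -- mapped, but not heap memory
        Files.deleteIfExists(tmp);
    }
}
```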
Re: JVM 7, Cass 1.1.1 and G1 garbage collector
Relatedly, I'd love to learn how to reliably reproduce full GC pauses on C* 1.1+. On Mon, Sep 10, 2012 at 12:37 PM, Oleg Dulin oleg.du...@gmail.com wrote: I am currently profiling a Cassandra 1.1.1 set up using G1 and JVM 7. It is my feeble attempt to reduce Full GC pauses. Has anyone had any experience with this ? Anyone tried it ? -- Regards, Oleg Dulin NYC Java Big Data Engineer http://www.olegdulin.com/ -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Dynamic Column Families in CQLSH v3
To elaborate: we don't know yet how to expose DCT in CQL3. If you can give more background on what you're using DCT for, that would help. (If we're lucky, it's also possible that you don't actually need DCT -- Collections in 1.2 are implemented entirely with classic CT under the hood.)

On Mon, Aug 27, 2012 at 5:56 PM, aaron morton <aa...@thelastpickle.com> wrote:
> It's not possible to have Dynamic Columns in CQL 3. The CF definition must specify the column names you expect to store. The COMPACT STORAGE clause of the CREATE COLUMNFAMILY statement (http://www.datastax.com/docs/1.1/references/cql/CREATE_COLUMNFAMILY) means you can have column names that are part dynamic, part static. But if you want CFs where the app code controls the column names, you need to create the CF using the CLI and stick with the Thrift API (because SELECT in CQL 3 does not support arbitrary column slicing).
>
> Background: http://www.mail-archive.com/user@cassandra.apache.org/msg23636.html
>
> Cheers, Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com
>
> On 24/08/2012, at 2:24 PM, Erik Onnen <eon...@gmail.com> wrote:
>> Hello All, I'm attempting to create what the DataStax 1.1 documentation calls a Dynamic Column Family (http://www.datastax.com/docs/1.1/ddl/column_family#dynamic-column-families) via CQLSH. This works in v2 of the shell:
>>
>> create table data ( key varchar PRIMARY KEY) WITH comparator=LongType;
>>
>> When defined this way via the v2 shell, I can successfully switch to the v3 shell and query the CF fine.
>> The same syntax in v3 yields:
>>
>> Bad Request: comparator is not a valid keyword argument for CREATE TABLE
>>
>> The 1.1 documentation indicates that comparator is a valid option for at least ALTER TABLE: http://www.datastax.com/docs/1.1/configuration/storage_configuration#comparator This leads me to believe that the correct way to create a dynamic column family is to create a table with no named columns and alter the table later, but that also does not work:
>>
>> create table data (key varchar PRIMARY KEY);
>>
>> yields: Bad Request: No definition found that is not part of the PRIMARY KEY
>>
>> So, my question is: how do I create a Dynamic Column Family via CQLSH v3? Thanks! -erik

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Problem while configuring key and row cache?
setcachecapacity is obsolete in 1.1+. Looks like we missed removing it from nodetool. See http://www.datastax.com/dev/blog/caching-in-cassandra-1-1 for background. (Moving to users@.)

On Tue, Aug 21, 2012 at 8:19 AM, Amit Handa <amithand...@gmail.com> wrote:

> I started exploring Apache Cassandra 1.1.3 and am facing a problem with how to improve performance using the caching configurations. I tried setting the following:
>
> ./nodetool -h 107.108.189.204 setcachecapacity DemoUser Users 25 0
> ./nodetool -h 107.108.189.204 setcachecapacity DemoUser Users 0 25
> ./nodetool -h 107.108.189.204 setcachecapacity DemoUser Users 25 25
> ./nodetool -h 107.108.189.204 setcachecapacity DemoUser Users 444 444
>
> But when I check whether these settings have really been applied, using:
>
> ./nodetool -h 107.108.189.212 cfstats
>
> it shows the following for keyspace DemoUser and column family Users:
>
> Keyspace: DemoUser
>     Read Count: 21914
>     Read Latency: 0.08268495026010769 ms.
>     Write Count: 87656
>     Write Latency: 0.06009481381765082 ms.
>     Pending Tasks: 0
>     Column Family: Users
>     SSTable count: 1
>     Space used (live): 1573335
>     Space used (total): 1573335
>     Number of Keys (estimate): 22016
>     Memtable Columns Count: 0
>     Memtable Data Size: 0
>     Memtable Switch Count: 1
>     Read Count: 21914
>     Read Latency: 0.083 ms.
>     Write Count: 87656
>     Write Latency: 0.060 ms.
>     Pending Tasks: 0
>     Bloom Filter False Positives: 0
>     Bloom Filter False Ratio: 0.0
>     Bloom Filter Space Used: 41104
>     Compacted row minimum size: 150
>     Compacted row maximum size: 179
>     Compacted row mean size: 179
>
> I am unable to see any effect of the setcachecapacity command. Let me know how I can configure the cache capacity and check its effect. With Regards, Amit

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
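[Editorial note] For reference, in 1.1 the cache knobs moved to cassandra.yaml as global settings; a sketch (the sizes below are illustrative, not recommendations):

```yaml
# Global caches in Cassandra 1.1 (illustrative values).
# An empty key_cache_size_in_mb means "auto": min(5% of heap, 100 MB).
key_cache_size_in_mb: 100
key_cache_save_period: 14400   # seconds between saving the key cache to disk
row_cache_size_in_mb: 0        # off-heap row cache; 0 disables it
row_cache_save_period: 0
```

Per-columnfamily participation is then controlled with the CF-level `caching` attribute ('all', 'keys_only', 'rows_only', or 'none') rather than setcachecapacity.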
Re: RE Restore snapshot
Yes. On Thu, Aug 2, 2012 at 5:33 AM, Radim Kolar h...@filez.com wrote: 1) I assume that I have to call the loadNewSSTables() on each node? this is same as nodetool refresh? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Fwd: Call for Papers for ApacheCon Europe 2012 now open!
There are Big Data and NoSQL tracks where Cassandra talks would be appropriate.

-- Forwarded message --
From: Nick Burch <nick.bu...@alfresco.com>
Date: Thu, Jul 19, 2012 at 1:14 PM
Subject: Call for Papers for ApacheCon Europe 2012 now open!
To: committ...@apache.org

Hi All,

We're pleased to announce that the Call for Papers for ApacheCon Europe 2012 is finally open! (For those who don't already know, ApacheCon Europe will take place between the 5th and the 9th of November this year, in Sinsheim, Germany.)

If you'd like to submit a talk proposal, please visit the conference website at http://www.apachecon.eu/ and sign up for a new account. Once you've signed up, use your dashboard to enter your speaker bio, then submit your talk proposal(s). There's more information on the CFP page of the conference website. We welcome talk proposals from all projects, from right across the breadth of projects at the foundation!

To make things easier for talk selection and scheduling, we ask that you tag your proposal with the track it most closely fits within. The details of the tracks, and what projects they are expected to cover, are available at http://www.apachecon.eu/tracks/. (If your project or group of projects was intending to submit a track and missed the deadline, please get in touch with us at apachecon-disc...@apache.org straight away, so we can work out if it's possible to squeeze you in...)

The CFP will close on Friday 3rd August, so you have a little over two weeks to send in your talk proposal. Don't put it off! We'll look forward to seeing some great ones shortly!

Thanks,
Nick (on behalf of the Conferences committee)

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Cassandra Summit 2012
Hi all, The 2012 Cassandra Summit will be in San Jose on August 8. The 2011 Summit sold out with almost 500 attendees; this year we found a bigger venue to accommodate 700+. It's fantastic to see the Cassandra community grow like this! The 2012 Summit will have *four* talk tracks, plus the popular Ask the Experts breakout room where DataStax engineers will take any question, all day. Accepted talks are posted at http://www.datastax.com/events/cassandrasummit2012#Sessions, and speaker bios at http://www.datastax.com/events/cassandrasummit2012#Speakers. More abstracts will be posted as they are confirmed. Learn more and register at http://www.datastax.com/events/cassandrasummit2012. Use the cassandra-list-20 code when registering and save 20%! P.S. Brandon Williams and I will be conducting a developer training course immediately before the Summit. More information at http://www.datastax.com/services/training -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
2012 Cassandra MVP nominations
DataStax would like to recognize individuals who go above and beyond in their contributions to Apache Cassandra. To formalize this a little, we're creating an MVP program, the first recipients of which will be announced at the Cassandra Summit [1] in August.

To make this program a success, we need your help: nominate either yourself or someone else you think merits consideration. We're looking for people who take the initiative organizing user groups, who explain Cassandra in talks, blogs, Twitter, or other forums, or who answer questions on the mailing list, IRC, StackOverflow, etc.

Please take five minutes and submit your nomination today at [2]. Nominations will be open throughout the next week. Those selected will be notified in advance.

[1] http://www.datastax.com/events/cassandrasummit2012
[2] http://www.surveymonkey.com/s/WVBZGHR

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Java heap space on Cassandra start up version 1.0.10
java.lang.OutOfMemoryError: Java heap space
    at org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(EstimatedHistogram.java:222)
    at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:204)
    at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:194)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:155)
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:224)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Service exit with a return value of 100

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Memtable tuning in 1.0 and higher
I'm afraid not. It's too much change for an old stable release series, and the bulk of the change is to AtomicSortedColumns, which doesn't exist in 1.0. So even if we wanted to take a "maybe it's okay if we release it first in 1.1.3 and then backport" approach, it wouldn't improve our safety margin, since you'd basically need to rewrite the patch.

On Sun, Jul 1, 2012 at 6:40 AM, Joost van de Wijgerd <jwijg...@gmail.com> wrote:
> Hi Jonathan, looks good. Any chance of porting this fix to the 1.0 branch? Kind regards, Joost (Sent from my iPhone)
>
> On 1 jul. 2012, at 09:25, Jonathan Ellis <jbel...@gmail.com> wrote:
>> On Thu, Jun 28, 2012 at 1:39 PM, Joost van de Wijgerd <jwijg...@gmail.com> wrote:
>>> the currentThroughput is increased even before the data is merged into the memtable so it is actually measuring the throughput afaik.
>> You're right. I've attached a patch to https://issues.apache.org/jira/browse/CASSANDRA-4399 to fix this.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Failed to solve Digest mismatch
) removed index entry for cleaned-up value DecoratedKey(32, 3332):ColumnFamily(queue.idxPartitionId [7878323239537570657254616e67307878:true:4@1340870382109001,])
DEBUG [MutationStage:10] 2012-06-28 15:59:42,193 KeysIndex.java (line 103) removed index entry for cleaned-up value DecoratedKey(3898026790553046681950927403065, 31333430383730333531373839):ColumnFamily(queue.idxRecvTime [7878323239537570657254616e67307878:true:4@1340870382109003,])
DEBUG [MutationStage:10] 2012-06-28 15:59:42,193 KeysIndex.java (line 103) removed index entry for cleaned-up value DecoratedKey(3898026790552830793920833138736, 31333430383431363030303030):ColumnFamily(queue.idxRecvTimeRange [7878323239537570657254616e67307878:true:4@1340870382109010,])
DEBUG [MutationStage:10] 2012-06-28 15:59:42,193 KeysIndex.java (line 103) removed index entry for cleaned-up value DecoratedKey(test, 74657374):ColumnFamily(queue.idxServiceProvider [7878323239537570657254616e67307878:true:4@1340870382109007,])
DEBUG [MutationStage:10] 2012-06-28 15:59:42,193 RowMutationVerbHandler.java (line 56) RowMutation(keyspace='drc', key='7878323239537570657254616e67307878', modifications=[ColumnFamily(queue -deleted at 1340870382185000- [])]) applied. Sending response to 6553@/192.168.0.3
DEBUG [ReadStage:17] 2012-06-28 15:59:42,198 CollationController.java (line 77) collectTimeOrderedData
DEBUG [ReadStage:17] 2012-06-28 15:59:42,199 ReadVerbHandler.java (line 58) Read key 7878323239537570657254616e67307878; sending response to 6556@/192.168.0.3

BRs //Ares

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Ball is rolling on High Performance Cassandra Cookbook second edition
On Wed, Jun 27, 2012 at 5:11 PM, Aaron Turner synfina...@gmail.com wrote: Honestly, I think using the same terms as a RDBMS does makes users think they're exactly the same thing and have the same properties... which is close enough in some cases, but dangerous in others. The point is that thinking in terms of the storage engine is difficult and unnecessary. You can represent that data relationally, which is the Right Thing to do both because people are familiar with that world and because it decouples model from representation, which lets us change the latter if necessary. http://www.datastax.com/dev/blog/schema-in-cassandra-1-1 -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: items removed from 1.1.0 cfstats output
They were removed because in 1.1 caches are global and not per-cf: http://www.datastax.com/dev/blog/caching-in-cassandra-1-1 On Fri, Jun 29, 2012 at 5:45 AM, Bill b...@dehora.net wrote: Were Key cache capacity: Key cache size: Key cache hit rate: Row cache: removed from cfstats in 1.1.0? I can see them in 1.0.8 but not 1.1.0. If so, was wondering why, as they're fairly useful :) Bill -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: upgrade issue
$ConstructMapping.constructJavaBean2ndStep(Constructor.java:240)
    ... 11 more
null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=commitlog_rotation_threshold_in_mb for JavaBean=org.apache.cassandra.config.Config@4dd36dfe; Unable to find property 'commitlog_rotation_threshold_in_mb' on class: org.apache.cassandra.config.Config
Invalid yaml; unable to start server. See log for stacktrace.

Thanks
Regards, Adeel Akbar

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Question on pending tasks in compaction manager
Pending compactions is just an estimate of how many compactions Cassandra thinks it will take to reach a fully-compacted state; there are no actual tasks enqueued anywhere. You could enable debug logging on org.apache.cassandra.db.compaction, and force a compaction with nodetool, to see why no compactions happen when the estimate says there is still work to do.

On Fri, Jun 29, 2012 at 4:27 AM, Martin McGovern <martin.mcgov...@gmail.com> wrote:
> Hi All, could someone explain why the compaction manager stops compacting while it still has a number of pending tasks? I have a test cluster that I am using to stress test IO throughput, i.e. to find out what a safe load for our hardware is. Over a 16 hour period my cluster completes approximately 49,000 tasks per node. After stopping my test, compaction continues for a few minutes, then stops. There are ~7,000 tasks still pending. No more tasks will be executed until I start another test, and the 7,000 pending will never be executed. I'm using leveled compaction with 5MB SSTables, and my tests have a 50:50 read:write ratio. Each value is a 10K byte array with random content. Thanks, Martin

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: jscv CPU Consumption
Sounds like http://wiki.apache.org/cassandra/FAQ#ubuntu_ec2_hangs to me.

On Fri, Jun 29, 2012 at 1:45 AM, Olivier Mallassi <omalla...@octo.com> wrote:
> Hi all, we have a 12-server cluster (8 cores per machine). The OS is Ubuntu 10.04.2. On one of the machines (only one), and without any load (no inserts, no reads), we have a huge CPU load even though there is no activity (no compaction in progress, etc.). A top on the machine shows us the jscv process using all the available CPUs. Is that linked to JNA? Do you have any ideas? Cheers
>
> --
> Olivier Mallassi, OCTO Technology
> 50, Avenue des Champs-Elysées, 75008 Paris
> Mobile: (33) 6 28 70 26 61, Tél: (33) 1 58 56 10 00, Fax: (33) 1 58 56 10 01
> http://www.octo.com -- Octo Talks! http://blog.octo.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Memtable tuning in 1.0 and higher
On Thu, Jun 28, 2012 at 1:39 PM, Joost van de Wijgerd jwijg...@gmail.com wrote: the currentThoughput is increased even before the data is merged into the memtable so it is actually measuring the throughput afaik. You're right. I've attached a patch to https://issues.apache.org/jira/browse/CASSANDRA-4399 to fix this. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Distinct with Cql
No. (Moving to user list.)

On Jun 28, 2012 8:17 AM, Fábio Caldas <fabio.cal...@gmail.com> wrote:
> Is it possible to use DISTINCT in CQL?
>
> -- Atenciosamente, Fábio Caldas
Re: Memtable tuning in 1.0 and higher
[moving to user list]

1.0 doesn't care about throughput or op count anymore, only whether the total memory used by the *current* data in the memtables has reached the global limit. So it automatically doesn't count historical data that's been overwritten in the current memtable.

So, you may want to increase the memory allocated to memtables... or you may be seeing flushes forced by the commitlog size cap, which you can also adjust. But the bottom line is, I'd consider flushing every 5-6 minutes to be quite healthy: since the ratio of time spent flushing to time not flushing is quite small, reducing it further is going to give you negligible benefit (in exchange for longer replay times).

On Thu, Jun 28, 2012 at 5:09 AM, Joost van de Wijgerd <jwijg...@gmail.com> wrote:

> Hi, I work for eBuddy. We've been using Cassandra in production since 0.6 (using 0.7 and 1.0, skipped 0.8) and use it for several use cases. One of them is persisting our sessions. Some background: in our case sessions are long lived; we have a mobile messaging platform where sessions are essentially eternal. We use Cassandra as the system of record for our sessions, so in case of scale-out or failover we can quickly load the session state again.
>
> We use protocol buffers to serialize our data into a byte buffer and then store this as a column value in a (wide) row. We use a partition-based approach to scale, and each partition has its own row in Cassandra. Each session is mapped to a partition and stored in a column in this row. Every time there is a change in the session (i.e. message added, acked, etc.) we schedule the session to be flushed to Cassandra, and every x seconds we flush the dirty sessions. So there are a serious number of (over)writes going on and not that many reads (unless there is a failover situation or we scale out). This plays to one of Cassandra's strengths.
>
> In versions 0.6 and 0.7 it was possible to control the memtable settings on a per-CF basis. So for this particular CF we would set the throughput really high, since there are a huge number of overwrites. In the same cluster we have other CFs with a different load pattern. Since we moved to version 1.0, however, it has become almost impossible to tune our system for this (mixed) workload, since we now have only two knobs to turn (the size of the commit log and the total memtable size) and you have introduced the liveRatio calculation. While this works OK for most workloads, our persistent session store is really hurt by the fact that the liveRatio cannot be lower than 1.0. We generally have an actual liveRatio of 0.025 on this CF due to the huge number of overwrites.
>
> We are now artificially tuning up the total memtable size, but this interferes with our other CFs, which have a different workload. Due to this, our performance has degraded quite a bit: on 0.7 we had our session CF tuned so that it would flush only once an hour, thus absorbing way more overwrites, thus having to do fewer compactions; and in a failover scenario most requests could be served straight from the memtable (since we are doing single-column reads there). Currently we flush every 5 to 6 minutes under moderate load, so 10 times worse. This is with the same heap settings etc.
>
> Would you guys consider allowing values lower than 1.0 for the liveRatio calculation? This would help us a lot. Perhaps make it a flag so it can be turned on and off? Ideally I would like the ability back to tune on a CF-by-CF basis; this could be a special setting that needs to be enabled for power users, the default being what's there now. Also, in the current version the liveRatio can never adjust downwards; I see you guys have already made a fix for this in 1.1, but I have not seen it on the 1.0 branch.
>
> Let me know what you think. Kind regards, Joost
>
> --
> Joost van de Wijgerd
> joost.van.de.wijgerd@Skype
> http://www.linkedin.com/in/jwijgerd

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
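[Editorial illustration] The liveRatio complaint in the thread above can be shown with a toy model. This is not Cassandra's actual implementation; it only illustrates the clamp: liveRatio estimates heap bytes retained per serialized byte written, and with heavy overwrites the true ratio can fall far below 1.0, but 1.0/1.1 never let it go lower, so memtable memory is overestimated and flushes happen early.

```java
// Illustrative model of liveRatio clamping; NOT actual Cassandra code.
public class LiveRatioModel {
    // liveRatio ~ heap bytes actually retained / raw bytes written,
    // clamped at a floor of 1.0 as in Cassandra 1.0/1.1.
    static double clampedLiveRatio(long heapBytesRetained, long bytesWritten) {
        double ratio = (double) heapBytesRetained / bytesWritten;
        return Math.max(1.0, ratio); // the clamp the thread asks to relax
    }

    public static void main(String[] args) {
        // Session-store workload: each column overwritten ~40 times before a
        // flush, so only 1/40th of written bytes stay live (true ratio 0.025,
        // matching the figure quoted in the thread).
        long bytesWritten  = 40L * 1024 * 1024; // 40 MiB written
        long bytesRetained =  1L * 1024 * 1024; //  1 MiB actually live

        // The clamp books 40x the real memtable memory usage.
        System.out.println(clampedLiveRatio(bytesRetained, bytesWritten)); // 1.0
    }
}
```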
Re: Rules for Major Compaction
On Tue, Jun 19, 2012 at 2:30 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> Your final two sentences are good ground rules. In our case we have some column families with high churn: for example, a gc_grace period of 4 days, but data that is rewritten completely every day. Write activity over time will eventually cause tombstone removal, but we can expedite the process by forcing a major compaction at night. Because the tables are not really growing, the **warning** below does not apply.

Note that Cassandra 1.2 will automatically compact sstables that have more than a configurable amount of expired data (default 20%). So you won't have to force a major for this use case anymore.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
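[Editorial note] The 1.2 behavior mentioned above is exposed as a compaction subproperty. A sketch in CQL3, with a hypothetical table name:

```cql
-- 'events' is a hypothetical table. tombstone_threshold is the fraction of
-- expired data (default 0.2) above which Cassandra 1.2+ will compact a
-- single sstable on its own, without a forced major compaction.
ALTER TABLE events
  WITH compaction = { 'class' : 'SizeTieredCompactionStrategy',
                      'tombstone_threshold' : '0.20' };
```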
Re: Row caching in Cassandra 1.1 by column family
rows_cached is actually obsolete in 1.1. New hotness explained here: http://www.datastax.com/dev/blog/caching-in-cassandra-1-1 On Mon, Jun 18, 2012 at 7:43 PM, Chris Burroughs chris.burrou...@gmail.com wrote: Check out the rows_cached CF attribute. On 06/18/2012 06:01 PM, Oleg Dulin wrote: Dear distinguished colleagues: I don't want all of my CFs cached, but one in particular I do. How can I configure that ? Thanks, Oleg -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: cassandra secondary index with
That's because this will get you *worse* performance than just doing a seq scan would. Details as to why are here: http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes

On Tue, Jun 19, 2012 at 2:48 PM, Yuhan Zhang <yzh...@onescreen.com> wrote:
> To answer my own question: there should be at least one equals expression in the indexed query to combine with a GTE. So I just added a trivial column that stays constant for the equals comparison, and it works. Not sure why this requirement exists. Thank you. Yuhan
>
> On Tue, Jun 19, 2012 at 12:23 PM, Yuhan Zhang <yzh...@onescreen.com> wrote:
>> Hi all, I'm trying to search by a secondary index in Cassandra with greater-than-or-equal, but I get an exception:
>>
>> me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:No indexed columns present in index clause with operator EQ)
>>
>> However, the same column family with the same column works when the search expression is an equals. I'm using the Hector Java client. The secondary index has been set to: {column_name: sport, validation_class: DoubleType, index_type: KEYS}
>>
>> Here's the code reaching the exception:
>>
>> public QueryResult<OrderedRows<String, String, Double>> getIndexedSlicesGTE(String columnFamily, String columnName, double value, String... columns) {
>>     Keyspace keyspace = getKeyspace();
>>     StringSerializer se = CassandraStorage.getStringExtractor();
>>     IndexedSlicesQuery<String, String, Double> indexedSlicesQuery =
>>         createIndexedSlicesQuery(keyspace, se, se, DoubleSerializer.get());
>>     indexedSlicesQuery.setColumnFamily(columnFamily);
>>     indexedSlicesQuery.setStartKey("");
>>     if (columns != null)
>>         indexedSlicesQuery.setColumnNames(columns);
>>     else
>>         indexedSlicesQuery.setRange("", "", true, MAX_RECORD_NUMBER);
>>     indexedSlicesQuery.setRowCount(CassandraStorage.MAX_RECORD_NUMBER);
>>     indexedSlicesQuery.addGteExpression(columnName, value); // this doesn't work :(
>>     //indexedSlicesQuery.addEqualsExpression(columnName, value); // this works!
>>     QueryResult<OrderedRows<String, String, Double>> result = indexedSlicesQuery.execute();
>>     return result;
>> }
>>
>> Is there any column_meta setting required to make GTE comparison work on a secondary index? Thank you. Yuhan Zhang
>>
>> --
>> Yuhan Zhang
>> Application Developer, OneScreen Inc.
>> yzh...@onescreen.com, www.onescreen.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: FYI: Java 7u4 on Linux requires higher stack size
Thanks, we're investigating in https://issues.apache.org/jira/browse/CASSANDRA-4275.

On Fri, May 25, 2012 at 10:31 AM, Viktor Jevdokimov <viktor.jevdoki...@adform.com> wrote:

> Hello all,
>
> We've started to test Oracle Java 7u4 (currently we're on 7u3) on Linux, to try the G1 GC. Cassandra can't start on 7u4, with the exception:
>
> The stack size specified is too small, Specify at least 160k
> Cannot create Java VM
>
> Changing -Xss128k to -Xss160k in cassandra-env.sh allowed Cassandra to start, but when a Thrift client disconnects, the Cassandra log fills with exceptions:
>
> ERROR 17:08:56,300 Fatal exception in thread Thread[Thrift:13,5,main]
> java.lang.StackOverflowError
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.read(Unknown Source)
>     at java.net.SocketInputStream.read(Unknown Source)
>     at java.io.BufferedInputStream.fill(Unknown Source)
>     at java.io.BufferedInputStream.read1(Unknown Source)
>     at java.io.BufferedInputStream.read(Unknown Source)
>     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>     at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>     at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>     at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>     at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
>
> Increasing the stack size from 160k to 192k eliminated such exceptions. Just wanted you to know, in case someone tries to migrate to Java 7u4.
>
> Best regards / Pagarbiai
> Viktor Jevdokimov, Senior Developer
> Email: viktor.jevdoki...@adform.com
> Phone: +370 5 212 3063, Fax: +370 5 261 0453
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: supercolumns with TTL columns not being compacted correctly
Additionally, it will always take at least two compaction passes to purge an expired column: one to turn it into a tombstone, and a second (after gcgs) to remove it.

On Tue, May 22, 2012 at 9:21 AM, Yuki Morishita mor.y...@gmail.com wrote:
Data will not be deleted when those keys appear in other sstables outside of the compaction. This is to prevent obsolete data from appearing again. yuki

On Tuesday, May 22, 2012 at 7:37 AM, Pieter Callewaert wrote:
Hi Samal, thanks for your time looking into this. I force the compaction by using forceUserDefinedCompaction on only that particular sstable. This guarantees me that the new sstable being written only contains the data from the old sstable. The data in the sstable is more than 31 days old and gc_grace is 0, but still the data from the sstable is being written to the new one, while I am 100% sure all the data is invalid. Kind regards, Pieter Callewaert

From: samal [mailto:samalgo...@gmail.com] Sent: dinsdag 22 mei 2012 14:33 To: user@cassandra.apache.org Subject: Re: supercolumns with TTL columns not being compacted correctly
Data will remain till the next compaction but won't be available. Compaction will delete the old sstable and create a new one.

On 22-May-2012 5:47 PM, Pieter Callewaert pieter.callewa...@be-mobile.be wrote:
Hi, I've had my suspicions for some months, but now I think I am sure about it. Data is being written by the SSTableSimpleUnsortedWriter and loaded by the sstableloader.
The data should be alive for 31 days, so I use the following logic:

int ttl = 2678400;
long timestamp = System.currentTimeMillis() * 1000;
long expirationTimestampMS = (long) ((timestamp / 1000) + ((long) ttl * 1000));

And using this to write it:

sstableWriter.newRow(bytes(entry.id));
sstableWriter.newSuperColumn(bytes(superColumn));
sstableWriter.addExpiringColumn(nameTT, bytes(entry.aggregatedTTMs), timestamp, ttl, expirationTimestampMS);
sstableWriter.addExpiringColumn(nameCov, bytes(entry.observationCoverage), timestamp, ttl, expirationTimestampMS);
sstableWriter.addExpiringColumn(nameSpd, bytes(entry.speed), timestamp, ttl, expirationTimestampMS);

This works perfectly: data can be queried until 31 days have passed, then no results are returned, as expected. But the data is still on disk until the sstables are recompacted. One of our nodes (we have 6 total) has the following sstables:

[cassandra@bemobile-cass3 ~]$ ls -hal /data/MapData007/HOS-* | grep G
-rw-rw-r--. 1 cassandra cassandra 103G May 3 03:19 /data/MapData007/HOS-hc-125620-Data.db
-rw-rw-r--. 1 cassandra cassandra 103G May 12 21:17 /data/MapData007/HOS-hc-163141-Data.db
-rw-rw-r--. 1 cassandra cassandra 25G May 15 06:17 /data/MapData007/HOS-hc-172106-Data.db
-rw-rw-r--. 1 cassandra cassandra 25G May 17 19:50 /data/MapData007/HOS-hc-181902-Data.db
-rw-rw-r--. 1 cassandra cassandra 21G May 21 07:37 /data/MapData007/HOS-hc-191448-Data.db
-rw-rw-r--. 1 cassandra cassandra 6.5G May 21 17:41 /data/MapData007/HOS-hc-193842-Data.db
-rw-rw-r--. 1 cassandra cassandra 5.8G May 22 11:03 /data/MapData007/HOS-hc-196210-Data.db
-rw-rw-r--. 1 cassandra cassandra 1.4G May 22 13:20 /data/MapData007/HOS-hc-196779-Data.db
-rw-rw-r--. 1 cassandra cassandra 401G Apr 16 08:33 /data/MapData007/HOS-hc-58572-Data.db
-rw-rw-r--. 1 cassandra cassandra 169G Apr 16 17:59 /data/MapData007/HOS-hc-61630-Data.db
-rw-rw-r--. 1 cassandra cassandra 173G Apr 17 03:46 /data/MapData007/HOS-hc-63857-Data.db
-rw-rw-r--. 1 cassandra cassandra 105G Apr 23 06:41 /data/MapData007/HOS-hc-87900-Data.db

As you can see, the following files should be invalid:

/data/MapData007/HOS-hc-58572-Data.db
/data/MapData007/HOS-hc-61630-Data.db
/data/MapData007/HOS-hc-63857-Data.db

because they were all written more than a month ago. gc_grace is 0, so this should also not be a problem. As a test, I used forceUserDefinedCompaction on HOS-hc-61630-Data.db. The expected behavior is that an empty file is written, because all data in the sstable should be invalid. compactionstats gives:

compaction type keyspace column family bytes compacted bytes total progress
Compaction MapData007 HOS 11518215662 532355279724 2.16%

And when I ls the directory I find this:

-rw-rw-r--. 1 cassandra cassandra 3.9G May 22 14:12 /data/MapData007/HOS-tmp-hc-196898-Data.db

The sstable is being copied 1-on-1 to a new one. What am I missing here? TTL works perfectly, but is it a problem because it is in a super column, and so the data is never deleted from disk?

Kind regards
Pieter Callewaert | Web IT engineer
Be-Mobile NV | TouringMobilis
Technologiepark 12b - 9052 Ghent - Belgium
Tel + 32 9 330 51 80 | Fax + 32 9 330 51 81 | Cell + 32 473 777 121

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
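A simplified model of the purge rule discussed in this thread (an illustration, not Cassandra's actual code): an expired column first becomes a tombstone, and that tombstone can only be dropped once gc_grace has passed and no other sstable outside the compaction still holds the same row key. This is why compacting a single 400GB sstable by itself rewrites its data instead of emptying it:

```python
# Toy model of tombstone droppability. `other_sstables_with_key` counts
# sstables outside the compaction that still contain the row key.
import time

def droppable(local_deletion_time, gc_grace, other_sstables_with_key, now=None):
    now = now if now is not None else int(time.time())
    past_grace = now >= local_deletion_time + gc_grace
    # Dropping while the key lives elsewhere could resurrect obsolete data.
    return past_grace and other_sstables_with_key == 0

now = 1_000_000
# gc_grace=0 and long expired, but the key also lives in 2 other sstables:
print(droppable(now - 86_400, 0, 2, now))   # False -> data is rewritten
# Same column once every sstable holding the key joins the compaction:
print(droppable(now - 86_400, 0, 0, now))   # True  -> data is purged
```

This matches Yuki's explanation above: single-sstable forceUserDefinedCompaction cannot know the key is absent elsewhere, so it carries the data forward.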
Re: supercolumns with TTL columns not being compacted correctly
Correction: the first compaction after expiration + gcgs can remove it, even if it hasn't been turned into a tombstone previously.

On Tue, May 22, 2012 at 9:37 AM, Jonathan Ellis jbel...@gmail.com wrote:
Additionally, it will always take at least two compaction passes to purge an expired column: one to turn it into a tombstone, and a second (after gcgs) to remove it.

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: need some clarification on recommended memory size
So, you're doing about 20 ops/s where each op consists of read 2 metadata columns, then read ~250 columns of ~2K each. Is that right? Is your test client multithreaded? Is it on a separate machine from the Cassandra server? What is your bottleneck? http://spyced.blogspot.com/2010/01/linux-performance-basics.html On Thu, May 17, 2012 at 1:08 PM, Yiming Sun yiming@gmail.com wrote: Hi Aaron, Thank you for guiding us by breaking down the issue. Please see my answers embedded Is this a single client ? Yes How many columns is it asking for ? the client knows a list of all row keys, and it randomly picks 100, and loops 100 times. It first reads a metadata column to figure out how many columns to read, and it then reads these columns What sort of query are you sending, slice or named columns? currently all queries are slice queries. so the first slice query reads the metadata column (actually 2 metadata columns, one is for Number of columns to read, the other for other information which is not needed for the purpose of performance test, but I kept it in there to make it similar to the real situation). It then generates the column name array and sends the second slice query. The timing for the queries is completely isolated, and excludes the time spent generating column name array etc. From the client side how long is a single read taking ? I am not 100% sure on what you are asking... are you saying how long it takes for SliceQuery.execute()? The average we are getting are between 50-70 ms, and nodetool report similar latency, differ by 5-10ms at top. What is the write workload like? it sounds like it's write once read many. Indeed it is like a WORM environment. For the performance, we don't have any writes. memory speed network speed yes. right now, our data is only a sample about 250K rows, so the default 200,000 key cache hits above 90%. But we soon will be hosting the real deal with about 3M rows, so I am not sure our memory size will be able to keep up with it. 
In any case, Aaron, please let us know if you have any suggestions/comments/insights. Thanks! -- Y. On Thu, May 17, 2012 at 1:04 AM, aaron morton aa...@thelastpickle.com wrote: The read rate that I have been seeing is about 3MB/sec, and that is reading the raw bytes... using string serializer the rate is even lower, about 2.2MB/sec. Can we break this down a bit: Is this a single client ? How many columns is it asking for ? What sort of query are you sending, slice or named columns? From the client side how long is a single read taking ? What is the write workload like? it sounds like it's write once read many. Use nodetool cfstats to see what the read latency is on a single node. (see http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/) Is there much difference between this and the latency from the client perspective ? Using JNA may help, but a blog article seems to say it only increase 13%, which is not very significant when the base performance is in single-digit MBs. There are other reasons to have JNA installed: more efficient snapshots and advising the OS when file operations should not be cached. Our environment is virtualized, and the disks are actually SAN through fiber channels, so I don't know if that has impact on performance as well. memory speed network speed - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
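Jonathan's bottleneck question can be framed with rough arithmetic (numbers taken from this thread; treat them all as approximations). A client issuing one slice query at a time is capped by per-op latency:

```python
# Back-of-envelope single-threaded read ceiling from the figures above.
cols_per_op   = 250          # ~250 columns per slice read
bytes_per_col = 2 * 1024     # ~2K per column
latency_s     = 0.060        # ~50-70 ms measured client-side

payload = cols_per_op * bytes_per_col    # bytes moved per op
ops_s   = 1 / latency_s                  # single-threaded ceiling
mb_s    = ops_s * payload / 1e6
print(f"{payload/1024:.0f} KB/op, {ops_s:.1f} ops/s, {mb_s:.1f} MB/s")
```

The measured ~3MB/s is well below this ~8.5MB/s single-thread ceiling, which suggests either higher effective per-op latency or smaller effective payloads; either way, adding client threads (or client machines) is the first thing to check, as Jonathan asks above.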
Re: Exception when truncate
Sounds like you have a permissions problem. Cassandra creates a subdirectory for each snapshot.

On Thu, May 17, 2012 at 4:57 AM, ruslan usifov ruslan.usi...@gmail.com wrote:
Hello, I have the following situation on our test server: from cassandra-cli I tried truncate purchase_history; 3 times and got:

[default@township_6waves] truncate purchase_history; null UnavailableException() at org.apache.cassandra.thrift.Cassandra$truncate_result.read(Cassandra.java:20212) at org.apache.cassandra.thrift.Cassandra$Client.recv_truncate(Cassandra.java:1077) at org.apache.cassandra.thrift.Cassandra$Client.truncate(Cassandra.java:1052) at org.apache.cassandra.cli.CliClient.executeTruncate(CliClient.java:1445) at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272) at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:220) at org.apache.cassandra.cli.CliMain.main(CliMain.java:348)

So it looks like truncate is very slow and takes longer than rpc_timeout_in_ms: 1 (this can happen because we have very slow disks on the test machine). But in the Cassandra system log I see the following exception:

ERROR [MutationStage:7022] 2012-05-17 12:19:14,356 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MutationStage:7022,5,main] java.io.IOError: java.io.IOException: unable to mkdirs /home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433) at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462) at org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657) at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: unable to mkdirs /home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:140) at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:131) at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1409) ... 7 more

Also, I see that a 1337242754356-purchase_history directory already exists in the snapshots dir, so I think the snapshot names Cassandra generates are not unique.

PS: We use Cassandra 1.0.10 on Ubuntu 10.04 LTS

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
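The collision ruslan observed, a snapshot directory named by wall-clock milliseconds already existing, can be illustrated with a toy name generator (hypothetical code, not Cassandra's implementation): naming by millis alone collides when two snapshots of the same table land in the same millisecond, while any monotonic suffix keeps retried snapshots in distinct directories:

```python
# Toy snapshot-directory namer showing why millis-only names can collide
# and how a per-process counter avoids it.
import itertools

_seq = itertools.count()

def snapshot_dir(millis, table):
    # millis alone collided in the report above; the counter disambiguates.
    return f"{millis}-{next(_seq)}-{table}"

a = snapshot_dir(1337242754356, "purchase_history")
b = snapshot_dir(1337242754356, "purchase_history")
print(a)
print(b)
assert a != b   # same millisecond, distinct directories
```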
Re: Migration from cassandra 0.8.6 to 1.1.0
1.1 will migrate your data to the new directory structure, but it needs the 0.8 schema to do that. Then you can drop the unwanted keyspace post-upgrade.

On Fri, May 18, 2012 at 11:58 AM, Harshvardhan Ojha harshvardhan.o...@makemytrip.com wrote:
Hi All, I am trying to migrate from Cassandra version 0.8.6 to 1.1.0. I had two keyspaces and wanted to keep only one, so I deleted system and ran the schema again for the other keyspace. After running the schema for the keyspace I noticed that new folders are created for every column family inside the keyspace folder, so the data is not available on Cassandra 1.1.0. Is it a new feature to create a folder for each column family in a keyspace? How can I get all the data from the old keyspace in the new version? Any suggestion would be highly appreciated.

Harshvardhan Ojha | Software Developer - Technology Development | MakeMyTrip.com, 243 SP Infocity, Udyog Vihar Phase 1, Gurgaon, Haryana - 122 016, India

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Migrating a column family from one cluster to another
Better: use bin/sstableloader, which will copy exactly the right ranges of data to the new cluster. On Fri, May 18, 2012 at 3:39 PM, Rob Coli rc...@palominodb.com wrote: On Thu, May 17, 2012 at 9:37 AM, Bryan Fernandez bfernande...@gmail.com wrote: What would be the recommended approach to migrating a few column families from a six node cluster to a three node cluster? The easiest way (if you are not using counters) is : 1) make sure all filenames of sstables are unique [1] 2) copy all sstablefiles from the 6 nodes to all 3 nodes 3) run a cleanup compaction on the 3 nodes =Rob [1] https://issues.apache.org/jira/browse/CASSANDRA-1983 -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: while compaction occur EOFException
Looks like sstable corruption to me. Bad memory can often cause this. You should upgrade to the latest 0.7 release and run nodetool scrub. I don't think the 0.7.3 scrub was very robust.

On Thu, May 17, 2012 at 1:36 AM, Preston Cheung zhangyf2...@gmail.com wrote:
While doing compaction, Cassandra hit an EOFException, and it seems that the compaction failed. I wonder whether my sstables are corrupt or whether this is a bug? Thanks for any help! Our Cassandra is 0.7.3, on CentOS 5.4 with jdk1.7.0. This is the log:

INFO [CompactionExecutor:1] 2012-05-17 10:42:18,095 CompactionManager.java (line 452) Compacting [SSTableReader(path='/data00/data/picasso/value-f-63129-Data.db'),SSTableReader(path='/data01/data/picasso/value-f-63893-Data.db'),SSTableReader(path='/data01/data/picasso/value-f-63989-Data.db'),SSTableReader(path='/data00/data/picasso/value-f-63691-Data.db'),SSTableReader(path='/data00/data/picasso/value-f-61779-Data.db'),SSTableReader(path='/data00/data/picasso/value-f-61916-Data.db'),SSTableReader(path='/data00/data/picasso/value-f-61875-Data.db'),SSTableReader(path='/data00/data/picasso/value-f-63296-Data.db'),SSTableReader(path='/data00/data/picasso/value-f-62139-Data.db'),SSTableReader(path='/data00/data/picasso/value-f-63821-Data.db')]

ERROR [CompactionExecutor:1] 2012-05-17 10:42:24,306 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.io.IOError: java.io.EOFException at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:117) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:67) at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:179) at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:144) at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:136) at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:39) at
org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284) at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326) at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230) at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183) at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94) at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:505) at org.apache.cassandra.db.CompactionManager$4.call(CompactionManager.java:256) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Caused by: java.io.EOFException at org.apache.cassandra.io.sstable.IndexHelper.skipIndex(IndexHelper.java:65) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:109) ... 20 more thx -- by Preston Cheung -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Snapshot failing on JSON files in 1.1.0
/blotter/twitter_users/snapshots/1337115022389/twitter_users.json -rw-r--r-- 1 root root 38778 May 15 20:50 /var/lib/cassandra/data/blotter/twitter_users/snapshots/1337115022389/twitter_users.json

We are using Leveled Compaction on the twitter_users CF, which I assume is creating the JSON files.

[root@cassandra-n6 blotter]# ls -al /var/lib/cassandra/data/blotter/twitter_users/*.json
-rw-r--r-- 1 root root 38779 May 15 20:51 /var/lib/cassandra/data/blotter/twitter_users/twitter_users.json
-rw-r--r-- 1 root root 38779 May 15 20:51 /var/lib/cassandra/data/blotter/twitter_users/twitter_users-old.json
-rw-r--r-- 1 root root 1040 May 15 20:51 /var/lib/cassandra/data/blotter/twitter_users/twitter_users.twitter_user_attributes_screenname_idx.json
-rw-r--r-- 1 root root 1046 May 15 20:50 /var/lib/cassandra/data/blotter/twitter_users/twitter_users.twitter_user_attributes_screenname_idx-old.json

The other column families, which are not using Leveled Compaction, seem to have their snapshots created successfully. Any ideas other than turning off Leveled Compaction? Thanks, Brian

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: CQL 3.0 Features
In the meantime, Sylvain just posted this: http://www.datastax.com/dev/blog/cql3-evolutions

On Wed, May 16, 2012 at 11:45 AM, paul cannon p...@datastax.com wrote:
Sylvain has a draft on https://issues.apache.org/jira/browse/CASSANDRA-3779, and that should be an official Cassandra project doc real soon now. If you're asking about DataStax's reference docs for CQL 3, they will probably be released once DataStax Enterprise or DataStax Community is released with Cassandra 1.1. p

On Wed, May 16, 2012 at 10:57 AM, Roland Mechler rmech...@sencha.com wrote:
http://www.datastax.com/dev/blog/whats-new-in-cql-3-0 It's my understanding that the actual reference documentation for 3.0 should be ready soon. Anyone know when? -Roland

On Wed, May 16, 2012 at 12:04 AM, Tamil selvan R.S tamil.3...@gmail.com wrote:
Hi, is there a tutorial or reference on CQL 3.0 features? On the Cassandra download site the reference is still pointing to 2.0, specifically for Composite Types. Regards, Tamil.s

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: stream data using bulkoutputformat
We're working on this over at https://issues.apache.org/jira/browse/CASSANDRA-4208

On Fri, May 4, 2012 at 4:56 PM, Shawna Qian shaw...@yahoo-inc.com wrote:
Hi Group: I am following this great example of using BulkOutputFormat to stream data from Hadoop to Cassandra: http://shareitexploreit.blogspot.com/2012/03/bulkloadto-cassandra-with-hadoop.html. It works perfectly when my keyspace has one CF. But in my case, I have 2 column families defined in the keyspace, and I want to stream data to both of them from the same mapper. It seems ConfigHelper can only set one output column family. Is there a way to set multiple column families in one keyspace and output data to all of them? Thx, Shawna

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: AssertionError: originally calculated column size ...
On Mon, Apr 30, 2012 at 2:11 PM, Patrik Modesto patrik.mode...@gmail.com wrote: I think the problem is somehow connected to an IntegerType secondary index. Could be, but my money is on the supercolumns in the HH data model. Can you create a jira ticket? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: JNA + Cassandra security
On Mon, Apr 30, 2012 at 7:49 PM, Cord MacLeod cordmacl...@gmail.com wrote: Hello group, I'm a new Cassandra and Java user so I'm still trying to get my head around a few things. If you've disabled swap on a machine what is the reason to use JNA? Faster snapshots, giving hints to the page cache with fadvise. A second question is doesn't JNA break the Java inherent security mechanisms by allowing access to direct system calls outside of the JVM? Are there any concerns around this? We're not trying to sandbox anything here; there's lots of places where we explicitly allow arbitrary Java code to be injected into Cassandra. You don't need native code to do dangerous things with that! -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: incremental_backups
Incremental snapshots contain only new data, so they are *much* smaller.

On Mon, Apr 30, 2012 at 12:39 AM, Tamar Fraenkel ta...@tok-media.com wrote:
Hi! I wonder what the advantages of doing an incremental snapshot over a non-incremental one are. Are the snapshots smaller in size? Are there any other implications? Thanks,

Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736 | Mob: +972 54 8356490 | Fax: +972 2 5612956

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Cql 3 wide rows filter expressions in where clause
That should work. I don't see anything obviously wrong with your query, other than the trivial (ascii values need to be quoted). Assuming that's not the problem, please file a ticket if you have a failing test case.

On Fri, Apr 20, 2012 at 11:59 PM, Nagaraj J nagaraj.pe...@gmail.com wrote:
Hi, CQL 3 for wide rows is very promising. I was wondering if there is support for filtering wide rows by additional filter expressions in the where clause (columns other than those which are part of the composite). E.g., suppose I have a sparse CF:

create columnfamily scf( k ascii, o ascii, x ascii, y ascii, z ascii, PRIMARY KEY(k, o));

Is it possible to have a query:

select * from scf where k=1 and x=2 and z=2 order by o ASC;

I tried this with 1.1-rc and it doesn't work as expected. I also looked at cql_tests.py in https://issues.apache.org/jira/browse/CASSANDRA-2474; there is no mention of this. Am I missing something here? Thanks in advance, Nagaraj

-- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cql-3-wide-rows-filter-expressions-in-where-clause-tp7486344p7486344.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: size tiered compaction - improvement
It's not that simple, unless you have an append-only workload. (See discussion on https://issues.apache.org/jira/browse/CASSANDRA-3974.)

On Wed, Apr 18, 2012 at 4:57 AM, Radim Kolar h...@filez.com wrote:
Any compaction pass over A will first convert the TTL data into tombstones. Then, any subsequent pass that includes A *and all other sstables containing rows with the same key* will drop the tombstones. That's why I proposed attaching the TTL to the entire CF: tombstones would not be needed.

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Resident size growth
On Wed, Apr 18, 2012 at 12:44 PM, Rob Coli rc...@palominodb.com wrote: On Tue, Apr 10, 2012 at 8:40 AM, ruslan usifov ruslan.usi...@gmail.com wrote: mmap doesn't depend on jna FWIW, this confusion is as a result of the use of *mlockall*, which is used to prevent mmapped files from being swapped, which does depend on JNA. mlockall does depend on JNA, but we only lock the JVM itself in memory. The OS is free to page data files in and out as needed. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released
64-bit is recommended where it's available. If you actually did have a 32-bit machine or VM, then you should dramatically reduce the commitlog space cap to the minimum of 128MB so it doesn't need to mmap so much. On Tue, Apr 17, 2012 at 1:45 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote: Sorry, I found the issue. The server I was using had 32-bit Java installed. -Original Message- From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: Monday, April 16, 2012 11:39 PM To: user@cassandra.apache.org Subject: Re: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released On Mon, Apr 16, 2012 at 10:45 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote: I keep running into this with my testing (on a Windows box). Is this just an OOM for RAM? How much RAM do you have? Do you use completely standard settings? Do you also OOM if you try the same test with Cassandra 1.0.9? -- Sylvain
ERROR [COMMIT-LOG-ALLOCATOR] 2012-04-16 13:36:18,790 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
java.io.IOError: java.io.IOException: Map failed
        at org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:127)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.freshSegment(CommitLogSegment.java:80)
        at org.apache.cassandra.db.commitlog.CommitLogAllocator.createFreshSegment(CommitLogAllocator.java:244)
        at org.apache.cassandra.db.commitlog.CommitLogAllocator.access$500(CommitLogAllocator.java:49)
        at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:104)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(Unknown Source)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:119)
        ... 6 more
Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        ... 8 more
INFO [StorageServiceShutdownHook] 2012-04-16 13:36:18,961 CassandraDaemon.java (line 218) Stop listening to thrift clients
INFO [StorageServiceShutdownHook] 2012-04-16 13:36:18,961 MessagingService.java (line 539) Waiting for messaging service to quiesce
INFO [ACCEPT-/10.47.1.15] 2012-04-16 13:36:18,977 MessagingService.java (line 695) MessagingService shutting down server thread.
-Original Message- From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: Friday, April 13, 2012 9:41 AM To: user@cassandra.apache.org Subject: [RELEASE CANDIDATE] Apache Cassandra 1.1.0-rc1 released The Cassandra team is pleased to announce the release of the first release candidate for the future Apache Cassandra 1.1. Please first note that this is a release candidate, *not* the final release yet. All help in testing this release candidate will be greatly appreciated. Please report any problem you may encounter[3,4] and have a look at the change log[1] and the release notes[2] to see where Cassandra 1.1 differs from the previous series. Apache Cassandra 1.1.0-rc1[5] is available as usual from the cassandra website (http://cassandra.apache.org/download/) and a debian package is available using the 11x branch (see http://wiki.apache.org/cassandra/DebianPackaging). Thank you for your help in testing and have fun with it. [1]: http://goo.gl/XwH7J (CHANGES.txt) [2]: http://goo.gl/JocLX (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA [4]: user@cassandra.apache.org [5]: http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.1.0-rc1 -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
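The 32-bit workaround Jonathan describes is a one-line cassandra.yaml change. A sketch of the fragment, assuming the setting carries the name below in your version (check against the yaml shipped with your release):

```yaml
# cassandra.yaml -- cap the total space the commitlog may occupy, so a
# 32-bit JVM doesn't exhaust its address space mmap-ing segments.
# Setting name assumed from the 1.x-era yaml; verify against your version.
commitlog_total_space_in_mb: 128
```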
Re: size tiered compaction - improvement
On Sat, Apr 14, 2012 at 3:27 AM, Radim Kolar h...@filez.com wrote: forceUserDefinedCompaction would be more useful if you could do compaction on 2 tables. You absolutely can. That's what the user defined part is: you give it the exact list of sstables you want compacted. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: size tiered compaction - improvement
On Sat, Apr 14, 2012 at 4:08 AM, Igor i...@4friends.od.ua wrote: Assume I insert all my data with TTL=2weeks, and we have sstable A which was created a week ago at time T, so I know that right now it contains: 1) some data that was inserted not later than T and maybe is not expired yet 2) some amount of data that was already close to expiration due to TTL at time T, but still had no chance to be wiped out because up to the current moment size-tiered compaction did not involve A in any compactions. A large amount of the data from 2) became expired in the week after time T and probably passed the gc_grace period, so it should be wiped by any compaction on table A. Any compaction pass over A will first convert the TTL data into tombstones. Then, any subsequent pass that includes A *and all other sstables containing rows with the same key* will drop the tombstones. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
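The two-pass lifecycle Jonathan describes can be sketched with a toy model. This is illustrative only: `Cell`, `compact`, and the single-sstable simplification are invented for the sketch, not Cassandra's internals.

```python
GC_GRACE_SECONDS = 3600  # illustrative value, like a short gc_grace setting

class Cell:
    def __init__(self, value, expires_at=None):
        self.value = value
        self.expires_at = expires_at   # TTL deadline, if any
        self.tombstone_at = None       # set once converted to a tombstone

def compact(cells, now):
    """One compaction pass over a single sstable's cells."""
    out = []
    for c in cells:
        # Pass 1 effect: an expired TTL cell becomes a tombstone.
        if c.expires_at is not None and now >= c.expires_at and c.tombstone_at is None:
            c.tombstone_at = now
            c.value = None
        # Pass 2 effect: a tombstone older than gc_grace is purged
        # (only safe when every sstable holding that key is in the compaction).
        if c.tombstone_at is not None and now - c.tombstone_at > GC_GRACE_SECONDS:
            continue
        out.append(c)
    return out

t0 = 0
cells = [Cell("a", expires_at=t0 + 10)]
cells = compact(cells, now=t0 + 20)         # pass 1: tombstoned, not purged
assert cells[0].tombstone_at == t0 + 20
cells = compact(cells, now=t0 + 20 + 7200)  # pass 2: past gc_grace, purged
assert cells == []
```

The point of the model is that a single pass never removes expired data outright, which is why two compactions (separated by gc_grace) are needed before the space comes back.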
Re: Off-heap row cache and mmapped sstables
Absolutely. Best practice is still to disable swap entirely on server machines; mlockall is just our best attempt to at least keep your JVM from swapping if you've forgotten this. On Thu, Apr 12, 2012 at 11:15 AM, Omid Aladini omidalad...@gmail.com wrote: Hi, Cassandra issues an mlockall [1] before mmap-ing sstables to prevent the kernel from paging out heap space in favor of memory-mapped sstables. I was wondering, what happens to the off-heap row cache (saved or unsaved)? Is it possible that the kernel pages out off-heap row cache in favor of resident mmap-ed sstable pages? Thanks, Omid [1] http://pubs.opengroup.org/onlinepubs/007908799/xsh/mlockall.html -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Long start-up times
On Sun, Apr 15, 2012 at 2:47 PM, sj.climber sj.clim...@gmail.com wrote: Also, I see in 1.0.9 there's a fix for a potentially related issue (see https://issues.apache.org/jira/browse/CASSANDRA-4023). Any thoughts on this? My thought is, upgrading is a no-brainer if that's a pain point for you. :) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: swap grows
Swappiness is actually a fairly weak hint to linux: http://www.linuxvox.com/2009/10/what-is-the-linux-kernel-parameter-vm-swappiness On Sat, Apr 14, 2012 at 1:39 PM, aaron morton aa...@thelastpickle.com wrote: From https://help.ubuntu.com/community/SwapFaq swappiness=0 tells the kernel to avoid swapping processes out of physical memory for as long as possible If you have swap enabled, at some point the OS may swap out pages, even if swappiness is 0 and you have free memory. Disable swap entirely if you want to avoid this. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/04/2012, at 1:37 AM, R. Verlangen wrote: Maybe it has got something to do with swappiness; it's something you can configure, more info here: https://www.linux.com/news/software/applications/8208-all-about-linux-swap-space 2012/4/14 ruslan usifov ruslan.usi...@gmail.com I know :-) but this is not an answer :-(. I found that the other nodes still have about 3GB of free memory (the node with JAVA_HEAP=6GB also has 3GB free) but with JAVA_HEAP=5G, so this looks like some sysctl (/proc/sys/vm???) ratio (about 10% (3 / 24 * 100)), but I don't know which one; can anybody explain this situation? 2012/4/14 R. Verlangen ro...@us2.nl It's recommended to disable swap entirely when you run Cassandra on a server. 2012/4/14 ruslan usifov ruslan.usi...@gmail.com I forgot to say that the system has 24GB of physical memory 2012/4/14 ruslan usifov ruslan.usi...@gmail.com Hello We have a 6 node cluster (cassandra 0.8.10). On one node I increased the java heap size to 6GB, and now on this node swap has begun to grow, even though the system has about 3GB of free memory: root@6wd003:~# free
total used free shared buffers cached
Mem: 24733664 21702812 3030852 0 6792 13794724
-/+ buffers/cache: 7901296 16832368
Swap: 1998840 2352 1996488
And swap space slowly grows, but I don't understand why?
PS: We have JNA mlock, and set vm.swappiness = 0 PS: OS ubuntu 10.04 (2.6.32-40-generic) -- With kind regards, Robin Verlangen www.robinverlangen.nl -- With kind regards, Robin Verlangen www.robinverlangen.nl -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: java.nio.BufferOverflowException from cassandra server
If I were to take a wild guess, it would be that you're using a single Thrift connection in multiple threads, which isn't supported. On Mon, Apr 16, 2012 at 6:43 PM, Aniket Chakrabarti chakr...@cse.ohio-state.edu wrote: Hi, I have set up a 4 node cassandra cluster. I am using the Thrift C++ API to write a simple C++ application which creates 50% READ / 50% WRITE requests. Every time, at around the thousand-request mark, I get the following exception and my connection is broken: ===
ERROR 17:30:27,647 Error occurred during processing of message.
java.nio.BufferOverflowException
        at java.nio.charset.CoderResult.throwException(Unknown Source)
        at java.lang.StringCoding$StringEncoder.encode(Unknown Source)
        at java.lang.StringCoding.encode(Unknown Source)
        at java.lang.String.getBytes(Unknown Source)
        at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:185)
        at org.apache.thrift.protocol.TBinaryProtocol.writeMessageBegin(TBinaryProtocol.java:92)
        at org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:3302)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
==
Some info about the config I am using:
- It is a 4 node cluster with only 1 seed.
- The consistency level is set to ONE.
- The max heap size and new heap size are set to 4G and 800M (I tried without setting them as well).
- Java is run in interpreted mode (-Xint).
- I'm using User Mode Linux.
Any pointers to what I might be doing wrong will be very helpful. Thanks in advance, Aniket -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
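The usual fix for the one-connection-many-threads problem is a check-out/check-in pool, so each thread holds a connection exclusively while using it. A minimal sketch (in Python for brevity, though the poster's client is C++; `make_connection` is a stand-in for real Thrift client setup, not an actual API):

```python
import queue
import threading

class ConnectionPool:
    """Each thread checks a connection out, uses it, and checks it back in,
    so no two threads ever share one Thrift connection concurrently."""
    def __init__(self, make_connection, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(make_connection())

    def acquire(self):
        return self._pool.get()   # blocks until a connection is free

    def release(self, conn):
        self._pool.put(conn)

# Demo with dummy "connections": 8 workers share a pool of 2, never a socket.
pool = ConnectionPool(make_connection=object, size=2)
seen = []

def worker():
    conn = pool.acquire()
    try:
        seen.append(conn)   # stand-in for issuing a request on conn
    finally:
        pool.release(conn)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
assert len(seen) == 8   # every worker got exclusive use of a connection
```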
Re: size tiered compaction - improvement
On Tue, Apr 17, 2012 at 11:26 PM, Igor i...@4friends.od.ua wrote: You absolutely can. That's what the user defined part is: you give it the exact list of sstables you want compacted. Does it mean that I can use a list (not just one) of sstables as the second parameter for userDefinedCompaction? If you want them all compacted together into one big sstable, yes. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Why so many SSTables?
LCS explicitly tries to keep sstables under 5MB to minimize extra work done by compacting data that didn't really overlap across different levels. On Tue, Apr 10, 2012 at 9:24 AM, Romain HARDOUIN romain.hardo...@urssaf.fr wrote: Hi, We are surprised by the number of files generated by Cassandra. Our cluster consists of 9 nodes and each node handles about 35 GB. We're using Cassandra 1.0.6 with LeveledCompactionStrategy. We have 30 CF. We've got roughly 45,000 files under the keyspace directory on each node:
ls -l /var/lib/cassandra/data/OurKeyspace/ | wc -l
44372
The biggest CF is spread over 38,000 files:
ls -l Documents* | wc -l
37870
ls -l Documents*-Data.db | wc -l
7586
Many SSTables are about 4 MB:
19 MB - 1 SSTable
12 MB - 2 SSTables
11 MB - 2 SSTables
9.2 MB - 1 SSTable
7.0 MB to 7.9 MB - 6 SSTables
6.0 MB to 6.4 MB - 6 SSTables
5.0 MB to 5.4 MB - 4 SSTables
4.0 MB to 4.7 MB - 7139 SSTables
3.0 MB to 3.9 MB - 258 SSTables
2.0 MB to 2.9 MB - 35 SSTables
1.0 MB to 1.9 MB - 13 SSTables
87 KB to 994 KB - 87 SSTables
0 KB - 32 SSTables
FYI here is the CF information:
ColumnFamily: Documents
Key Validation Class: org.apache.cassandra.db.marshal.BytesType
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Columns sorted by: org.apache.cassandra.db.marshal.BytesType
Row cache size / save period in seconds / keys to save : 0.0/0/all
Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
Key cache size / save period in seconds: 20.0/14400
GC grace seconds: 1728000
Compaction min/max thresholds: 4/32
Read repair chance: 1.0
Replicate on write: true
Column Metadata:
Column Name: refUUID (7265664944)
Validation Class: org.apache.cassandra.db.marshal.BytesType
Index Name: refUUID_idx
Index Type: KEYS
Compaction Strategy: org.apache.cassandra.db.compaction.LeveledCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
Is it a bug? If not, how can we tune Cassandra to avoid this?
Regards, Romain -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
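Romain's file counts are in fact what the ~5 MB LCS target predicts. A quick back-of-the-envelope check, using only the figures from his post:

```python
# Sanity-check the observed file counts against LCS's ~5 MB target.
node_data_mb = 35 * 1024          # ~35 GB of data per node
target_sstable_mb = 4.5           # observed: most sstables are 4.0-4.7 MB
data_files = node_data_mb / target_sstable_mb
print(round(data_files))          # ~7964, close to the 7586 Data.db files seen

# Each sstable is several on-disk components (Data, Index, Filter, etc.),
# so the total file count is a small multiple of the Data.db count.
components_per_sstable = 37870 / 7586
print(round(components_per_sstable))  # ~5 files per sstable
```

So 35 GB at ~4.5 MB per sstable, times ~5 component files each, lands right around the 38,000 files observed: expected behavior for LCS on that era of Cassandra, not a bug.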
Re: Bulk loading errors with 1.0.8
On Thu, Apr 5, 2012 at 10:58 AM, Benoit Perroud ben...@noisette.ch wrote:
ERROR [Thread-23] 2012-04-05 09:58:12,252 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-23,5,main]
java.lang.RuntimeException: Insufficient disk space to flush 7813594056494754913 bytes
        at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:635)
        at org.apache.cassandra.streaming.StreamIn.getContextMapping(StreamIn.java:92)
        at org.apache.cassandra.streaming.IncomingStreamReader.init(IncomingStreamReader.java:68)
        at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)
Here I'm not really sure I was able to generate 7 exabytes of data ;) The bulk loader told the Cassandra node, I have 7EB of data for you. And the C* node threw this error. So you need to troubleshoot the bulk loader side. If you feel lucky, we've done some work on streaming in 1.1 to make it more robust, but I don't recognize this specific problem so I can't say for sure if 1.1 would help.
ERROR [Thread-46] 2012-04-05 09:58:14,453 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-46,5,main]
java.lang.NullPointerException
        at org.apache.cassandra.io.sstable.SSTable.getMinimalKey(SSTable.java:156)
        at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:334)
        at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:302)
        at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:155)
        at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:89)
        at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)
This one sounds like a null key added to the SSTable at some point, but I'm rather confident I'm checking for key nullity. The stacktrace indicates an error with the very first key in the sstable, if that helps. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
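The "7 exabytes" joke checks out arithmetically, which is itself diagnostic: the size field in the stream header is garbage, pointing at corruption on the sender side rather than a genuine disk-space problem.

```python
# Sanity-check the "insufficient disk space" figure from the log above.
requested = 7813594056494754913     # bytes, exactly as logged
print(requested / 2**60)            # ~6.8 EiB -- far beyond any real disk,
                                    # so the size the bulk loader sent is corrupt
```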
Re: leveled compaction - improve log message
CompactionExecutor doesn't have level information available to it; it just compacts the sstables it's told to. But if you enable debug logging on LeveledManifest you'd see what you want. (Compaction candidates for L{} are {}) 2012/4/5 Radim Kolar h...@filez.com: it would be really helpful if leveled compaction printed the level into the syslog. Demo:
INFO [CompactionExecutor:891] 2012-04-05 22:39:27,043 CompactionTask.java (line 113) Compacting ***LEVEL 1*** [SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19690-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19688-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19691-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19700-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19686-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19696-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19687-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19695-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19689-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19694-Data.db'), SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19693-Data.db')]
INFO [CompactionExecutor:891] 2012-04-05 22:39:57,299 CompactionTask.java (line 221) *** LEVEL 1 *** Compacted to
[/var/lib/cassandra/data/rapidshare/querycache-hc-19701-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19702-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19703-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19704-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19705-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19706-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19707-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19708-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19709-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19710-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19711-Data.db,]. 59,643,011 to 57,564,216 (~96% of original) bytes for 590,909 keys at 1.814434MB/s. Time: 30,256ms. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: size tiered compaction - improvement
Twitter tried a timestamp-based compaction strategy in https://issues.apache.org/jira/browse/CASSANDRA-2735. The conclusion was: this actually resulted in a lot more compactions than the SizeTieredCompactionStrategy. The increase in IO was not acceptable for our use, and we therefore stopped working on this patch. 2012/4/3 Radim Kolar h...@filez.com: there is a problem with the size tiered compaction design. It compacts together tables of similar size. Sometimes it might happen that you will have some sstables sitting on disk forever (Feb 23) because no other similar sized tables were created and probably never will be. A flushed sstable is about 11-16 MB, the next level about 90 MB, then 5x 90 MB gets compacted to a 400 MB sstable, and 5x 400 MB ~ 2 GB. The problem is that the 400 MB sstable is too small to be compacted against these 3x 720 MB ones.
-rw-r--r-- 1 root wheel 165M Feb 23 17:03 resultcache-hc-13086-Data.db
-rw-r--r-- 1 root wheel 772M Feb 23 17:04 resultcache-hc-13087-Data.db
-rw-r--r-- 1 root wheel 156M Feb 23 17:06 resultcache-hc-13091-Data.db
-rw-r--r-- 1 root wheel 716M Feb 23 17:18 resultcache-hc-13096-Data.db
-rw-r--r-- 1 root wheel 734M Feb 23 17:29 resultcache-hc-13101-Data.db
-rw-r--r-- 1 root wheel 5.0G Mar 14 09:38 resultcache-hc-13923-Data.db
-rw-r--r-- 1 root wheel 1.9G Mar 16 22:41 resultcache-hc-14084-Data.db
-rw-r--r-- 1 root wheel 1.9G Mar 21 15:11 resultcache-hc-14460-Data.db
-rw-r--r-- 1 root wheel 1.9G Mar 27 05:22 resultcache-hc-14694-Data.db
-rw-r--r-- 1 root wheel 2.0G Mar 31 04:57 resultcache-hc-14851-Data.db
-rw-r--r-- 1 root wheel 112M Mar 31 06:30 resultcache-hc-14922-Data.db
-rw-r--r-- 1 root wheel 577M Apr 1 19:25 resultcache-hc-14943-Data.db
The compaction strategy needs to compact sstables by timestamp too. Older tables should have an increased chance of getting compacted. For example, a table from today will be compacted with another table in the range (0.5-1.5) of its size, and this range would be increased with sstable age - a 1 month old table would have a range of, for example, (0.2-1.8). -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
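The similar-size bucketing Radim is describing can be modeled in a few lines. This is an illustrative sketch (not Cassandra's actual STCS code), using his 0.5x-1.5x window and the default min_compaction_threshold of 4; the sizes are made up to mirror his scenario:

```python
def bucket_by_size(sizes_mb, low=0.5, high=1.5, min_threshold=4):
    """Group sstables whose size falls within [low, high] x the bucket's
    running average -- a toy model of size-tiered bucketing."""
    buckets = []  # each bucket: [running_avg, [member_sizes]]
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg, members = bucket
            if low * avg <= size <= high * avg:
                members.append(size)
                bucket[0] = sum(members) / len(members)
                break
        else:
            buckets.append([size, [size]])
    # Only buckets with at least min_threshold sstables get compacted.
    return [members for _, members in buckets if len(members) >= min_threshold]

# Radim's complaint in miniature: a lone 400 MB table never joins the
# ~720 MB tier, so it sits on disk until similar-sized peers appear.
sizes = [12, 14, 15, 16, 90, 400, 716, 734, 772]
print(bucket_by_size(sizes))   # only the flush-sized ~12-16 MB tier qualifies
```

Running this yields a single eligible bucket (the small flush-sized tier); the 90 MB, 400 MB, and ~720 MB tables all sit in under-populated buckets, which is exactly the starvation he observes.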
Re: Largest 'sensible' value
We use 2MB chunks for our CFS implementation of HDFS: http://www.datastax.com/dev/blog/cassandra-file-system-design On Mon, Apr 2, 2012 at 4:23 AM, Franc Carter franc.car...@sirca.org.au wrote: Hi, We are in the early stages of thinking about a project that needs to store data that will be accessed by Hadoop. One of the concerns we have is around the latency of HDFS, as our use case is not about reading all the data, and hence we will need custom RecordReaders etc. I've seen a couple of comments that you shouldn't put large chunks into a value - however 'large' is not well defined for the range of people using these solutions ;-) Does anyone have a rough rule of thumb for how big a single value can be before we are outside sanity? thanks -- Franc Carter | Systems architect | Sirca Ltd franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
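The usual way to stay under a sane per-value size is to chunk large blobs on the client and store each chunk as its own column or row, as CFS does with its 2MB chunks. A minimal sketch (the 2MB figure is from the CFS post above; the function itself is an invented illustration, not a CFS API):

```python
CHUNK_SIZE = 2 * 1024 * 1024   # 2 MB, the chunk size used by CFS

def split_into_chunks(blob: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a large value into fixed-size chunks; each chunk would be
    written as its own column/row so no single stored value is oversized."""
    return [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]

blob = b"x" * (5 * 1024 * 1024)        # a 5 MB value
chunks = split_into_chunks(blob)
print([len(c) for c in chunks])        # [2097152, 2097152, 1048576]
assert b"".join(chunks) == blob        # reassembly is lossless
```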
Re: column’s timestamp
That would work, with the caveat that you'd have to delete it and re-insert if you want to preserve that relationship on update. On Mon, Apr 2, 2012 at 12:18 PM, Pierre Chalamet pie...@chalamet.net wrote: Hi, What about using a ts as the column name and doing a get slice instead? --Original Message-- From: Avi-h To: cassandra-u...@incubator.apache.org ReplyTo: user@cassandra.apache.org Subject: column’s timestamp Sent: Apr 2, 2012 18:24 Is it possible to fetch a column based on the row key and the column’s timestamp only (not using the column’s name)? - Pierre -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
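Pierre's suggestion works because columns within a row are kept sorted by name, so naming columns by timestamp turns "fetch by timestamp" into a slice. A toy model of the idea (a sorted list standing in for comparator-sorted columns; not a Cassandra client API):

```python
import bisect

row = []   # sorted (timestamp, value) pairs, like comparator-sorted columns

def insert(ts, value):
    bisect.insort(row, (ts, value))

def get_slice(start_ts, end_ts):
    """Return all columns whose 'name' (timestamp) is in [start_ts, end_ts]."""
    lo = bisect.bisect_left(row, (start_ts,))
    # Pair end_ts with the highest code point so any value at end_ts is included.
    hi = bisect.bisect_right(row, (end_ts, chr(0x10FFFF)))
    return row[lo:hi]

insert(100, "a"); insert(200, "b"); insert(300, "c")
print(get_slice(150, 300))    # [(200, 'b'), (300, 'c')]
```

Jonathan's caveat shows up naturally here: since the timestamp is the column's name, "updating" a value's timestamp means deleting the old column and inserting a new one.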
Re: really bad select performance
Secondary indexes can generate a lot of random i/o. iostat -x can confirm if that's your problem. On Thu, Mar 29, 2012 at 5:52 PM, Chris Hart ch...@remilon.com wrote: Hi, I have the following cluster:
136112946768375385385349842972707284580
ip address MountainView RAC1 Up Normal 1.86 GB 20.00% 0
ip address MountainView RAC1 Up Normal 2.17 GB 33.33% 56713727820156410577229101238628035242
ip address MountainView RAC1 Up Normal 2.41 GB 33.33% 113427455640312821154458202477256070485
ip address Rackspace RAC1 Up Normal 3.9 GB 13.33% 136112946768375385385349842972707284580
The following query runs quickly on all nodes except 1 MountainView node: select * from Access_Log where row_loaded = 0 limit 1; There is a secondary index on row_loaded. The query usually doesn't complete (but sometimes does) on the bad node and returns very quickly on all other nodes. I've upped the rpc timeout to a full minute (rpc_timeout_in_ms: 6) in the yaml, but it still often doesn't complete in a minute. It seems just as likely to complete, and to take about the same amount of time, whether the limit is 1, 100 or 1000. Thanks for any help, Chris -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: tombstones problem with 1.0.8
Removing expired columns actually requires two compaction passes: one to turn the expired column into a tombstone; one to remove the tombstone after gc_grace_seconds. (See https://issues.apache.org/jira/browse/CASSANDRA-1537.) Perhaps CASSANDRA-2786 was causing things to (erroneously) be cleaned up early enough that this helped you out in 0.8.2? On Wed, Mar 21, 2012 at 8:38 PM, Ross Black ross.w.bl...@gmail.com wrote: Hi, We recently moved from 0.8.2 to 1.0.8 and the behaviour seems to have changed so that tombstones are now not being deleted. Our application continually adds and removes columns from Cassandra. We have set a short gc_grace time (3600) since our application would automatically delete zombies if they appear. Under 0.8.2, the tombstones remained at a relatively constant number. Under 1.0.8, the tombstones have been continually increasing so that they exceed the size of our real data (at this stage we have over 100G of tombstones). Even after running a full compact the new compacted SSTable contains a massive number of tombstones, many that are several weeks old. Have I missed some new configuration option to allow deletion of tombstones? I also noticed that one of the changes between 0.8.2 and 1.0.8 was https://issues.apache.org/jira/browse/CASSANDRA-2786 which changed code to avoid dropping tombstones when they might still be needed to shadow data in another sstable. Could this be having an impact since we continually add and remove columns even while a major compact is executing? Thanks, Ross -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Cannot start cassandra node anymore
Hi Carlo, Can you post steps to reproduce over on https://issues.apache.org/jira/browse/CASSANDRA-3819 ? We have tried and failed to cause this problem. On Thu, Jan 26, 2012 at 6:24 AM, Carlo Pires carlopi...@gmail.com wrote: I found out this is related to schema change. It happens *every time* I create, drop and re-create a CF with composite types. As a workaround I: * never stop all nodes together To stop a node: * repair and compact the node before stopping it * stop and start it again * if it started fine, good; if not, remove all data and restart the node (and wait...) 2012/1/25 aaron morton aa...@thelastpickle.com There is something wrong with the way a composite type value was serialized. The length of a part on disk is not right. As a workaround, remove the log file, restart and then repair the node. How it got like that is another question. What was the schema change ? Cheers -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Final buffer length 4690 to accomodate data size of 2347 for RowMutation error caused node death
Thanks, Thomas. Row cache/CLHCP confirms our suspected culprit. We've committed a fix for 1.0.9. On Wed, Mar 7, 2012 at 11:08 AM, Thomas van Neerijnen t...@bossastudios.com wrote: Sorry for the delay in replying. I'd like to stress that I've been working on this cluster for many months and this was the first and so far last time I got this error, so I couldn't guess how to duplicate it. Sorry I can't be more help. Anyway, here are the details requested: Row caching is enabled; at the time the error occurred it was using ConcurrentLinkedHashCacheProvider. It's the Apache packaged version with JNA pulled in as a dependency when I installed, so yes. We're using Hector 1.0.1. I'm not sure what was happening at the time the error occurred, although the empty super columns are expected, assuming my understanding of super column deletion is correct, which is to say if I delete a super column from a row it'll tombstone it and delete the data. The schema for PlayerCity is as follows:
create column family PlayerCity
  with column_type = 'Super'
  and comparator = 'UTF8Type'
  and subcomparator = 'BytesType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'BytesType'
  and rows_cached = 400.0
  and row_cache_save_period = 0
  and row_cache_keys_to_save = 2147483647
  and keys_cached = 20.0
  and key_cache_save_period = 14400
  and read_repair_chance = 1.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';
On Fri, Feb 24, 2012 at 10:07 PM, Jonathan Ellis jbel...@gmail.com wrote: I've filed https://issues.apache.org/jira/browse/CASSANDRA-3957 as a bug. Any further light you can shed here would be useful. (Is row cache enabled? Is JNA installed?)
On Mon, Feb 20, 2012 at 5:43 AM, Thomas van Neerijnen t...@bossastudios.com wrote: Hi all I am running the Apache packaged Cassandra 1.0.7 on Ubuntu 11.10. It has been running fine for over a month however I encountered the below error yesterday which almost immediately resulted in heap usage rising quickly to almost 100% and client requests timing out on the affected node. I gave up waiting for the init script to stop Cassandra and killed it myself after about 3 minutes, restarted it and it has been fine since. Anyone seen this before? Here is the error in the output.log: ERROR 10:51:44,282 Fatal exception in thread Thread[COMMIT-LOG-WRITER,5,main] java.lang.AssertionError: Final buffer length 4690 to accomodate data size of 2347 (predicted 2344) for RowMutation(keyspace='Player', key='36336138643338652d366162302d343334392d383466302d356166643863353133356465', modifications=[ColumnFamily(PlayerCity [SuperColumn(owneditem_1019 []),SuperColumn(owneditem_1024 []),SuperColumn(owneditem_1026 []),SuperColumn(owneditem_1074 []),SuperColumn(owneditem_1077 []),SuperColumn(owneditem_1084 []),SuperColumn(owneditem_1094 []),SuperColumn(owneditem_1130 []),SuperColumn(owneditem_1136 []),SuperColumn(owneditem_1141 []),SuperColumn(owneditem_1142 []),SuperColumn(owneditem_1145 []),SuperColumn(owneditem_1218 [636f6e6e6563746564:false:5@1329648704269002,63757272656e744865616c7468:false:3@1329648704269006,656e64436f6e737472756374696f6e54696d65:false:13@1329648704269007,6964:false:4@1329648704269000,6974656d4964:false:15@1329648704269001,6c61737444657374726f79656454696d65:false:1@1329648704269008,6c61737454696d65436f6c6c6563746564:false:13@1329648704269005,736b696e4964:false:7@1329648704269009,78:false:4@1329648704269003,79:false:3@1329648704269004,]),SuperColumn(owneditem_133 []),SuperColumn(owneditem_134 []),SuperColumn(owneditem_135 []),SuperColumn(owneditem_141 []),SuperColumn(owneditem_147 []),SuperColumn(owneditem_154 []),SuperColumn(owneditem_159 []),SuperColumn(owneditem_171 
[]),SuperColumn(owneditem_253 []),SuperColumn(owneditem_422 []),SuperColumn(owneditem_438 []),SuperColumn(owneditem_515 []),SuperColumn(owneditem_521 []),SuperColumn(owneditem_523 []),SuperColumn(owneditem_525 []),SuperColumn(owneditem_562 []),SuperColumn(owneditem_61 []),SuperColumn(owneditem_634 []),SuperColumn(owneditem_636 []),SuperColumn(owneditem_71 []),SuperColumn(owneditem_712 []),SuperColumn(owneditem_720 []),SuperColumn(owneditem_728 []),SuperColumn(owneditem_787 []),SuperColumn(owneditem_797 []),SuperColumn(owneditem_798 []),SuperColumn(owneditem_838 []),SuperColumn(owneditem_842 []),SuperColumn(owneditem_847 []),SuperColumn(owneditem_849 []),SuperColumn(owneditem_851 []),SuperColumn(owneditem_852 []),SuperColumn(owneditem_853 []),SuperColumn(owneditem_854 []),SuperColumn(owneditem_857 []),SuperColumn(owneditem_858 []),SuperColumn(owneditem_874 []),SuperColumn(owneditem_884 []),SuperColumn(owneditem_886