Re: Questions about using MD5 encryption with SimpleAuthenticator
On Wed, 18 May 2011 17:16:28 -0700 Sameer Farooqui cassandral...@gmail.com wrote:

SF> But even SSL/TLS is subject to attacks from tools like SSLSNIFF:
SF> http://www.thoughtcrime.org/software/sslsniff

For perfect security, unplug the server and remove the hard drive.

Ted
Re: Questions about using MD5 encryption with SimpleAuthenticator
On Tue, 17 May 2011 15:52:22 -0700 Sameer Farooqui cassandral...@gmail.com wrote:

SF> Would still be nice though to use the bcrypt hash over MD5 for stronger
SF> security.

I used MD5 when I proposed SimpleAuthenticator for two reasons:

1) SimpleAuthenticator is supposed to be a demo of the authentication interface. It can be used for testing and trivial setups, but I wouldn't use it in production. It's meant to get you going easily, not to serve you long-term.

2) MD5 is built into Java. At the time, bcrypt and SHA-* were not.

I used MD5 only so the passwords are not stored in the clear, not to provide production-level security. You should carefully consider the implications of storing passwords in a file on a database server, no matter how they are hashed or encrypted. It would be better to write a trivial AD/LDAP/etc. authenticator that fits your specific needs and doesn't rely on a local file.

Ted
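To make the "not stored in the clear" versus "production-level" distinction concrete, here is a small Python sketch. It compares a plain unsalted MD5 digest with a salted, deliberately slow PBKDF2 hash from the standard library. bcrypt is not in Python's standard library, so PBKDF2 is swapped in as a comparable slow hash. The function names and the "iterations$salt$digest" storage format are hypothetical and are not SimpleAuthenticator's file format.

```python
import hashlib
import hmac
import os
from typing import Optional


def md5_hex(password: str) -> str:
    # Unsalted MD5: the same password always yields the same digest,
    # and MD5 is cheap enough to brute-force quickly.
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def pbkdf2_hash(password: str, salt: Optional[bytes] = None,
                iterations: int = 100_000) -> str:
    # Salted, deliberately slow derivation, stored as "iterations$salt$digest"
    # (a hypothetical format chosen for this sketch).
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def pbkdf2_verify(password: str, stored: str) -> bool:
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 bytes.fromhex(salt_hex), int(iterations))
    # Constant-time comparison so timing doesn't reveal how much matched.
    return hmac.compare_digest(digest.hex(), digest_hex)
```

With unsalted MD5, two users who share a password have identical entries in the file. With a random salt, they do not. Either way the file still lives on the database server, which is the larger concern raised above.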
CQL transport (was: CQL DELETE statement)
On Tue, 19 Apr 2011 00:21:44 +0100 Courtney Robinson sa...@live.co.uk wrote:

CR> Cool... Okay, the plan is to eventually not use thrift underneath,
CR> for the CQL stuff right? Once this is done and the new transport is
CR> in place, or even while designing the new transport, is this not
CR> something that's worth looking into again? I think it'd be a nice
CR> feature.

I'm assuming your question was tangential, and not meant in the sense that fixing the transport will fix the reported issue.

There's https://issues.apache.org/jira/browse/CASSANDRA-2478 for a custom non-Thrift protocol. Most of the Cassandra developers feel HTTP+JSON or XML is inadequate for this purpose. That may be true in some cases, but for many end users HTTP+JSON or XML is also easier to support and use from a client. So I hope that HTTP as the transport and JSON or XML as the serialization format will eventually be at least an option.

Ted
Re: JVM Options for Production
On Mon, 14 Jun 2010 16:01:57 -0700 Anthony Molinaro antho...@alumni.caltech.edu wrote:

AM> Now I would assume that for 'production' you want to remove
AM> -ea
AM> and
AM> -XX:+HeapDumpOnOutOfMemoryError
AM> as well as adjust -Xms and Xmx accordingly, but are there any others
AM> which should be tweaked? Is there actually a recommended production
AM> set of values or does it vary greatly from installation to installation?

I brought this up as well here:

http://thread.gmane.org/gmane.comp.db.cassandra.user/2083/focus=2093

Ted
Re: Perl/Thrift/Cassandra strangeness
On Mon, 7 Jun 2010 17:20:56 -0500 Jonathan Shook jsh...@gmail.com wrote:

JS> The point is to get the last super-column.
...
JS> Is the Perl Thrift client problematic, or is there something else that
JS> I am missing?

Try Net::Cassandra::Easy. If it does what you want, look at the debug output or trace the code to see how the predicate is specified, so you can duplicate that in your own code.

In general, yes, the Perl Thrift interface is problematic. It's slow and semantically inconsistent.

Ted
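As I understand the 0.6 Thrift API, a slice predicate's SliceRange carries start, finish, reversed and count. One way to ask for "the last super-column" is therefore a reversed slice with count 1. The Python sketch below only models those semantics over names already sorted by the comparator. It is not the Thrift client. Unbounded start/finish (empty in Thrift) are represented as None, and the function name is hypothetical.

```python
from typing import List, Optional


def slice_names(sorted_names: List[int], start: Optional[int] = None,
                finish: Optional[int] = None, reversed_: bool = False,
                count: int = 100) -> List[int]:
    """Model of a SliceRange over names already sorted by the comparator."""
    names = list(reversed(sorted_names)) if reversed_ else list(sorted_names)
    result = []
    for name in names:
        # Skip names before 'start' in iteration order.
        if start is not None:
            if (not reversed_ and name < start) or (reversed_ and name > start):
                continue
        # Stop once we pass 'finish' in iteration order.
        if finish is not None:
            if (not reversed_ and name > finish) or (reversed_ and name < finish):
                break
        result.append(name)
        if len(result) == count:
            break
    return result
```

With supercolumn names [100, 200, 300], `slice_names(names, reversed_=True, count=1)` yields only the last one, 300, without fetching the rest of the row.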
Re: Apache Cassandra and Net::Cassandra::Easy
On Tue, 11 May 2010 09:40:02 -0700 Scott Doty sc...@corp.sonic.net wrote:

SD> I'm trying to wrap my head around Net::Cassandra::Easy, and it's making
SD> me cross-eyed.

SD> My prototype app can be seen here:

SD> http://bito.ponzo.net/Hatchet/

SD> The idea is to index logfiles by various keys, using Cassandra's extreme
SD> write speed to keep up with the millions of lines of logfile we deal
SD> with every day. Pretty standard stuff, but I seem to be having trouble
SD> getting the software to do its thing.

SD> It says:

SD> _[/home/cass/Hatchet.development/Speed_trial]_(c...@bito)_
SD> $ ./2hk_idx_parser_speed_trial.pl
SD> $VAR1 = 'Can\'t use string ("0") as a SCALAR ref while "strict refs" in
SD> use at
SD> /usr/local/lib/perl5/site_perl/5.10.0/Net/GenThrift/Thrift/BinaryProtocol.pm
SD> line 376.
SD> ';

The latest N::C::Easy will not work with Cassandra 0.6.x; its only target is SVN trunk. I can't discover the API version on the server, so there's no way to anticipate breakage like the one you see (I suspect it's due to an API mismatch). The Cassandra developers haven't addressed https://issues.apache.org/jira/browse/CASSANDRA-972, which could be used to provide all that information over Thrift as well as in the log files.

Can you give an example of a failing script, with any details on the server version and keyspace/CF setup? Without source code I'm only guessing.

Finally, if you're looking for speed, N::C::Easy is not a good fit. The Perl implementation of Thrift is significantly slower (5-20x in my benchmarks) than the equivalent code in Java. This is outside of N::C::Easy, so there's not much I can do about it (see https://issues.apache.org/jira/browse/THRIFT-775, which I opened recently and which may help the performance).

Ted
Re: how to get apache cassandra version with thrift client ?
On Tue, 27 Apr 2010 19:06:11 -0500 Jonathan Ellis jbel...@gmail.com wrote:

JE> On Mon, Apr 26, 2010 at 9:46 PM, Shuge Lee shuge@gmail.com wrote:
>> I know I can get thrift API version. However, I writing a CLI for
>> Cassandra in Python with readline support, and it will supports one-key
>> deploy/upgrade cassandra+thrift remote, I need to get ApacheCassandra
>> version to make sure it has deploy successfully.

JE> You'll want to create a ticket at
JE> https://issues.apache.org/jira/browse/CASSANDRA to add that, then.

I think this is related to https://issues.apache.org/jira/browse/CASSANDRA-972 (meaning that the Cassandra version, Thrift interface version, and SVN revision would be useful both in the logs and through Thrift).

Ted
Re: [RELEASE] 0.6.0
On Wed, 14 Apr 2010 13:09:13 -0500 Ted Zlatanov t...@lifelogs.com wrote:

TZ> On Wed, 14 Apr 2010 12:23:19 -0500 Eric Evans eev...@rackspace.com wrote:
EE> On Wed, 2010-04-14 at 10:16 -0500, Ted Zlatanov wrote:
>> Can it support a non-root user through /etc/default/cassandra? I've
>> been patching the init script myself but was hoping this would be
>> standard.

EE> It's the first item on debian/TODO, but, you know, patches welcome and
EE> all that.

TZ> The appended patch has been sufficient for me.

Eric, do you need me to open a ticket for this, too, or is what I posted sufficient?

Thanks
Ted
Re: Time-series data model
On Wed, 14 Apr 2010 15:02:29 +0200 Jean-Pierre Bergamin ja...@ractive.ch wrote:

JB> The metrics are stored together with a timestamp. The queries we want to
JB> perform are:
JB> * The last value of a specific metric of a device
JB> * The values of a specific metric of a device between two timestamps t1 and
JB> t2

Make your key devicename-metricname-MMDD-HHMM (with whatever time sharding makes sense to you; I use UTC by-hours and by-day in my environment). Then your supercolumn is the collection time as a LongType, and your columns inside the supercolumn can express the metric in detail (collector agent, detailed breakdown, etc.).

If you want your clients to discover the available metrics, you may need to keep an external index. But from your spec that doesn't seem necessary.

Ted
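The key scheme above can be sketched in Python, assuming UTC by-hour sharding. The exact date format in the key ("%Y%m%d-%H") and all function names are hypothetical choices for illustration. The supercolumn name is the collection time in milliseconds, which is what a LongType comparator would sort on. For a t1..t2 query, the client works out which hourly rows can hold data and slices each one by supercolumn name.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def row_key(device: str, metric: str, ts: datetime) -> str:
    # Hypothetical hour-sharded key in UTC; the date format is an assumption.
    return f"{device}-{metric}-{ts.astimezone(timezone.utc):%Y%m%d-%H}"


def supercolumn_name(ts: datetime) -> int:
    # Collection time as milliseconds since the epoch, for a LongType comparator.
    return int(ts.timestamp()) * 1000 + ts.microsecond // 1000


def keys_between(device: str, metric: str, t1: datetime, t2: datetime) -> List[str]:
    # Every hourly row that may hold samples in [t1, t2]; each row would then be
    # sliced from supercolumn_name(t1) to supercolumn_name(t2).
    hour = t1.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    keys = []
    while hour <= t2:
        keys.append(row_key(device, metric, hour))
        hour += timedelta(hours=1)
    return keys
```

For "the last value of a metric", the client would read the row for the current hour (falling back to earlier hours if it is empty) with a reversed slice of count 1.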
Re: [RELEASE] 0.6.0
On Tue, 13 Apr 2010 15:54:39 -0500 Eric Evans eev...@rackspace.com wrote:

EE> I leaned into it. An updated package has been uploaded to the Cassandra
EE> repo (see: http://wiki.apache.org/cassandra/DebianPackaging).

Thank you for providing the release to the repository.

Can it support a non-root user through /etc/default/cassandra? I've been patching the init script myself but was hoping this would be standard.

Thanks
Ted
Re: writes to Cassandra failing occasionally
On Thu, 8 Apr 2010 10:56:55 -0500 Jonathan Ellis jbel...@gmail.com wrote:

JE> is N:C:E possibly ignoring thrift exceptions?

I always pass them down to the user. The user is responsible for wrapping calls with eval().

Ted
Re: writes to Cassandra failing occasionally
On Thu, 08 Apr 2010 12:16:34 -0700 Mike Gallamore mike.e.gallam...@googlemail.com wrote:

MG> Hopefully my fix helps others. I imagine it is something you'll run
MG> into regardless of the language/interface you use, for example I'm
MG> pretty sure that the C/C++ time function truncates values too. I'd
MG> recommend anyone using time to generate your timestamp: be careful
MG> that your timestamp is always the same length (or at least that the
MG> sub components that you are concatenating are the length you expect
MG> them to be).

This was a Perl-related bug, so I doubt others will see it. It's really caused by the fact that 32-bit Perl doesn't have a native 64-bit pack/unpack function, so I'm using the Bit::Vector wrappers and consequently passing Longs around as strings.

MG> I've written a patch that zero pads the numbers. I've attached it to
MG> this post but in case attachments don't come through on this
MG> mailinglist here is the body:

Thanks so much for catching this. I didn't notice it at all (it works 90% of the time!). I uploaded N::C::Easy 0.10 to CPAN with the fix you proposed, so timestamps are now produced correctly.

Ted
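Here is a Python sketch of why unpadded concatenation breaks. gettimeofday() returns seconds and microseconds. Joining them as strings, as the `join('', gettimeofday())` default does, drops the leading zeros of the microseconds. The timestamp then comes out shorter, so it is numerically much smaller, and a newer write can lose to an older one. The microsecond part has fewer than six digits only when it is below 100000, about 10% of the time, which would be consistent with "it works 90% of the time". The function names here are hypothetical.

```python
import time


def concatenated_timestamp(seconds: int, micros: int) -> int:
    # Same idea as Perl's join('', gettimeofday()): no padding on microseconds.
    return int(f"{seconds}{micros}")


def padded_timestamp(seconds: int, micros: int) -> int:
    # Zero-pad microseconds to six digits; equal to seconds * 1_000_000 + micros.
    return int(f"{seconds}{micros:06d}")


def now_micros() -> int:
    # Current time in microseconds, computed without any string handling.
    return time.time_ns() // 1_000
```

For example, a write at second 1270760001 with 5000 microseconds gets a smaller concatenated timestamp than a write one second earlier with 999999 microseconds. Padding restores the correct order. Integer arithmetic (as in `now_micros`) avoids the issue entirely.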
Re: writes to Cassandra failing occasionally
On Thu, 08 Apr 2010 11:50:38 -0700 Mike Gallamore mike.e.gallam...@googlemail.com wrote:

MG> Yes I agree single threaded is probably not the best. I wonder how
MG> much of a performance hit it is on a single CPU machine though? I
MG> guess I still would be blocking on ram writes but isn't like there is
MG> multiple CPUs I need to keep busy or anything.

Cassandra may have to load data from disk for one query while another may already be in memory. A third may cause a hit on another cluster node. If you issue queries serially, performance drops off with the total number of queries, because each query has to wait for all the ones before it. The performance of independent parallel queries, by contrast, will have a distribution whose skew and kurtosis are much closer to a normal distribution. In other words, your slowest (or unluckiest) queries are less damaging when you issue them in parallel.

On the client side you still have slow serialization/deserialization, and not much can be done about that.

Ted
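The effect of one slow query can be sketched in Python. Each simulated query just sleeps for its latency, standing in for a memory hit, a disk read, or a trip to another node. Serially, the slow query delays everything queued behind it. In parallel, it only sets the wall-clock time of the batch. Threads are used here purely to illustrate the latency effect. For a Perl client the advice above is to fork processes, and `fake_query` is a hypothetical stand-in, not a Cassandra call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def fake_query(latency: float) -> float:
    # Stand-in for one Cassandra request; the sleep models its latency.
    time.sleep(latency)
    return latency


def run_serial(latencies: List[float]) -> Tuple[List[float], float]:
    start = time.perf_counter()
    results = [fake_query(lat) for lat in latencies]
    return results, time.perf_counter() - start


def run_parallel(latencies: List[float], workers: int = 8) -> Tuple[List[float], float]:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order.
        results = list(pool.map(fake_query, latencies))
    return results, time.perf_counter() - start
```

With latencies of [0.05, 0.05, 0.2, 0.05, 0.05], the serial run takes at least 0.4 s. The parallel run takes roughly the 0.2 s of the slowest query.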
Re: writes to Cassandra failing occasionally
On Thu, 08 Apr 2010 12:53:48 +0100 Philip Jackson p...@shellarchive.co.uk wrote:

PJ> At Wed, 07 Apr 2010 13:19:26 -0700,
PJ> Mike Gallamore wrote:
>> I have writes to cassandra that are failing, or at least a read shortly
>> after a write is still getting an old value. I realize Cassandra is
>> eventually consistent but this system is a single CPU single node with
>> consistency level set to 1, so this seems odd to me.

PJ> I'm having this problem too (see my post the other day). I use N::C
PJ> but generate timestamps in the same way as N::C::E, I've tested that
PJ> each is smaller than the next so I'm wondering if I'm barking up the
PJ> wrong tree.

PJ> If you figure out what's going on please do post back here, I'll do
PJ> the same.

Please put together a test that runs against the default keyspace (Keyspace1), or give me your configuration plus a test.

At the very least, set $Net::Cassandra::Easy::DEBUG to 1 and look at the timestamps it's generating in the Thrift requests.

By default N::C::Easy uses Moose to provide timestamps through a double-wrapped sub (so it's called every time):

    use Time::HiRes qw(gettimeofday);

    # The outer sub is Moose's default builder; the inner sub is what gets
    # stored and called to produce each timestamp.
    has timestamp => (
        is      => 'ro',
        isa     => 'CodeRef',
        default => sub { sub { join('', gettimeofday()) } },
    );

This has worked for me, but it certainly could be the problem.

Ted
Re: writes to Cassandra failing occasionally
On Wed, 07 Apr 2010 13:19:26 -0700 Mike Gallamore mike.e.gallam...@googlemail.com wrote:

MG> As an aside I modified some other code to use Net::Cassandra instead
MG> of Net::Cassandra::Easy and noticed that it seems to run 3-4X
MG> slower. Both aren't stunningly fast. The test clients are running on
MG> the same machine as Cassandra, and I'm only getting somewhere between
MG> 100-400 (huge variance) with N::C::Easy and 30-90 with N::C. This test
MG> is writing key value pairs, with the keys being an incrementing
MG> number, and the values being a log line from one of our systems (~200
MG> character string). I'm surprised there is such a huge difference in
MG> speed between the two modules and that the transactions per second are
MG> so low even on my 3.2Ghz P4 2GB RAM box. I tried dropping the
MG> consistency level down to zero but it had a negligible effect.

First of all, Thrift and the way it's implemented in pure Perl (Inline::C or XS would have been much better, plus the data structures are horrible) are IMO the most annoying thing about working with Cassandra. I proposed a pluggable API mechanism so users don't have to depend on Thrift, but the proposal was rejected, so for now Thrift (with the crash-on-demand feature) is the only actively developed Cassandra API. Avro is supposed to be happening soon and I look forward to that.

You should benchmark your code and make sure you're comparing apples to apples. N::C::Easy wraps the operations for you, always using multigets and mutations on the backend. I don't know how your Net::Cassandra test is implemented; it may be that you're making multiple requests when you only need one.

But more importantly, unless you fork multiple processes you won't be winning any speed races. Use Tie::ShareLite, for example, to synchronize your data structures through shared memory.

If you can put together benchmarks that run against the default (Keyspace1) configuration, I can try to optimize things. I won't be rewriting the Thrift side, so it will still be slow on serialize/deserialize operations, but everything else will be fixed if it's suboptimal.

Ted
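The "multiple requests when you only need one" point can be sketched in Python by counting round trips. A naive client sends one request per write. A batching client groups all writes into a single map shaped like batch_mutate's mutation_map (row key, then column family, then a list of mutations). Plain tuples stand in for Thrift Mutation objects here. This is an illustration under those assumptions, not N::C::Easy's internal code, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (row key, column family, column name, value)
Write = Tuple[str, str, str, str]


def one_request_per_write(writes: Iterable[Write]) -> List[dict]:
    # Naive client: every write becomes its own request (one round trip each).
    return [{key: {cf: [(name, value)]}} for key, cf, name, value in writes]


def batched_request(writes: Iterable[Write]) -> Dict[str, Dict[str, List[Tuple[str, str]]]]:
    # One request whose shape follows batch_mutate's mutation_map:
    # row key -> column family -> list of (column name, value) mutations.
    batch: Dict[str, Dict[str, List[Tuple[str, str]]]] = defaultdict(lambda: defaultdict(list))
    for key, cf, name, value in writes:
        batch[key][cf].append((name, value))
    return {key: dict(cfs) for key, cfs in batch.items()}
```

Three writes to two rows cost three round trips with the first approach and one with the second. On a client where per-request Thrift serialization is slow, that difference alone can look like a several-fold speed gap between two modules.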
Re: Net::Cassandra::Easy deletion failed
On Tue, 06 Apr 2010 14:14:55 -0700 Mike Gallamore mike.e.gallam...@googlemail.com wrote:

MG> Great it works. Or at least the Cassandra/thrift part seems to
MG> work. My tests don't pass but I think it is actual logic errors in the
MG> test now, the column does appear to be getting cleared okay with the
MG> new version of the module. Thanks.

If you can make a self-contained script that runs against Cassandra or, even better, a patch for test.pl so it will be available for everyone, I'll gladly look at it. Also, don't forget to set $Net::Cassandra::Easy::DEBUG to 1 so you can see the Thrift structures it builds.

You may need a slightly older or 0.6 checkout because of trunk breakage I saw reported on the devel list just now.

Ted
Re: Net::Cassandra::Easy deletion failed
On Tue, 06 Apr 2010 11:07:03 -0700 Mike Gallamore mike.e.gallam...@googlemail.com wrote:

MG> Seems to be internal to java/cassandra itself.

MG> I have some tests and I want to make sure that I have a clean slate
MG> each time I run the test. Clean as far as my code cares is that
MG> value is not defined. I'm running bin/cassandra -f with the
MG> default install/options. So at the beginning of my test I run:

Mike, you can submit bugs and questions directly to me, here, or through http://rt.cpan.org (the CPAN bug tracker). It's a good idea to test an operation from the CLI that comes with Cassandra to make sure the problem is not with the Net::Cassandra::Easy module. Also, if you set $Net::Cassandra::Easy::DEBUG to 1, you'll see the actual Thrift objects that get constructed. In this case (N::C::Easy 0.08) I was constructing a super_column parameter, which was wrong.

MG> $rc = $c->mutate([$key], family => 'Standard1', deletions => { byname => ['value']});
...
MG> Anyone have any ideas what I'm doing wrong? The value field is just a
MG> json encoded digit so something like (30) not a real supercolumn but
MG> the Net::Cassandra::Easy docs didn't have any examples of removing a
MG> non supercolumns data. Really what I'd like to do is delete the whole
MG> row, but again I didn't find any examples of how to do this.

It's a bug in N::C::Easy. I fixed it in 0.09, so it will work properly with:

    $rc = $c->mutate([$key], family => 'Standard1',
                     deletions => { standard => 1, byname => ['column1', 'column2'] });

AFAIK I can't specify "delete all columns" in a non-super CF using Deletions, so byname is required (I end up filling the column_names field in the predicate). OTOH, I can just delete a SuperColumn, so the above is possible in a super CF.

The docs and tests were updated as well. Let me know if you have problems; it worked for me. In the next release I'll update cassidy.pl to work with non-super CFs as well.

Sorry for the inconvenience.

Thanks
Ted
Re: Net::Cassandra::Easy deletion failed
On Tue, 06 Apr 2010 13:24:45 -0700 Mike Gallamore mike.e.gallam...@googlemail.com wrote:

MG> Thanks for the reply. The newest version of the module I see on CPAN
MG> is 0.08b. I actually had 0.07 installed and am using 0.6beta3 for
MG> cassandra. Is there somewhere else I should look for the 0.09 version
MG> of the module? I'll also upgrade to the release candidate version of
MG> Cassandra and see if that helps.

It takes a few hours for CPAN to update all its mirrors. I'm attaching 0.09 here since it's a tiny tarball.

Ted

[Attachment: Net-Cassandra-Easy-0.09.tar.gz]
Re: multinode cluster wiki page
On Sat, 3 Apr 2010 13:52:22 -0700 Benjamin Black b...@b3k.us wrote:

BB> What happens if the IP I get back is for a seed that happens to be
BB> down right then? And then that IP is cached locally by my resolver?

You have to set the TTL to the right number of seconds for your environment. With tinydns on a dedicated subdomain, even an old machine could support really short TTLs.

BB> There is certainly a tempting conceptual simplicity to using DNS, I
BB> just don't think the reality is that simple nor is it for the trade in
BB> predictability, for me. IMO, this is better done either through
BB> automation to generate the configs (how I do it; I just update
BB> chef-server) or through a service like ZK (how I might do it in the
BB> future, in combination with automation).

DNS tends to be everywhere and is easily configurable, so it's a pretty good lowest common denominator.

I think Zeroconf AKA mDNS/DNS-SD is a good alternative to simple DNS round-robin for many environments, and I will eventually propose a contrib plugin for Cassandra that provides it if no one else gets to it first (we discussed this previously). Modern Linux systems support Zeroconf through Avahi.

Ted
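The DNS round-robin approach amounts to resolving one name, such as a hypothetical seeds.cassandra.example.com, into every address behind it. Here is a minimal Python sketch using the standard resolver. Port 7000 (Cassandra's default storage port in this era) is an assumption and only affects the lookup call. How long the answer stays cached is controlled by the record's TTL and the local resolver, not by this code. A client would still need to skip any returned seed that is down, which is the concern raised above.

```python
import socket
from typing import List


def resolve_seeds(hostname: str, port: int = 7000) -> List[str]:
    # Every IPv4 address behind a (round-robin) name, sorted for a stable order.
    # Caching is up to the resolver and the record's TTL, not this function.
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})
```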
Re: multinode cluster wiki page
On Mon, 5 Apr 2010 13:10:38 -0500 Brandon Williams dri...@gmail.com wrote:

BW> 2010/4/5 Ted Zlatanov t...@lifelogs.com
>> It would be nice if Cassandra looked at all the available interfaces and
>> selected the one whose reverse DNS lookup returned .*cassandra.* (or some
>> keyword the user provided). In other words, when you have
>>
>> eth0 = address X, reverse = 67.frontend.com
>> eth1 = address Y, reverse = cassandra-67.backend.com
>>
>> eth1 should look better. So maybe ListenAddress could support this in the
>> configuration somehow, as a string spec or a ListenAddressPreferReverse
>> option. That would let those of us with multiple interfaces use the exact
>> same config everywhere.

BW> You can already accomplish this. Setup /etc/hosts correctly and leave
BW> ListenAddress blank.

Thanks, that's a much better solution.

Ted
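For reference, the interface-selection idea proposed above (and superseded by the /etc/hosts approach) can be sketched in Python. Given a map of interface names to addresses, pick the first address whose reverse DNS name matches a keyword. The function name is hypothetical. The interface map is supplied by the caller, since enumerating interfaces portably is not in Python's standard library. The lookup is injectable so it can be tested without real PTR records.

```python
import re
import socket
from typing import Callable, Dict, Optional


def pick_listen_address(
    interfaces: Dict[str, str],
    pattern: str = r"cassandra",
    reverse_lookup: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
) -> Optional[str]:
    # Return the first address (by interface name) whose reverse DNS name matches.
    regex = re.compile(pattern)
    for name in sorted(interfaces):
        address = interfaces[name]
        try:
            hostname = reverse_lookup(address)
        except OSError:
            continue  # no PTR record (socket.herror is a subclass of OSError)
        if regex.search(hostname):
            return address
    return None
```

With eth0 reversing to 67.frontend.com and eth1 to cassandra-67.backend.com, the eth1 address is chosen.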
Net::Cassandra::Easy Perl interface (with cassidy.pl CLI) 0.08
You can find version 0.08 of the Net::Cassandra::Easy Perl module at:

http://search.cpan.org/search?query=cassandra+easy&mode=all

This version comes with cassidy.pl, a command-line interface that supports tab-completion. It's not finished (no docs yet, that's a TODO), but in its current form it will:

- autocomplete command name
- autocomplete family name
- autocomplete key name (when possible, this is a TODO)
- autocomplete supercolumn name
- parse and insert LongTypes correctly for autocompletion and elsewhere
- limit gets to 100 or less

Current limitations:

- doesn't handle commas or spaces in names (TODO)
- doesn't handle non-Super CFs (TODO)

The Long autocompletion is clever: given 100, it will generate the ranges 1000 to 1009, 10000 to 10099, etc., so you'll get back the supercolumns that start with 100 in decimal.

Examples of queries:

    ins Super1 testrow testcolumn key1=value1   # insert supercolumn testcolumn with some data
    get Super1 testrow testcolumn,-2            # get testcolumn and the last 2 SCs (prints in a parseable format)
    del Super1 testrow testcolumn               # delete testcolumn
    keys Super1                                 # get the keys
    desc                                        # describe the keyspace

The queries can also be passed from the command line, e.g.

    ./cassidy.pl -server myserver -port 9160 -keyspace Keyspace1 'query1' 'query2'

I'm using it internally but thought it might be useful to others. The autocompletion is especially handy.

Ted
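The decimal-prefix trick for Long autocompletion can be reconstructed in Python. This is a sketch of the idea as described, not cassidy.pl's actual code. Every LongType value whose decimal form starts with a prefix falls into one contiguous range per total digit count: the prefix itself, then prefix followed by one free digit, two free digits, and so on. The ranges are capped at the signed 64-bit maximum. Only positive, non-zero-leading prefixes are handled, and the function name is hypothetical.

```python
from typing import List, Tuple

LONG_MAX = 2**63 - 1  # largest signed 64-bit value (19 decimal digits)


def prefix_ranges(prefix: str, max_digits: int = 19) -> List[Tuple[int, int]]:
    # For prefix "100": (100, 100), (1000, 1009), (10000, 10099), ...
    base = int(prefix)
    ranges = []
    for extra in range(0, max_digits - len(prefix) + 1):
        low = base * 10**extra
        high = low + 10**extra - 1
        if low > LONG_MAX:
            break
        ranges.append((low, min(high, LONG_MAX)))
    return ranges
```

Each range can then be turned into one column slice from low to high, so the number of slices grows with the number of digits, not with the number of matching supercolumns.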
Re: How reliable is cassandra?
On Mon, 29 Mar 2010 10:31:06 -0700 Matthew Stump mrevilgn...@gmail.com wrote:

MS> Am I crazy to want to switch our server's primary data store from
MS> postgres to cassandra? This is a system used by banks and
MS> governments to store crypto keys which absolutely can not be lost.

Run a pilot test for N months (depending on internal factors, N can be 3-12). I think you'll find out more that way than by asking people who have not seen your environment, your data, or your code.

Ted
Re: Storing large blobs
On Wed, 17 Mar 2010 22:42:13 -0400 Carlos Sanchez carlos.sanc...@riskmetrics.com wrote:

CS> We could have blob as large as 50mb compressed (XML compresses quite
CS> well). Typical documents we would deal with would be between 500K
CS> and 3MB

When I was just starting to use Cassandra, I had serious issues with 0.5 and blobs (compressed JSON) over 500 MB, but that was because of the heap size and not something inherently broken in Cassandra.

Ted
Re: renaming a SuperColumn
On Thu, 18 Mar 2010 19:26:06 +0100 Sylvain Lebresne sylv...@yakaz.com wrote:

SL> Given how Cassandra works, I don't think that the server can do much
SL> better than the read, write, delete your client already do
SL> (basically everything is immutable, you only 'add' new versions). As
SL> this cannot be done efficiently (I can be wrong on that but if so, I
SL> *really* would be interested to know why), a model that require a
SL> lot of renames of SuperColumn is probably not a super good fit for
SL> Cassandra (I'm not sure why you need a lot of rename though, maybe
SL> you really have no choice). So I lean in the same direction as
SL> Jonathan. Adding an API call that 'promote' an operation not very
SL> efficient is not a good idea.

Thanks for explaining, Sylvain. I could go into details about my model, but it's not so important (it's evolving as I find the best fit between the model and Cassandra). As long as I know Cassandra is not a good fit for this kind of operation, I can design a better data model.

Ted
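The client-side rename Sylvain describes (read, write, delete) can be sketched in Python against a hypothetical in-memory row. The point of the model is that a rename is three separate operations, and the copied data is rewritten with new timestamps rather than moved. That is why it cannot be made cheap, and why a reader may briefly see both names or neither, depending on ordering. The names here are illustrative only.

```python
from typing import Dict, Tuple

# Hypothetical in-memory row: supercolumn name -> {column name: (value, timestamp)}
Row = Dict[str, Dict[str, Tuple[str, int]]]


def rename_supercolumn(row: Row, old: str, new: str, now_us: int) -> None:
    # 1. read the old supercolumn
    columns = row[old]
    # 2. write its columns under the new name, with fresh timestamps
    row[new] = {name: (value, now_us) for name, (value, _ts) in columns.items()}
    # 3. delete the old one (in Cassandra this is a tombstone, not an in-place removal)
    del row[old]
```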
Net::Cassandra::Easy Perl interface to Cassandra 0.03
You can find version 0.03 of the Net::Cassandra::Easy Perl module at:

http://search.cpan.org/search?query=cassandra+easy&mode=all

(It may show 0.021, but that's the same as 0.03; I accidentally uploaded it with the wrong version number.)

The docs explain how to use it, with examples for the common cases. There are only three N::C::Easy methods, connect(), get(), and mutate(), so it's easy to learn and use from a one-liner or from a large program.

Please let me know if you have bugs or suggestions. N::C::Easy will track cassandra.thrift closely, but it will also be able to switch to Avro or another API easily, since it avoids Thrift-specific code on the user side as much as possible.

You can also use the Net::Cassandra module by Leon Brocard, of course. It works fine and is much more Thrift-oriented.

Ted
Re: bundled mutations (Deletion+insertion of the same SuperColumn)
On Mon, 15 Mar 2010 16:45:49 -0400 Jake Luciani jak...@gmail.com wrote:

JL> On Mar 15, 2010, at 4:41 PM, Ted Zlatanov t...@lifelogs.com wrote:
>> Can there be any assurance that if I specify a Deletion and an
>> insertion for a specific SuperColumn in the same batch_mutate() call,
>> they will happen atomically?

JL> Timestamps.

Sorry to be dense, but this is not really explained in http://wiki.apache.org/cassandra/API or anywhere else on the wiki, so I need to ask for more help. I looked in the archives, and threads like "why does remove need a timestamp?" talk about this, but only about the remove or get operations in isolation, not together.

I understand removal timestamps need to be larger than the old data's timestamps. The API page section on Deletion says it operates on Columns matching the specified timestamp, but it's not clear what kind of match this is ('<' or '<=').

If my mutation is:

    Deletion(hello, T1) + ... insertion wrapper(SuperColumn(hello, new Column(x, y, T2)))

what should T1 and T2 be, assuming all existing columns in hello have a timestamp of a minute ago (guaranteed lower than T1 and T2 because there's only one writer)? Should I make T1 the current time minus one and T2 the current time?

My goal is to ensure that hello is never empty to readers. I'm trying to avoid even a 1-millisecond gap, because I have hundreds of fast readers and there's a good chance they'll hit it.

Thanks
Ted
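The timestamp arithmetic in the question can be modelled in Python under one stated assumption: a deletion at timestamp T shadows every column whose timestamp is less than or equal to T. That is my reading of Cassandra's reconciliation rule, not something the wiki page spells out. Under that rule, T1 = now - 1 and T2 = now leave the new column in place, while T2 = T1 would shadow it as well. The model only shows the final reconciled state. It says nothing about whether concurrent readers can observe an intermediate, empty supercolumn, which is the real concern above. The function names are hypothetical.

```python
from typing import Dict, Tuple

# column name -> (value, timestamp in microseconds)
Columns = Dict[str, Tuple[str, int]]


def apply_supercolumn_deletion(columns: Columns, deleted_at: int) -> Columns:
    # Assumed rule: a deletion at deleted_at shadows every column with timestamp <= deleted_at.
    return {name: (v, ts) for name, (v, ts) in columns.items() if ts > deleted_at}


def apply_batch(columns: Columns, deleted_at: int, inserts: Columns) -> Columns:
    # Model of Deletion(T1) plus insertions at T2 for one supercolumn in one batch.
    result = apply_supercolumn_deletion(columns, deleted_at)
    for name, (value, ts) in inserts.items():
        current = result.get(name)
        # An insertion survives only if it is newer than the deletion
        # and newer than any surviving column of the same name.
        if ts > deleted_at and (current is None or ts > current[1]):
            result[name] = (value, ts)
    return result
```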