Re: Starting Cassandra Fauna
Hi, can anyone please list the steps to install and run Cassandra on CentOS? That would help me follow along and check where I went wrong so I can run it correctly. Also, if I wanted to insert some data programmatically, where do I need to place the code in Fauna? Can anyone help me with this?

On Mon, Apr 12, 2010 at 10:36 PM, Ryan King r...@twitter.com wrote: I'm guessing you missed the ant ivy-retrieve step. We're planning on releasing a new gem today that should fix this issue. -ryan

On Mon, Apr 12, 2010 at 3:30 AM, Nirmala Agadgar nirmala...@gmail.com wrote: Hi, yes, I used only master. I downloaded the tar file, placed it in the cassandra folder and ran cassandra_helper cassandra again. Now I am getting: Error: Exception thrown by the agent : java.net.MalformedURLException: Local host name. When I set the hostname to localhost or 127.0.0.1 I get: Exception in thread main java.lang.NoClassDefFoundError: org/apache/log4j/Logger at org.apache.cassandra.thrift.CassandraDaemon.clinit(CassandraDaemon.java:55). How do I solve this? Can anyone tell me the steps to run Cassandra, or the configuration that needs to be done? - Nirmala

On Sat, Apr 10, 2010 at 10:48 PM, Jeff Hodges jhod...@twitter.com wrote: Did you try master? We fixed this around the 7th, but haven't made a release yet. -- Jeff

On Sat, Apr 10, 2010 at 10:10 AM, Nirmala Agadgar nirmala...@gmail.com wrote: Hi, I tried to dig into the problem and found: 1) DIST_URL points to http://apache.osuosl.org/incubator/cassandra/0.6.0/apache-cassandra-0.6.0-beta2-bin.tar.gz, which has no resource behind it (in the Rakefile of the Cassandra gem): DIST_URL = http://apache.osuosl.org/incubator/cassandra/0.6.0/apache-cassandra-0.6.0-beta2-bin.tar.gz 2) It does not execute past sh tar xzf #{DIST_FILE}. Can anyone help with this problem? Where should the tar file be downloaded?

On Fri, Apr 9, 2010 at 3:28 AM, Jeff Hodges jhod...@twitter.com wrote: While I wasn't able to reproduce the error, we did have another pop up. I think I may have actually fixed your problem the other day. Pull the latest master from fauna/cassandra and you should be good to go. -- Jeff

On Thu, Apr 8, 2010 at 10:51 AM, Ryan King r...@twitter.com wrote: Yeah, this is a known issue, we're working on it today. -ryan

On Thu, Apr 8, 2010 at 10:31 AM, Jonathan Ellis jbel...@gmail.com wrote: Sounds like it's worth reporting on the github project then.

On Thu, Apr 8, 2010 at 11:53 AM, Paul Prescod pres...@gmail.com wrote: On Thu, Apr 8, 2010 at 9:49 AM, Jonathan Ellis jbel...@gmail.com wrote: cassandra_helper does a bunch of magic to set things up. Looks like the "extract a private copy of cassandra 0.6 beta2" part of the magic is failing. You'll probably need to manually attempt the un-tar to figure out why it is bailing. Yes, I had the same problem. I didn't dig into it, but perhaps all users have this problem now. Paul Prescod
Re: GC options
FYI, G1 has been in 1.6 since u14. 2010/4/13 Peter Schüller sc...@spotify.com:

I'm working on getting our latency as consistent as possible, and the GC likes to kick off 60+ms periods of unavailability for a node, which for my application leads to a reasonable number of timed-out requests. Outside of the GC event, we get good responses. I'm happy to trade reduced throughput for shorter pauses, so I'm going to work through the standard JVM GC tuning guide[0] for short pauses. Curious if anyone else has gone down this path and gotten GC pauses consistent and low, or if what's in bin/cassandra.in.sh is basically the best I should expect. (Anyone tried JRockit?)

If you are willing to use the unreleased JDK 1.7 and G1GC (which is still marked experimental and may still be a stability concern -- and since we are talking about storing data, conservatism is probably called for), you can try that. It offers more direct control over the target GC pause times, although it does not provide guarantees. A potential starting point of VM options may be:

-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:GCPauseIntervalMillis=15

And maybe: -XX:G1ConfidencePercent=100

And maybe (not sure of the current status, but there used to be a known bug when these were enabled): -XX:+G1ParallelRSetUpdatingEnabled -XX:+G1ParallelRSetScanningEnabled

-- / Peter Schuller aka scode
Re: GC options
Got it, thanks 2010/4/13 Peter Schüller sc...@spotify.com: FYI, G1 has been in 1.6 since u14. Yes, but (last time I checked) in a considerably older form. The JDK 1.7 one is more mature. -- / Peter Schuller aka scode
Re: History values
I am new to using Cassandra. In the documentation I have read, I understand that, as in other non-documentary databases, when you update the value of a key-value tuple the new value is stored with a different timestamp, but without the old value being entirely lost. I wonder how I can restore the historic values that a particular field has had.

You can't. Upon update, the old value is lost. From a technical standpoint, it is true that this old value is not deleted (from disk) right away, but it is deleted eventually by compaction (and you don't really control when compactions occur). -- Sylvain
Re: History values
Ok, thank you very much for your reply. I have another question that may seem stupid... Does Cassandra have a graphical console, like MySQL has for SQL databases? Regards!
Re: History values
I'm also new to Cassandra, and on the same question I asked myself whether using super columns with one key per version would be feasible. Are there limitations to this use case (or better practices)? Thank you and best regards, Bertil Chapuis

On 14 April 2010 09:45, Sylvain Lebresne sylv...@yakaz.com wrote: I am new to using Cassandra. In the documentation I have read, I understand that, as in other non-documentary databases, when you update the value of a key-value tuple the new value is stored with a different timestamp, but without the old value being entirely lost. I wonder how I can restore the historic values that a particular field has had. You can't. Upon update, the old value is lost. From a technical standpoint, it is true that this old value is not deleted (from disk) right away, but it is deleted eventually by compaction (and you don't really control when compactions occur). -- Sylvain
Re: History values
I think it is still too young; you have to wait or write the graphical console yourself. At least, I haven't found any until now.

On Wed, Apr 14, 2010 at 10:04 AM, Bertil Chapuis bchap...@gmail.com wrote: I'm also new to Cassandra, and on the same question I asked myself whether using super columns with one key per version would be feasible. Are there limitations to this use case (or better practices)? Thank you and best regards, Bertil Chapuis

On 14 April 2010 09:45, Sylvain Lebresne sylv...@yakaz.com wrote: I am new to using Cassandra. In the documentation I have read, I understand that, as in other non-documentary databases, when you update the value of a key-value tuple the new value is stored with a different timestamp, but without the old value being entirely lost. I wonder how I can restore the historic values that a particular field has had. You can't. Upon update, the old value is lost. From a technical standpoint, it is true that this old value is not deleted (from disk) right away, but it is deleted eventually by compaction (and you don't really control when compactions occur). -- Sylvain
Re: History values
On Wed, Apr 14, 2010 at 5:13 PM, Zhiguo Zhang mikewolfx...@gmail.com wrote: I think it is still too young; you have to wait or write the graphical console yourself. At least, I haven't found any until now.

Frankly speaking, I'm OK without a GUI... but I am really disappointed by those so-called 'documents'. I would really prefer to have some more documents in real 'English', written in a more tutorial-like way. I hope I can write some texts once I manage to understand the current ones.

On Wed, Apr 14, 2010 at 10:04 AM, Bertil Chapuis bchap...@gmail.com wrote: I'm also new to Cassandra, and on the same question I asked myself whether using super columns with one key per version would be feasible. Are there limitations to this use case (or better practices)? Thank you and best regards, Bertil Chapuis

On 14 April 2010 09:45, Sylvain Lebresne sylv...@yakaz.com wrote: I am new to using Cassandra. In the documentation I have read, I understand that, as in other non-documentary databases, when you update the value of a key-value tuple the new value is stored with a different timestamp, but without the old value being entirely lost. I wonder how I can restore the historic values that a particular field has had. You can't. Upon update, the old value is lost. From a technical standpoint, it is true that this old value is not deleted (from disk) right away, but it is deleted eventually by compaction (and you don't really control when compactions occur). -- Sylvain
server crash - how to investigate
I'm running a 0.6.0 cluster with four nodes and one of them just crashed. The logs all seem normal and I haven't seen anything special in the JMX counters before the crash. I have one client writing and reading using 10 threads and using 3 different column families: KvAds, KvImpressions and KvUsers. The client had gotten a few UnavailableException, TimedOutException and TTransportException errors, but was able to complete the read/write operations by failing over to another available host. I can't tell if the exceptions were from the crashed host or from other hosts in the ring. Any hints on how to investigate this are greatly appreciated. So far I'm lost...

Here's a snippet from the log just before it went down. It doesn't seem to have anything special in it; everything is at INFO level. The only thing that seems a bit strange is the last message: Compacting []. This message usually comes with things inside the [], such as Compacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassdata/data/system/LocationInfo-1-Data.db'),...], but this time it was just empty. However, this is not the only place in the log where I see an empty Compacting []. There are other places, and they didn't end up in a crash, so I don't know if it's related.

Here's the log:

INFO [ROW-MUTATION-STAGE:6] 2010-04-14 05:55:07,014 ColumnFamilyStore.java (line 357) KvImpressions has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/outbrain/cassdata/commitlog/CommitLog-1271238432773.log', position=68606651)
INFO [ROW-MUTATION-STAGE:6] 2010-04-14 05:55:07,015 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(KvImpressions)@258729366
INFO [FLUSH-WRITER-POOL:1] 2010-04-14 05:55:07,015 Memtable.java (line 148) Writing Memtable(KvImpressions)@258729366
INFO [FLUSH-WRITER-POOL:1] 2010-04-14 05:55:10,130 Memtable.java (line 162) Completed flushing /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-24-Data.db
INFO [COMMIT-LOG-WRITER] 2010-04-14 05:55:10,154 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/outbrain/cassdata/commitlog/CommitLog-1271238049425.log)
INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,415 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-16-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,440 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvAds-8-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,454 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvAds-10-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,526 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-5-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,585 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-11-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,602 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvAds-11-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,614 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvAds-9-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,682 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-21-Data.db
INFO [COMMIT-LOG-WRITER] 2010-04-14 05:55:52,254 CommitLogSegment.java (line 50) Creating new commitlog segment /outbrain/cassdata/commitlog/CommitLog-1271238952254.log
INFO [ROW-MUTATION-STAGE:16] 2010-04-14 05:56:25,347 ColumnFamilyStore.java (line 357) KvImpressions has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/outbrain/cassdata/commitlog/CommitLog-1271238952254.log', position=47568158)
INFO [ROW-MUTATION-STAGE:16] 2010-04-14 05:56:25,348 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(KvImpressions)@1955587316
INFO [FLUSH-WRITER-POOL:1] 2010-04-14 05:56:25,348 Memtable.java (line 148) Writing Memtable(KvImpressions)@1955587316
INFO [FLUSH-WRITER-POOL:1] 2010-04-14 05:56:30,572 Memtable.java (line 162) Completed flushing /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-25-Data.db
INFO [COMMIT-LOG-WRITER] 2010-04-14 05:57:26,790 CommitLogSegment.java (line 50) Creating new commitlog segment /outbrain/cassdata/commitlog/CommitLog-1271239046790.log
INFO [ROW-MUTATION-STAGE:7] 2010-04-14 05:57:59,513 ColumnFamilyStore.java (line 357) KvImpressions has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/outbrain/cassdata/commitlog/CommitLog-1271239046790.log', position=24265615)
INFO [ROW-MUTATION-STAGE:7] 2010-04-14 05:57:59,513 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(KvImpressions)@1617250066
INFO [FLUSH-WRITER-POOL:1] 2010-04-14 05:57:59,513 Memtable.java (line 148) Writing Memtable(KvImpressions)@1617250066
INFO [FLUSH-WRITER-POOL:1]
Re: Two dimensional matrices
I'm confused: don't range queries such as the ones we've been discussing require using an order-preserving partitioner? Alright, so distribution depends on your choice of token. Ah yes, I get it now: with a naive order-preserving partitioner, the key is associated with the node whose token is numerically closest, and that is where the master replica is located. Yes?

Now let's assume I am using super columns as {X} and columns as {timeFrame}. In time each row will grow very large, because X can (very sparsely) go up to 2^28. i) Does Cassandra load all columns every time it reads a row? Same question for a super column. ii) Similarly, does it cache all columns in memory?

Now some orders of magnitude. Let's say a row is about 20KB and the cluster is running smoothly on low-end servers. There are millions of rows per node. i) If I were to only issue gets on the key, what is the order of magnitude I can expect to reach: 10/s, 100/s, 1000/s or 10,000/s? ii) If I were to issue a slice on just the keys, does Cassandra optimize the gets or does it run every get on the server and then concatenate the results to send to the client? iii) Is slicing on the columns going to improve the time to get the data on the server side, or does it just cut down on network traffic?

Thanks Philippe
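Not an authoritative answer to iii), but for concreteness, here is a minimal sketch of what a column slice looks like through the 0.6 Thrift API: the SlicePredicate is the only thing that bounds which columns come back for that row. The keyspace "Keyspace1", column family "Matrix", host, row key and super column name below are invented placeholders, not anything from this thread.

    import java.util.List;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class SliceExample {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("localhost", 9160);    // assumed host/port
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            // Ask for the {timeFrame} columns inside one {X} super column of one row.
            ColumnParent parent = new ColumnParent("Matrix");          // hypothetical super CF
            parent.setSuper_column("42".getBytes("UTF-8"));            // one X value

            // Start/finish are left open here; in practice they would be timeFrame bounds.
            SliceRange range = new SliceRange(new byte[0], new byte[0], false, 100);
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(range);

            List<ColumnOrSuperColumn> cols = client.get_slice(
                    "Keyspace1", "row-key", parent, predicate, ConsistencyLevel.ONE);
            System.out.println("columns returned: " + cols.size());
            transport.close();
        }
    }

Whether narrowing that predicate saves server-side work for a given layout, rather than only network traffic, is exactly the question being asked above.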
Re: History values
The closest is http://github.com/driftx/chiton On Wed, Apr 14, 2010 at 2:57 AM, Yésica Rey yes...@gdtic.es wrote: Ok, thank you very much for your reply. I have another question may seem stupid ... Cassandra has a graphical console, such as mysql for SQL databases? Regards!
Time-series data model
Hello everyone

We are currently evaluating a new DB system (replacing MySQL) to store massive amounts of time-series data. The data are various metrics from various network and IT devices and systems. Metrics could be, for example, CPU usage of server xy in percent, memory usage of server xy in MB, ping response time of server foo in milliseconds, network traffic of router bar in MB/s and so on. Different metrics can be collected for different devices at different intervals.

The metrics are stored together with a timestamp. The queries we want to perform are:
* The last value of a specific metric of a device
* The values of a specific metric of a device between two timestamps t1 and t2

I stumbled across this blog post which describes a very similar setup with Cassandra: https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/ This post gave me confidence that what we want is definitely doable with Cassandra. But since I'm just digging into columns and super-columns and their families, I still have some problems understanding everything. Our data model could look, in JSON-ish notation, like this:

{
  my_server_1: {
    cpu_usage: {
      {ts: 1271248215, value: 87 },
      {ts: 1271248220, value: 34 },
      {ts: 1271248225, value: 23 },
      {ts: 1271248230, value: 49 }
    }
    ping_response: {
      {ts: 1271248201, value: 0.345 },
      {ts: 1271248211, value: 0.423 },
      {ts: 1271248221, value: 0.311 },
      {ts: 1271248232, value: 0.582 }
    }
  }
  my_server_2: {
    cpu_usage: {
      {ts: 1271248215, value: 23 }, ...
    }
    disk_usage: {
      {ts: 1271243451, value: 123445 }, ...
    }
  }
  my_router_1: {
    bytes_in: {
      {ts: 1271243451, value: 2452346 }, ...
    }
    bytes_out: {
      {ts: 1271243451, value: 13468 }, ...
    }
    errors: {
      {ts: 1271243451, value: 24 }, ...
    }
  }
}

What I don't get is how to create the two-level hierarchy [device][metric]. Am I right that the devices would be kept in a super column family? The ordering of those is not important. But the metrics per device are also a super column, where the columns would be the metric values ({ts: 1271243451, value: 24 }), isn't it? So I'd need a super column in a super column... Hm. My brain is definitely RDBMS-damaged and I don't see through columns and super-columns yet. :-) How could this be modeled in Cassandra?

Thank you very much
James
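For illustration only, here is a minimal sketch (not the Cloudkick setup) of writing one sample under the model described above through the 0.6 Thrift API: row key = device, super column = metric name, column name = timestamp, column value = reading. The keyspace "Monitoring", the super column family "Devices" and the host are assumptions made up for the example.

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class MetricWriter {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("localhost", 9160);        // assumed address
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            long now = System.currentTimeMillis();

            // Row key = device, super column = metric, column name = timestamp, value = reading.
            ColumnPath path = new ColumnPath("Devices");                    // hypothetical super CF
            path.setSuper_column("cpu_usage".getBytes("UTF-8"));
            path.setColumn(String.valueOf(now / 1000).getBytes("UTF-8"));   // ts as column name

            client.insert("Monitoring",                                     // hypothetical keyspace
                          "my_server_1",                                    // device = row key
                          path,
                          "87".getBytes("UTF-8"),                           // the metric value
                          now,                                              // write timestamp
                          ConsistencyLevel.QUORUM);
            transport.close();
        }
    }

The "between t1 and t2" query would then be a get_slice over the timestamp-named columns inside one metric's super column; for that slice to behave chronologically, the sub-column comparator has to order whatever timestamp encoding you choose (for example LongType over raw 8-byte longs).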
Re: Time-series data model
First of all, I am a newbie to NoSQL. I'll try to write down my opinions as a reference. If I were you, I would use 2 column families: the 1st CF with the devices as keys, the 2nd CF with a TimeUUID as key. What do you think about that? Mike

On Wed, Apr 14, 2010 at 3:02 PM, Jean-Pierre Bergamin ja...@ractive.ch wrote: Hello everyone. We are currently evaluating a new DB system (replacing MySQL) to store massive amounts of time-series data. The data are various metrics from various network and IT devices and systems. Metrics could be, for example, CPU usage of server xy in percent, memory usage of server xy in MB, ping response time of server foo in milliseconds, network traffic of router bar in MB/s and so on. Different metrics can be collected for different devices at different intervals. The metrics are stored together with a timestamp. The queries we want to perform are: * The last value of a specific metric of a device * The values of a specific metric of a device between two timestamps t1 and t2. I stumbled across this blog post which describes a very similar setup with Cassandra: https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/ This post gave me confidence that what we want is definitely doable with Cassandra. But since I'm just digging into columns and super-columns and their families, I still have some problems understanding everything. Our data model could look, in JSON-ish notation, like this: { my_server_1: { cpu_usage: { {ts: 1271248215, value: 87 }, {ts: 1271248220, value: 34 }, {ts: 1271248225, value: 23 }, {ts: 1271248230, value: 49 } } ping_response: { {ts: 1271248201, value: 0.345 }, {ts: 1271248211, value: 0.423 }, {ts: 1271248221, value: 0.311 }, {ts: 1271248232, value: 0.582 } } } my_server_2: { cpu_usage: { {ts: 1271248215, value: 23 }, ... } disk_usage: { {ts: 1271243451, value: 123445 }, ... } } my_router_1: { bytes_in: { {ts: 1271243451, value: 2452346 }, ... } bytes_out: { {ts: 1271243451, value: 13468 }, ... } errors: { {ts: 1271243451, value: 24 }, ... } } } What I don't get is how to create the two-level hierarchy [device][metric]. Am I right that the devices would be kept in a super column family? The ordering of those is not important. But the metrics per device are also a super column, where the columns would be the metric values ({ts: 1271243451, value: 24 }), isn't it? So I'd need a super column in a super column... Hm. My brain is definitely RDBMS-damaged and I don't see through columns and super-columns yet. :-) How could this be modeled in Cassandra? Thank you very much James
Re: Time-series data model
On Wed, 14 Apr 2010 15:02:29 +0200 Jean-Pierre Bergamin ja...@ractive.ch wrote: JB The metrics are stored together with a timestamp. The queries we want to JB perform are: JB * The last value of a specific metric of a device JB * The values of a specific metric of a device between two timestamps t1 and JB t2 Make your key devicename-metricname-MMDD-HHMM (with whatever time sharding makes sense to you; I use UTC by-hours and by-day in my environment). Then your supercolumn is the collection time as a LongType and your columns inside the supercolumn can express the metric in detail (collector agent, detailed breakdown, etc.). If you want your clients to discover the available metrics, you may need to keep an external index. But from your spec that doesn't seem necessary. Ted
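To make that layout concrete, here is a rough sketch of the "values between t1 and t2" read against the 0.6 Thrift API, assuming a super column family whose super column names are LongType collection times, as described above. The keyspace "Monitoring", column family "Metrics", host and the literal row key are placeholders, not anything Ted prescribed.

    import java.nio.ByteBuffer;
    import java.util.List;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class MetricReader {
        // LongType super column names compare as 8-byte big-endian longs.
        static byte[] asLongBytes(long v) {
            return ByteBuffer.allocate(8).putLong(v).array();
        }

        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("localhost", 9160);    // assumed address
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            // One row per device+metric+time shard, per the suggestion above (key is made up).
            String key = "my_server_1-cpu_usage-0414-0500";

            long t1 = 1271221200000L, t2 = 1271224800000L;              // the window to read
            SliceRange window = new SliceRange(asLongBytes(t1), asLongBytes(t2), false, 1000);
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(window);

            // No super_column set on the parent, so the slice ranges over super column
            // names, i.e. the collection times stored in this row.
            ColumnParent parent = new ColumnParent("Metrics");           // hypothetical super CF

            List<ColumnOrSuperColumn> samples = client.get_slice(
                    "Monitoring", key, parent, predicate, ConsistencyLevel.ONE);
            System.out.println(samples.size() + " samples between t1 and t2");
            transport.close();
        }
    }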
Re: Reading thousands of columns
Yes, I find that get_range_slices takes an incredibly long time to return the results. --- Gautam

On Tue, Apr 13, 2010 at 2:00 PM, James Golick jamesgol...@gmail.com wrote: Hi All, I'm seeing about 35-50ms to read 1000 columns from a CF using get_range_slices. The columns are TimeUUIDType with empty values. The row cache is enabled and I'm running the query 500 times in a row, so I can only assume the row is cached. Is that about what's expected or am I doing something wrong? (It's from java this time, so it's not ruby thrift being slow). - James
Re: Reading thousands of columns
35-50ms for how many rows of 1000 columns each? get_range_slices does not use the row cache, for the same reason that oracle doesn't cache tuples from sequential scans -- blowing away 1000s of rows worth of recently used rows queried by key, for a swath of rows from the scan, is the wrong call more often than it is the right one. On Tue, Apr 13, 2010 at 1:00 PM, James Golick jamesgol...@gmail.com wrote: Hi All, I'm seeing about 35-50ms to read 1000 columns from a CF using get_range_slices. The columns are TimeUUIDType with empty values. The row cache is enabled and I'm running the query 500 times in a row, so I can only assume the row is cached. Is that about what's expected or am I doing something wrong? (It's from java this time, so it's not ruby thrift being slow). - James
Re: [RELEASE] 0.6.0
On Tue, 13 Apr 2010 15:54:39 -0500 Eric Evans eev...@rackspace.com wrote: EE I leaned into it. An updated package has been uploaded to the Cassandra EE repo (see: http://wiki.apache.org/cassandra/DebianPackaging). Thank you for providing the release to the repository. Can it support a non-root user through /etc/default/cassandra? I've been patching the init script myself but was hoping this would be standard. Thanks Ted
KeysCached and sstable
The inline docs say:

~ The optional KeysCached attribute specifies
~ the number of keys per sstable whose locations we keep in
~ memory in mostly LRU order.

There are a few confusing bits in that sentence.

1. Why is it "keys per sstable" rather than "keys per column family"? If I have 7 SSTable files and I set KeysCached to 1, will I have 7 keys cached? If so, why? What is the logical relationship here?

2. What makes the algorithm "mostly LRU" rather than just LRU?

3. Is it accurate to say that the goal of the Key Cache is to avoid looking through a bunch of SSTables' Bloom filters? (How big do the Bloom filters grow to... too much to be cached themselves?)

I'd like to document the details.

Paul Prescod
Re: Reading thousands of columns
Right - that makes sense. I'm only fetching one row. I'll give it a try with get_slice(). Thanks, -James

On Wed, Apr 14, 2010 at 7:45 AM, Jonathan Ellis jbel...@gmail.com wrote: 35-50ms for how many rows of 1000 columns each? get_range_slices does not use the row cache, for the same reason that oracle doesn't cache tuples from sequential scans -- blowing away 1000s of rows worth of recently used rows queried by key, for a swath of rows from the scan, is the wrong call more often than it is the right one. On Tue, Apr 13, 2010 at 1:00 PM, James Golick jamesgol...@gmail.com wrote: Hi All, I'm seeing about 35-50ms to read 1000 columns from a CF using get_range_slices. The columns are TimeUUIDType with empty values. The row cache is enabled and I'm running the query 500 times in a row, so I can only assume the row is cached. Is that about what's expected or am I doing something wrong? (It's from java this time, so it's not ruby thrift being slow). - James
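In case a concrete shape helps, a single-row get_slice along those lines might look like the sketch below (0.6 Thrift API; the keyspace, column family and row key are placeholders, not James's actual schema).

    import java.util.List;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class SingleRowSlice {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("localhost", 9160);    // assumed address
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            // Ask for up to 1000 columns of one row; unlike get_range_slices,
            // this keyed read path can be served from the row cache.
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 1000));

            List<ColumnOrSuperColumn> cols = client.get_slice(
                    "Keyspace1",                     // assumed keyspace
                    "the-row-key",                   // the single row being read
                    new ColumnParent("Events"),      // hypothetical CF of TimeUUID columns
                    predicate, ConsistencyLevel.ONE);
            System.out.println(cols.size() + " columns read");
            transport.close();
        }
    }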
Re: Lucandra or some way to query
On Wed, 2010-04-14 at 06:45 -0300, Jesus Ibanez wrote: Option 1 - insert data in all different ways I need in order to be able to query? Rolling your own indexes is fairly common with Cassandra. Option 2 - implement Lucandra? Can you link me to a blog or an article that guides me on how to implement Lucandra? I would recommend you explore this route a little further. I've never used Lucandra so I can't be of help, but the author is active. Have you tried submitting an issue on the github project page? Option 3 - switch to an SQL database? (I hope not). If your requirements can be met with an SQL database, then sure, why not? -- Eric Evans eev...@rackspace.com
Re: [RELEASE] 0.6.0
On Wed, 2010-04-14 at 10:16 -0500, Ted Zlatanov wrote: Can it support a non-root user through /etc/default/cassandra? I've been patching the init script myself but was hoping this would be standard. It's the first item on debian/TODO, but, you know, patches welcome and all that. -- Eric Evans eev...@rackspace.com
Re: Reading thousands of columns
On Wed, Apr 14, 2010 at 7:45 AM, Jonathan Ellis jbel...@gmail.com wrote: 35-50ms for how many rows of 1000 columns each? get_range_slices does not use the row cache, for the same reason that oracle doesn't cache tuples from sequential scans -- blowing away 1000s of rows worth of recently used rows queried by key, for a swath of rows from the scan, is the wrong call more often than it is the right one.

Couldn't you cache a list of keys that were returned for the key range, then cache individual rows separately or not at all? By "blowing away" rows queried by key I'm guessing you mean pushing them out of the LRU cache, not explicitly blowing them away? Either way I'm not entirely convinced. In my experience I've had pretty good success caching items that were pulled out via more complicated join / range type queries. If your system is doing lots of range queries, and not a lot of lookups by key, you'd obviously see a performance win from caching the range queries. Maybe range scan caching could be turned on separately? Mike
Re: History values
If you want to use Cassandra, you should probably store each historical value as a new column in the row. On Wed, Apr 14, 2010 at 12:34 AM, Yésica Rey yes...@gdtic.es wrote: I am new to using cassandra. In the documentation I have read, understand, that as in other non-documentary databases, to update the value of a key-value tuple, this new value is stored with a timestamp different but without entirely losing the old value. I wonder, as I can restore the historic values that have had a particular field. Greetings and thanks
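A minimal sketch of that approach, against the 0.6 Thrift API: each new value is written under a column named after its version, so earlier values stay in the row instead of being overwritten, and a slice over the row returns the retained history. The keyspace "Keyspace1", the column family "History", the row key and the host are assumptions made up for the example.

    import java.util.List;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class VersionedValues {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("localhost", 9160);          // assumed address
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            String keyspace = "Keyspace1";                                    // hypothetical
            long version = System.currentTimeMillis();

            // Write the new value under a new column named after its version,
            // leaving previously written versions in place in the same row.
            ColumnPath path = new ColumnPath("History");                      // hypothetical CF
            path.setColumn(String.valueOf(version).getBytes("UTF-8"));
            client.insert(keyspace, "my-field", path,
                          "new value".getBytes("UTF-8"), version, ConsistencyLevel.QUORUM);

            // Reading the row back returns the stored versions in column-name order.
            SlicePredicate all = new SlicePredicate();
            all.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 1000));
            List<ColumnOrSuperColumn> history = client.get_slice(
                    keyspace, "my-field", new ColumnParent("History"), all, ConsistencyLevel.ONE);
            System.out.println(history.size() + " stored versions");
            transport.close();
        }
    }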
Re: Reading thousands of columns
On Wed, Apr 14, 2010 at 10:31 AM, Mike Malone m...@simplegeo.com wrote: ... Couldn't you cache a list of keys that were returned for the key range, then cache individual rows separately or not at all? By blowing away rows queried by key I'm guessing you mean pushing them out of the LRU cache, not explicitly blowing them away? Either way I'm not entirely convinced. In my experience I've had pretty good success caching items that were pulled out via more complicated join / range type queries. If your system is doing lots of range queries, and not a lot of lookups by key, you'd obviously see a performance win from caching the range queries. Maybe range scan caching could be turned on separately?

I agree with you that the caches should be separate, if you're going to cache ranges. You could imagine a single query (perhaps entered interactively) replacing the entire row cache, pushing out all of the data for the system's interactive users. For example, a summary page of who has been most active over the last month could replace the profile information for the actual users who are using the system at that moment. Paul Prescod
Re: Reading thousands of columns
The values are empty. It's 3000 UUIDs. On Wed, Apr 14, 2010 at 12:40 PM, Avinash Lakshman avinash.laksh...@gmail.com wrote: How large are the values? How much data on disk? On Wednesday, April 14, 2010, James Golick jamesgol...@gmail.com wrote: Just for the record, I am able to repeat this locally. I'm seeing around 150ms to read 1000 columns from a row that has 3000 in it. If I enable the rowcache, that goes down to about 90ms. According to my profile, 90% of the time is being spent waiting for cassandra to respond, so it's not thrift. On Wed, Apr 14, 2010 at 11:01 AM, Paul Prescod pres...@gmail.com wrote: On Wed, Apr 14, 2010 at 10:31 AM, Mike Malone m...@simplegeo.com wrote: ... Couldn't you cache a list of keys that were returned for the key range, then cache individual rows separately or not at all? By blowing away rows queried by key I'm guessing you mean pushing them out of the LRU cache, not explicitly blowing them away? Either way I'm not entirely convinced. In my experience I've had pretty good success caching items that were pulled out via more complicated join / range type queries. If your system is doing lots of range quereis, and not a lot of lookups by key, you'd obviously see a performance win from caching the range queries. Maybe range scan caching could be turned on separately? I agree with you that the caches should be separate, if you're going to cache ranges. You could imagine a single query (perhaps entered interactively) would replace the entire row caching all of the data for the systems' interactive users. For example, a summary page of who is most over the last month active could replace the profile information for the actual users who are using the system at that moment. Paul Prescod
Re: Lucandra or some way to query
Hi, What doesn't work with lucandra exactly? Feel free to msg me. -Jake On Wed, Apr 14, 2010 at 9:30 PM, Jesus Ibanez jesusiba...@gmail.com wrote: I will explore Lucandra a little more and if I can't get it to work today, I will go for Option 2. Using SQL will not be efficient in the future, if my website grows. Thenks for your answer Eric! Jesús. 2010/4/14 Eric Evans eev...@rackspace.com On Wed, 2010-04-14 at 06:45 -0300, Jesus Ibanez wrote: Option 1 - insert data in all different ways I need in order to be able to query? Rolling your own indexes is fairly common with Cassandra. Option 2 - implement Lucandra? Can you link me to a blog or an article that guides me on how to implement Lucandra? I would recommend you explore this route a little further. I've never used Lucandra so I can't be of help, but the author is active. Have you tried submitting an issue on the github project page? Option 3 - switch to an SQL database? (I hope not). If your requirements can be met with an SQL database, then sure, why not? -- Eric Evans eev...@rackspace.com
Re: Lucandra or some way to query
If you worked with Lucandra in a dedicated searching-purposed cluster, you could balance the data very well with some effort. I think Lucandra is really a great idea, but since it needs the order-preserving partitioner, does that mean there may be some 'hot spots' during searching?
Re: Is that possible to write a file system over Cassandra?
Large files can be split into small blocks, and the size of block can be tuned. It may increase the complexity of writing such a file system, but can be for general purpose (not only for relative small files) On Thu, Apr 15, 2010 at 10:08 AM, Tatu Saloranta tsalora...@gmail.comwrote: On Wed, Apr 14, 2010 at 6:42 PM, Zhuguo Shi bluefl...@gmail.com wrote: Hi, Cassandra has a good distributed model: decentralized, auto-partition, auto-recovery. I am evaluating about writing a file system over Cassandra (like CassFS: http://github.com/jdarcy/CassFS ), but I don't know if Cassandra is good at such use case? It sort of depends on what you are looking for. From use case for which something like S3 is good, yes, except with one difference: Cassandra is more geared towards lots of small files, whereas S3 is more geared towards moderate number of files (possibly large). So I think it can definitely be a good use case, and I may use Cassandra for this myself in future. Having range queries allows implementing directory/path structures (list keys using path as prefix). And you can split storage such that metadata could live in OPP partition, raw data in RP. -+ Tatu +-
Re: Is that possible to write a file system over Cassandra?
On Wed, Apr 14, 2010 at 9:15 PM, Ken Sandney bluefl...@gmail.com wrote: Large files can be split into small blocks, and the size of block can be tuned. It may increase the complexity of writing such a file system, but can be for general purpose (not only for relative small files) Right, this is the path that MongoDB has taken with GridFS: http://www.mongodb.org/display/DOCS/GridFS+Specification I don't have any use for such a filesystem, but if I were to design one I would probably mostly follow Tatu's suggestions: On Thu, Apr 15, 2010 at 10:08 AM, Tatu Saloranta tsalora...@gmail.comwrote: So I think it can definitely be a good use case, and I may use Cassandra for this myself in future. Having range queries allows implementing directory/path structures (list keys using path as prefix). And you can split storage such that metadata could live in OPP partition, raw data in RP. but using OPP for all data, using prefixed metadata, and UUID_chunk# for keys in the chunk CF.
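Purely as an illustration of the chunking idea (this is not CassFS or GridFS code), a sketch of an upload loop that writes one row per block under "<uuid>_<chunk#>" keys through the 0.6 Thrift API; the keyspace "Files", the column family "Chunks" and the block size are assumptions.

    import java.io.FileInputStream;
    import java.util.UUID;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class ChunkedUpload {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("localhost", 9160);     // assumed address
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            final int blockSize = 256 * 1024;                            // tunable block size
            String fileId = UUID.randomUUID().toString();                // id kept in the metadata row

            ColumnPath data = new ColumnPath("Chunks");                  // hypothetical chunk CF
            data.setColumn("data".getBytes("UTF-8"));

            FileInputStream in = new FileInputStream(args[0]);
            byte[] buf = new byte[blockSize];
            int read, chunk = 0;
            while ((read = in.read(buf)) > 0) {
                byte[] block = new byte[read];
                System.arraycopy(buf, 0, block, 0, read);
                // One row per block, keyed "<uuid>_<chunk#>" as suggested above.
                client.insert("Files", fileId + "_" + chunk, data, block,
                              System.currentTimeMillis(), ConsistencyLevel.QUORUM);
                chunk++;
            }
            in.close();
            transport.close();
        }
    }

A metadata row (path mapped to the file's UUID, size and block count) would live alongside this, so that path lookups and directory listings can be served separately from the raw blocks, as discussed above.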
Re: Is that possible to write a file system over Cassandra?
Exactly. You can split a file into blocks of any size and you can actually distribute the metadata across a large set of machines. You wouldn't have the issue of having small files in this approach. The issue maybe the eventual consistency - not sure that is a paradigm that would be acceptable for a file system. But that is a discussion for another time/day. Avinash On Wed, Apr 14, 2010 at 7:15 PM, Ken Sandney bluefl...@gmail.com wrote: Large files can be split into small blocks, and the size of block can be tuned. It may increase the complexity of writing such a file system, but can be for general purpose (not only for relative small files) On Thu, Apr 15, 2010 at 10:08 AM, Tatu Saloranta tsalora...@gmail.comwrote: On Wed, Apr 14, 2010 at 6:42 PM, Zhuguo Shi bluefl...@gmail.com wrote: Hi, Cassandra has a good distributed model: decentralized, auto-partition, auto-recovery. I am evaluating about writing a file system over Cassandra (like CassFS: http://github.com/jdarcy/CassFS ), but I don't know if Cassandra is good at such use case? It sort of depends on what you are looking for. From use case for which something like S3 is good, yes, except with one difference: Cassandra is more geared towards lots of small files, whereas S3 is more geared towards moderate number of files (possibly large). So I think it can definitely be a good use case, and I may use Cassandra for this myself in future. Having range queries allows implementing directory/path structures (list keys using path as prefix). And you can split storage such that metadata could live in OPP partition, raw data in RP. -+ Tatu +-
Re: Is that possible to write a file system over Cassandra?
OPP is not required here. You would be better off using a Random partitioner because you want to get a random distribution of the metadata. Avinash On Wed, Apr 14, 2010 at 7:25 PM, Avinash Lakshman avinash.laksh...@gmail.com wrote: Exactly. You can split a file into blocks of any size and you can actually distribute the metadata across a large set of machines. You wouldn't have the issue of having small files in this approach. The issue maybe the eventual consistency - not sure that is a paradigm that would be acceptable for a file system. But that is a discussion for another time/day. Avinash On Wed, Apr 14, 2010 at 7:15 PM, Ken Sandney bluefl...@gmail.com wrote: Large files can be split into small blocks, and the size of block can be tuned. It may increase the complexity of writing such a file system, but can be for general purpose (not only for relative small files) On Thu, Apr 15, 2010 at 10:08 AM, Tatu Saloranta tsalora...@gmail.comwrote: On Wed, Apr 14, 2010 at 6:42 PM, Zhuguo Shi bluefl...@gmail.com wrote: Hi, Cassandra has a good distributed model: decentralized, auto-partition, auto-recovery. I am evaluating about writing a file system over Cassandra (like CassFS: http://github.com/jdarcy/CassFS ), but I don't know if Cassandra is good at such use case? It sort of depends on what you are looking for. From use case for which something like S3 is good, yes, except with one difference: Cassandra is more geared towards lots of small files, whereas S3 is more geared towards moderate number of files (possibly large). So I think it can definitely be a good use case, and I may use Cassandra for this myself in future. Having range queries allows implementing directory/path structures (list keys using path as prefix). And you can split storage such that metadata could live in OPP partition, raw data in RP. -+ Tatu +-
Re: Is that possible to write a file system over Cassandra?
On Wed, Apr 14, 2010 at 9:26 PM, Avinash Lakshman avinash.laksh...@gmail.com wrote: OPP is not required here. You would be better off using a Random partitioner because you want to get a random distribution of the metadata. Not required, certainly. However, it strikes me that 1 cluster is better than 2, and most consumers of a filesystem would expect to be able to get an ordered listing or tree of the metadata which is easy using the OPP row key pattern listed previously. You could still do this with the Random partitioner using column names in rows to describe the structure but the current compaction limitations could be an issue if a branch becomes too large, and you'd still have a root row hotspot (at least in the schema which comes to mind).
Re: Is that possible to write a file system over Cassandra?
Note: there are GlusterFS, Ceph, Btrfs and Lustre. There is also DRBD.
Re: Is that possible to write a file system over Cassandra?
On Wed, Apr 14, 2010 at 11:01 PM, Ken Sandney bluefl...@gmail.com wrote: a fuse based FS maybe better I guess This has been done, for better or worse, by jdarcy of http://pl.atyp.us/: http://github.com/jdarcy/CassFS
Re: Is that possible to write a file system over Cassandra?
I tried CassFS, but it's not stable yet; it may be a good prototype to start from.

On Thu, Apr 15, 2010 at 12:15 PM, Michael Greene michael.gre...@gmail.com wrote: On Wed, Apr 14, 2010 at 11:01 PM, Ken Sandney bluefl...@gmail.com wrote: a fuse based FS maybe better I guess This has been done, for better or worse, by jdarcy of http://pl.atyp.us/: http://github.com/jdarcy/CassFS
Re: Starting Cassandra Fauna
Hi, I want to insert data into Cassandra programmatically in a loop. Also, I'm a newbie to the Linux world and GitHub. I started working on Linux for the sole reason of implementing Cassandra, and have been digging into Cassandra for the last one week. How do I insert data into Cassandra and test it? Can anyone help me out on this? - Nimala
Re: Starting Cassandra Fauna
try this https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP On Thu, Apr 15, 2010 at 12:23 PM, Nirmala Agadgar nirmala...@gmail.com wrote: Hi, I want to insert data into Cassandra programmatically in a loop. Also i'm a newbie to Linux world and Github. Started to work on Linux for only reason to implement Cassandra.Digging Cassandra for last on week.How to insert data in cassandra and test it? Can anyone help me out on this? - Nimala
Re: Starting Cassandra Fauna
There is a tutorial here:
* http://www.sodeso.nl/?p=80

This page includes data inserts:
* http://www.sodeso.nl/?p=251

Like:

c.setColumn(new Column("email".getBytes("utf-8"), "ronald (at) sodeso.nl".getBytes("utf-8"), timestamp));
columns.add(c);

The sample code is attached to that blog post.

On Wed, Apr 14, 2010 at 9:23 PM, Nirmala Agadgar nirmala...@gmail.com wrote: Hi, I want to insert data into Cassandra programmatically in a loop. Also i'm a newbie to Linux world and Github. Started to work on Linux for only reason to implement Cassandra.Digging Cassandra for last on week.How to insert data in cassandra and test it? Can anyone help me out on this? - Nimala
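If a fuller, self-contained variant helps (this is not the blog's exact code; the keyspace "Keyspace1", the column family "Authors", the host and the row key are assumptions), the same kind of insert through the 0.6 Thrift API looks roughly like this — wrapping the client.insert call in a loop over your keys is all that is needed to load data repeatedly:

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class SimpleInsert {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("localhost", 9160);       // assumed host/port
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            long timestamp = System.currentTimeMillis();

            // Write one column ("email") into the row "ronald" of a hypothetical CF.
            ColumnPath path = new ColumnPath("Authors");
            path.setColumn("email".getBytes("UTF-8"));
            client.insert("Keyspace1", "ronald", path,
                          "ronald (at) sodeso.nl".getBytes("UTF-8"),
                          timestamp, ConsistencyLevel.QUORUM);

            transport.close();
            System.out.println("inserted one column");
        }
    }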
Re: Starting Cassandra Fauna
Hi, I'm using the Ruby client as of now. Can you give details for the Ruby client, and also, if possible, the Java client? Thanks for the reply. - Nirmala

On Thu, Apr 15, 2010 at 10:02 AM, richard yao richard.yao2...@gmail.com wrote: try this https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP

On Thu, Apr 15, 2010 at 12:23 PM, Nirmala Agadgar nirmala...@gmail.com wrote: Hi, I want to insert data into Cassandra programmatically in a loop. Also i'm a newbie to Linux world and Github. Started to work on Linux for only reason to implement Cassandra.Digging Cassandra for last on week.How to insert data in cassandra and test it? Can anyone help me out on this? - Nimala
TException: Error: TSocket: timed out reading 1024 bytes from 10.1.1.27:9160
I am trying out Cassandra, and I use PHP to access Cassandra through the Thrift API. I got an error like this: TException: Error: TSocket: timed out reading 1024 bytes from 10.1.1.27:9160 What's wrong? Thanks.
Re: Lucandra or some way to query
Lucandra spreads the data randomly by index + field combination so you do get some distribution for free. Otherwise you can use nodetool loadbalance to alter the token ring to alleviate hotspots. On Thu, Apr 15, 2010 at 2:04 AM, HubertChang hui...@gmail.com wrote: If you worked with Lucandra in a dedicated searching-purposed cluster, you could balanced the data very well with some effort. I think Lucandra is really a great idea, but since it needs order-preserving-partitioner, does that mean there may be some 'hot-spot' during searching? -- View this message in context: http://n2.nabble.com/Lucandra-or-some-way-to-query-tp4900727p4905149.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.