Re: Disable FS journaling

2014-05-20 Thread Terje Marthinussen
Having the journal enabled is faster on almost all operations. Recovery here is more about sparing you a half-hour wait for a traditional full file system check. Feel free to wait if you want, though! :) Regards, Terje On 21 May 2014, at 01:11, Paulo Ricardo Motta Gomes

Re: Cassandra as storage for cache data

2013-07-02 Thread Terje Marthinussen
If this is a tombstone problem, as suggested by some, and it is OK to turn off replication, as suggested by others, it may be an idea to add an optimization in Cassandra where, if replication_factor is 1, no tombstones are created. Terje On Jul 2, 2013, at 11:11 PM, Dmitry Olshansky

Re: Throughput decreases as latency increases with YCSB

2012-10-30 Thread Terje Marthinussen
Check how many real concurrent requests you have versus the size of the thread pools. Regards, Terje On 30 Oct 2012, at 13:28, Peter Bailis pbai...@cs.berkeley.edu wrote: I'm using YCSB on EC2 with one m1.large instance to drive client load. To add, I don't believe this is due to YCSB. I've done a fair
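
Why throughput falls as latency rises with a fixed-size load generator can be shown with Little's law (requests in flight = throughput × latency). The Python sketch below assumes a closed-loop client where each of `client_threads` threads always has exactly one request outstanding; the function names and numbers are illustrative, not measurements from this thread.

```python
def closed_loop_throughput(client_threads: int, latency_s: float) -> float:
    """Little's law for a closed-loop load generator.

    N threads that each wait for a reply before sending the next request
    keep exactly N requests in flight, so throughput = N / latency.
    """
    if client_threads <= 0 or latency_s <= 0:
        raise ValueError("threads and latency must be positive")
    return client_threads / latency_s


def effective_concurrency(client_threads: int, server_pool_size: int) -> int:
    """Requests the server can work on at once; any excess waits in a queue."""
    return min(client_threads, server_pool_size)


if __name__ == "__main__":
    # Same 64 client threads; only the per-request latency changes.
    for latency_ms in (2, 4, 8):
        ops = closed_loop_throughput(64, latency_ms / 1000)
        print(f"{latency_ms} ms -> {ops:.0f} ops/s")
```

With a fixed thread count, doubling latency halves throughput. If the real concurrency is smaller than the server's thread pools (`concurrent_reads`/`concurrent_writes`), the client, not Cassandra, may be what limits the numbers.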

Re: quick question about data layout on disk

2012-08-10 Thread Terje Marthinussen
The row key is stored only once in any sstable file. That is, in the special case where you get one sstable file per column/value you are correct, but normally, I guess, most of us store more per key. Regards, Terje On 11 Aug 2012, at 10:34, Aaron Turner synfina...@gmail.com wrote: Curious, but

Re: Use of SSD for commitlog

2012-08-08 Thread Terje Marthinussen
You can probably get an Intel 320 160GB or a Samsung 830 for the same price as the 146GB 15k rpm drive. Overprovision the SSD by 20% and off you go. It will beat the HDD on both sequential and random I/O. Terje On Aug 8, 2012, at 11:41 PM, Amit Kumar kumaramit.ex...@gmail.com wrote: There is a

Re: Much more native memory used by Cassandra then the configured JVM heap size

2012-06-21 Thread Terje Marthinussen
We run some fairly large and busy Cassandra setups, all of them without mmap. I have yet to see a benchmark which can conclusively say that mmap is better (or worse, for that matter) than standard ways of doing I/O, and we have run many of them over the last 2 years, by different people, with different tools

Re: two dimensional slicing

2012-01-29 Thread Terje Marthinussen
On Sun, Jan 29, 2012 at 7:26 PM, aaron morton aa...@thelastpickle.com wrote: and compare them, but at this point I need to focus on one to get things working, so I'm trying to make a best initial guess. I would go for RP (RandomPartitioner) then. BOP (ByteOrderedPartitioner) may look like less work to start with, but it *will* bite you

Re: What is the future of supercolumns ?

2012-01-06 Thread Terje Marthinussen
Please realize that I do not make any decisions here and I am not part of the core Cassandra developer team. What has been said before is that they will most likely go away and, at least under the hood, be replaced by composite columns. Jonathan has, however, stated that he would like the

Re: [RELEASE] Apache Cassandra 1.0.6 released

2011-12-16 Thread Terje Marthinussen
Does it work if you turn off mmap? We run without mmap and see hardly any difference in performance, but with huge benefits in the form of memory consumption that can actually be monitored easily, and things just seem more stable this way in general. Just turn it off and see how that

Re: Hinted handoff bug?

2011-12-01 Thread Terje Marthinussen
Sorry for not checking the source to see if things have changed, but I just remembered an issue I had forgotten to file a JIRA for. In the old days, nodes would periodically try to deliver their hint queues. However, this was at some stage changed so that delivery only happens when a node is marked up. However, you can

Re: hw requirements

2011-08-31 Thread Terje Marthinussen
SSDs definitely make life simpler, as you will get a lot less trouble from the impact of things like compactions. Just beware that Cassandra expands data a lot due to storage overhead (for small columns), replication, and the space needed for compactions and repairs. It is well worth doing some
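
The expansion sources listed above multiply, so a back-of-the-envelope estimate is a product of factors. The Python helper below is a hypothetical sketch; the example values (3x storage overhead, the low end of the 3-7x range quoted in the "Compression in Cassandra" reply in this list, RF=3, and 2x headroom for compaction) are assumptions chosen for illustration, not figures from this message.

```python
def raw_disk_needed_gb(
    logical_data_gb: float,
    storage_overhead: float,
    replication_factor: int,
    compaction_headroom: float,
) -> float:
    """Rough total cluster disk needed for a data set.

    storage_overhead:    on-disk size / logical size (per-column metadata etc.)
    replication_factor:  number of copies of every row
    compaction_headroom: free-space multiplier for compaction/repair
                         (2.0 means keep disks at most about half full)
    """
    return logical_data_gb * storage_overhead * replication_factor * compaction_headroom


def per_node_disk_gb(total_gb: float, nodes: int) -> float:
    """Assumes the data is evenly balanced across nodes."""
    return total_gb / nodes


if __name__ == "__main__":
    total = raw_disk_needed_gb(100, storage_overhead=3.0, replication_factor=3,
                               compaction_headroom=2.0)
    print(total, per_node_disk_gb(total, nodes=12))
```

Under these assumptions, 100GB of logical data turns into 1800GB of provisioned disk, or 150GB per node on 12 nodes.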

Re: Using 5-6 bytes for cassandra timestamps vs 8…

2011-08-29 Thread Terje Marthinussen
I have a patch for trunk which I just have to find time to test a bit before I submit it. It is for super columns and uses the supercolumn's timestamp as the base, storing only varint-encoded offsets in the underlying columns. If the timestamp equals that of the SC, it will store nothing
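
The idea of storing each column's timestamp as a small offset from the supercolumn's timestamp can be sketched in Python as below. This is a hypothetical illustration, not the patch itself: it assumes an unsigned LEB128-style varint, assumes column timestamps are never older than the supercolumn's (so offsets are non-negative), and assumes some flag elsewhere records the "same as the SC, store nothing" case.

```python
def encode_varint(value: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative offsets")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    result, shift = 0, 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


def encode_column_timestamp(column_ts: int, supercolumn_ts: int) -> bytes:
    """Empty when equal to the SC timestamp (a hypothetical flag would mark this)."""
    if column_ts == supercolumn_ts:
        return b""
    return encode_varint(column_ts - supercolumn_ts)


def decode_column_timestamp(encoded: bytes, supercolumn_ts: int) -> int:
    if not encoded:
        return supercolumn_ts
    return supercolumn_ts + decode_varint(encoded)
```

A column whose timestamp is within 16,383 units of the supercolumn's (about 16 ms if timestamps are in microseconds) needs at most 2 bytes instead of an 8-byte long.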

Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-28 Thread Terje Marthinussen
On Jul 28, 2011, at 9:52 PM, Jonathan Ellis wrote: This is not advisable in general, since non-mmap'd I/O is substantially slower. I see this again and again as a claim here, but it is actually close to 10 years since I saw mmap'd I/O have any substantial performance benefits on any real

Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-28 Thread Terje Marthinussen
AM, Terje Marthinussen tmarthinus...@gmail.com wrote: On Jul 28, 2011, at 9:52 PM, Jonathan Ellis wrote: This is not advisable in general, since non-mmap'd I/O is substantially slower. I see this again and again as a claim here, but it is actually close to 10 years since I saw

Re: Repair doesn't work after upgrading to 0.8.1

2011-06-30 Thread Terje Marthinussen
Unless it is a 0.8.1 RC or beta On Fri, Jul 1, 2011 at 12:57 PM, Jonathan Ellis jbel...@gmail.com wrote: This isn't 2818 -- (a) the 0.8.1 protocol is identical to 0.8.0 and (b) the whole cluster is on the same version. On Thu, Jun 30, 2011 at 9:35 PM, aaron morton aa...@thelastpickle.com

Re: RAID or no RAID

2011-06-27 Thread Terje Marthinussen
If you have a quality HW RAID controller with proper performance (and far from all of them perform well), you can definitely benefit from a battery-backed write cache on it, although the benefits will not be huge on RAID 0. Unless you get a really good price on that high-performance HW RAID

Re: Cassandra ACID

2011-06-26 Thread Terje Marthinussen
That being said, we do not provide isolation, which means in particular that reads *can* return a state where only parts of a batch update seem to have been applied (and it would clearly be cool to have isolation, and I'm not even saying this will never happen). Out of curiosity, do you see any

snitch thrift

2011-06-16 Thread Terje Marthinussen
Hi all! Assuming a node ends up in GC land for a while, there is a good chance that, even though it performs terribly and dynamic snitching will help you avoid it on the gossip side, this will not really help you much if Thrift still accepts requests and the Thrift interface has choppy

Re: Forcing Cassandra to free up some space

2011-06-15 Thread Terje Marthinussen
Even if the GC call cleaned up all the files, it is not really acceptable on a decent-sized cluster due to the impact a full GC has on performance, especially unnecessary ones. The delay in file deletion can also at times make it hard to see how much spare disk you actually have. We easily see 100%

Re: Forcing Cassandra to free up some space

2011-06-15 Thread Terje Marthinussen
On Thu, Jun 16, 2011 at 12:48 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: Even if the GC call cleaned up all the files, it is not really acceptable on a decent-sized cluster due to the impact a full GC has on performance, especially unnecessary ones. Not acceptable as running GC on every

What triggers hint delivery?

2011-06-15 Thread Terje Marthinussen
Hi, I was looking quickly at the source code tonight. As far as I could see from a quick code scan, hint delivery is only triggered by a node's state changing from down to up? If this is indeed the case, it would potentially explain why we sometimes have hints on machines which

Re: What triggers hint delivery?

2011-06-15 Thread Terje Marthinussen
on heartbeats maybe (potentially not all of them, but at a regular interval)? Terje On Thu, Jun 16, 2011 at 2:08 AM, Jonathan Ellis jbel...@gmail.com wrote: On Wed, Jun 15, 2011 at 10:53 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: I was looking quickly at source code tonight. As far
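
The difference between delivering hints only on a down-to-up transition and also retrying at a regular interval on heartbeats can be modelled with a small hypothetical Python class. Nothing here is Cassandra's code: `HintScheduler`, `retry_interval_s`, and the methods are illustrative names, and delivery is simulated by simply dropping the pending count.

```python
from dataclasses import dataclass, field


@dataclass
class HintScheduler:
    """Hypothetical model of when a node attempts to replay stored hints."""
    retry_interval_s: float = 600.0
    pending: dict = field(default_factory=dict)       # endpoint -> hint count
    last_attempt: dict = field(default_factory=dict)  # endpoint -> time of last try

    def on_state_change(self, endpoint: str, was_up: bool, is_up: bool, now: float) -> bool:
        # Behaviour described in the thread: deliver only on a down -> up transition.
        if not was_up and is_up:
            return self._deliver(endpoint, now)
        return False

    def on_heartbeat(self, endpoint: str, is_up: bool, now: float) -> bool:
        # Proposed addition: retry at a regular interval while the node is up.
        if not is_up or not self.pending.get(endpoint):
            return False
        last = self.last_attempt.get(endpoint, float("-inf"))
        if now - last < self.retry_interval_s:
            return False
        return self._deliver(endpoint, now)

    def _deliver(self, endpoint: str, now: float) -> bool:
        self.last_attempt[endpoint] = now
        # Simulated success: a real implementation would stream the hints.
        return self.pending.pop(endpoint, 0) > 0
```

In this model, hints for a node that was never marked down are only ever delivered through the heartbeat path, which matches the symptom of hints lingering on some machines.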

Re: downgrading from cassandra 0.8 to 0.7.3

2011-06-15 Thread Terje Marthinussen
Can't help you with that. You may have to go the json2sstable route and re-import into 0.7.3. But... why would you want to go back to 0.7.3? Terje On Thu, Jun 16, 2011 at 10:30 AM, Anurag Gujral anurag.guj...@gmail.com wrote: Hi All, I moved to cassandra 0.8.0 from cassandra-0.7.3

Re: Forcing Cassandra to free up some space

2011-06-15 Thread Terje Marthinussen
Watching this on a node here right now and it sort of shows how bad this can get. This node still has 109GB free disk by the way... INFO [CompactionExecutor:5] 2011-06-16 09:11:59,164 StorageService.java (line 2071) requesting GC to free disk space INFO [CompactionExecutor:5] 2011-06-16

repair and amount of transfers

2011-06-14 Thread Terje Marthinussen
Hi, I have been testing repairs a bit in different ways on 0.8.0 and I am curious about what to really expect in terms of data transferred. I would expect my data to be fairly consistent in this case from the start: more than a billion supercolumns, but there were no errors in the feed and we have seen

Re: repair and amount of transfers

2011-06-14 Thread Terje Marthinussen
Ah... I just found CASSANDRA-2698 (I thought I had seen something about this)... I guess that means I have to see if I can find time to investigate whether I have a reproducible case? Terje On Tue, Jun 14, 2011 at 4:21 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: Hi, I have been

Re: insufficient space to compact even the two smallest files, aborting

2011-06-13 Thread Terje Marthinussen
That most likely happened just because after the scrub you had new files and got over the 4-file minimum limit. https://issues.apache.org/jira/browse/CASSANDRA-2697 is the bug report. 2011/6/13 Héctor Izquierdo Seliva izquie...@strands.com Hi All. I found a way to be able to compact. I have to

Re: insufficient space to compact even the two smallest files, aborting

2011-06-10 Thread Terje Marthinussen
bug in the 0.8.0 release version. Cassandra splits the sstables depending on size and tries to find (by default) at least 4 files of similar size. If it cannot find 4 files of similar size, it logs that message in 0.8.0. You can try to reduce the minimum required files for compaction and it

Re: insufficient space to compact even the two smallest files, aborting

2011-06-10 Thread Terje Marthinussen
12 sounds perfectly fine in this case: 4 buckets with 3 sstables in each, and the default minimum threshold per bucket is 4. Terje 2011/6/10 Héctor Izquierdo Seliva izquie...@strands.com On Fri, 10-06-2011 at 20:21 +0900, Terje Marthinussen wrote: bug in the 0.8.0 release version. Cassandra
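
The size bucketing described in these replies (group sstables of similar size, compact a bucket only once it holds at least the minimum number of files) can be approximated as below. This is a simplified Python sketch, not Cassandra's code: the 0.5x to 1.5x "similar size" window is an assumption, `min_threshold=4` mirrors the default mentioned in the thread, and `max_threshold=32` is assumed here as the default cap on files per compaction.

```python
def bucket_by_size(sizes_mb, low=0.5, high=1.5):
    """Group sstable sizes into buckets of similar size.

    A file joins the first bucket whose average size avg satisfies
    low * avg <= size <= high * avg; otherwise it starts a new bucket.
    """
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(sizes_mb, min_threshold=4, max_threshold=32):
    """Buckets with at least min_threshold files, capped at max_threshold files."""
    return [b[:max_threshold] for b in bucket_by_size(sizes_mb)
            if len(b) >= min_threshold]


if __name__ == "__main__":
    twelve = [10] * 3 + [100] * 3 + [1000] * 3 + [10000] * 3
    print(len(bucket_by_size(twelve)), compaction_candidates(twelve))
```

Twelve files spread as 3 per bucket over 4 buckets yield no candidates, which is the situation described above; a fourth similar-sized file, or a backlog of 30 files of ~20MB, makes a bucket eligible.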

Re: insufficient space to compact even the two smallest files, aborting

2011-06-10 Thread Terje Marthinussen
will affect minor compaction frequency, won't it? maki 2011/6/10 Terje Marthinussen tmarthinus...@gmail.com: bug in the 0.8.0 release version. Cassandra splits the sstables depending on size and tries to find (by default) at least 4 files of similar size. If it cannot find 4 files of similar

Re: Troubleshooting IO performance ?

2011-06-07 Thread Terje Marthinussen
If you run iostat with output every few seconds, is the I/O stable or do you see very uneven I/O? Regards, Terje On Tue, Jun 7, 2011 at 11:12 AM, aaron morton aa...@thelastpickle.com wrote: There is a big IO queue and reads are spending a lot of time in the queue. Some more questions: -

Re: [RELEASE] 0.8.0

2011-06-06 Thread Terje Marthinussen
(or otherwise had the per-CF memtable settings applied?) On Mon, Jun 6, 2011 at 12:00 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: 0.8 under load may turn out to be more stable and well behaving than any release so far Been doing a few test runs stuffing more than 1 billion records

Re: [RELEASE] 0.8.0

2011-06-06 Thread Terje Marthinussen
has 0 subcolumns in the first place? Is that expected behaviour? Regards, Terje On Mon, Jun 6, 2011 at 10:09 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: Of course I talked too soon. I saw a corrupted commitlog some days back after killing cassandra and I just came across

Re: [RELEASE] 0.8.0

2011-06-06 Thread Terje Marthinussen
Yes, I am aware of it but it was not an alternative for this project which will face production soon. The patch I have is fairly non-intrusive (especially vs. 674) so I think it can be interesting depending on how quickly 674 will be integrated into cassandra releases. I plan to take a closer

Re: [RELEASE] 0.8.0

2011-06-05 Thread Terje Marthinussen
0.8 under load may turn out to be more stable and well-behaved than any release so far. Been doing a few test runs stuffing more than 1 billion records into a 12-node cluster and things look better than ever. VMs stable and nice at 11GB. No data corruption, dead nodes, full GCs or any of the

Re: Memory Usage During Read

2011-05-14 Thread Terje Marthinussen
Out of curiosity, could you try to disable mmap as well? I had some problems here some time back, and I wanted to see better what was going on, so I disabled mmap. I actually don't think I have the same problem again, but I have seen Java VM sizes up in 30-40GB with a heap of just 16GB. Haven't

Re: Excessive allocation during hinted handoff

2011-05-12 Thread Terje Marthinussen
Just out of curiosity, is this on the receiver or sender side? I have been wondering a bit if the hint playback could need some adjustment. There are potentially quite big differences in how much is sent per throttle delay time, depending on what your data looks like. Early 0.7 releases also built

Re: Excessive allocation during hinted handoff

2011-05-12 Thread Terje Marthinussen
And if you have 10 nodes, do all of them happen to send hints to the two with GC? Terje On Thu, May 12, 2011 at 6:10 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: Just out of curiosity is this on the receiver or sender side? I have been wondering a bit if the hint playback could need

Re: compaction strategy

2011-05-11 Thread Terje Marthinussen
Not sure I follow you. 4 sstables is the minimum compaction looks for (by default). If there are 30 sstables of ~20MB sitting there because compaction is behind, you will compact those 30 sstables together (unless there is not enough space for that and considering you haven't changed the

Re: column bloat

2011-05-11 Thread Terje Marthinussen
On Wed, May 11, 2011 at 8:06 AM, aaron morton aa...@thelastpickle.com wrote: For a reasonably large number of use cases (for me, 2 out of 3 at the moment) supercolumns will be units of data where the columns (attributes) will never change by themselves or where the data does not change anyway

column bloat

2011-05-10 Thread Terje Marthinussen
Hi, If you make a supercolumn today, what you end up with is:
- short + Super Column name
- int (local deletion time)
- long (delete time)
Byte array of columns, each with:
- short + column name
- int (TTL)
- int (local deletion time)
- long (timestamp)
- int + value of column
That

Re: column bloat

2011-05-10 Thread Terje Marthinussen
Anyway, to sum that up, expiring columns are 1 byte more and non-expiring ones are 7 bytes less. Not arguing, it's still fairly verbose, especially with tons of very small columns. Yes, you are right, sorry. Trying to do one thing too many at the same time. My brain filtered out part of the
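
The byte counts in this exchange can be checked with simple arithmetic. The Python sketch below assumes the per-column layout listed in the "column bloat" message (2-byte name length, 4-byte TTL, 4-byte local deletion time, 8-byte timestamp, 4-byte value length) and, for the corrected figures, a 1-byte flags field with TTL and local deletion time written only for expiring columns. That second layout is inferred from the "1 byte more / 7 bytes less" correction, not taken from the Cassandra source.

```python
SHORT, INT, LONG, BYTE = 2, 4, 8, 1  # field sizes in bytes


def overhead_as_listed() -> int:
    """Fixed bytes per column in the layout listed in the original message."""
    return SHORT + INT + INT + LONG + INT  # name len, TTL, local del. time, ts, value len


def overhead_with_flags(expiring: bool) -> int:
    """Fixed bytes per column if a flags byte makes the TTL fields optional."""
    fixed = SHORT + BYTE + LONG + INT  # name len, flags, timestamp, value len
    if expiring:
        fixed += INT + INT  # TTL + local deletion time
    return fixed


def column_size(name: bytes, value: bytes, expiring: bool) -> int:
    """Total serialized size of one column under the flags-based layout."""
    return overhead_with_flags(expiring) + len(name) + len(value)
```

For a 3-byte name and a 4-byte value, the 15 fixed bytes of a non-expiring column are more than double the 7 bytes of payload, which is the bloat being discussed.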

Re: compaction strategy

2011-05-10 Thread Terje Marthinussen
Everyone may be well aware of that, but I'll still remark that a minor compaction will try to merge as many 20MB sstables as it can, up to the max compaction threshold (which is configurable). So if you do accumulate some newly created sstables at some point in time, the next minor compaction

Re: compaction strategy

2011-05-09 Thread Terje Marthinussen
9, 2011 at 12:46 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: Yes, agreed. I actually think cassandra has to. And if you do not go down to that single file, how do you avoid getting into a situation where you can very realistically end up with 4-5 big sstables each having its own

compaction strategy

2011-05-07 Thread Terje Marthinussen
Even with the current concurrent compactions, given a high speed datafeed, compactions will obviously start lagging at some stage, and once it does, things can turn bad in terms of disk usage and read performance. I have not read the compaction code well, but if

Re: compaction strategy

2011-05-07 Thread Terje Marthinussen
job. Terje On Sat, May 7, 2011 at 9:54 PM, Jonathan Ellis jbel...@gmail.com wrote: On Sat, May 7, 2011 at 2:01 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: 1. Would it make sense to make full compactions occur a bit more aggressively? I'd rather reduce the performance impact

Re: MemtablePostFlusher with high number of pending calls?

2011-05-04 Thread Terje Marthinussen
. Terje On Wed, May 4, 2011 at 6:34 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: Hm... peculiar. Post flush is not involved in compactions, right? May 2nd 01:06 - Out of disk 01:51 - Starts a mix of major and minor compactions on different column families. It then starts a few minor

Re: MemtablePostFlusher with high number of pending calls?

2011-05-04 Thread Terje Marthinussen
the disk seems to have been full for 35 minutes due to un-deleted sstables. Terje On Wed, May 4, 2011 at 6:34 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: Hm... peculiar. Post flush is not involved in compactions, right? May 2nd 01:06 - Out of disk 01:51 - Starts a mix of major

Re: MemtablePostFlusher with high number of pending calls?

2011-05-04 Thread Terje Marthinussen
completely run out of disk space Regards, Terje On Wed, May 4, 2011 at 10:09 PM, Jonathan Ellis jbel...@gmail.com wrote: Or we could reserve space when starting a compaction. On Wed, May 4, 2011 at 2:32 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: Partially, I guess this may

MemtablePostFlusher with high number of pending calls?

2011-05-03 Thread Terje Marthinussen
Cassandra 0.8 beta trunk from about 1 week ago:
Pool Name             Active   Pending   Completed
ReadStage                  0         0           5
RequestResponseStage       0         0       87129
MutationStage              0         0      187298

Re: MemtablePostFlusher with high number of pending calls?

2011-05-03 Thread Terje Marthinussen
: ... and are there any exceptions in the log? On Tue, May 3, 2011 at 1:01 PM, Jonathan Ellis jbel...@gmail.com wrote: Does it resolve down to 0 eventually if you stop doing writes? On Tue, May 3, 2011 at 12:56 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: Cassandra 0.8 beta trunk from about 1 week

Re: MemtablePostFlusher with high number of pending calls?

2011-05-03 Thread Terje Marthinussen
So yes, there is currently some 200GB of empty disk. On Wed, May 4, 2011 at 3:20 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: Just some very tiny amount of writes in the background here (some hints spooled up on another node slowly coming in). No new data. I thought

Re: MemtablePostFlusher with high number of pending calls?

2011-05-03 Thread Terje Marthinussen
catastrophically fail, its corresponding post-flush task will be stuck. On Tue, May 3, 2011 at 1:20 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: Just some very tiny amount of writes in the background here (some hints spooled up on another node slowly coming in). No new data. I thought

Re: MemtablePostFlusher with high number of pending calls?

2011-05-03 Thread Terje Marthinussen
debug logging and see if I get lucky and run out of disk again. Terje On Wed, May 4, 2011 at 5:06 AM, Jonathan Ellis jbel...@gmail.com wrote: Compaction does, but flush didn't until https://issues.apache.org/jira/browse/CASSANDRA-2404 On Tue, May 3, 2011 at 2:26 PM, Terje Marthinussen

Re: memtablePostFlusher blocking writes?

2011-04-27 Thread Terje Marthinussen
compaction activity as well. (Also, if each of those pending mutations is 10,000 columns, you may be causing yourself memory pressure as well.) On Wed, Apr 27, 2011 at 11:01 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: 0.8 trunk: When playing back a fairly large chunk of hints, things

multithreaded compaction

2011-04-26 Thread Terje Marthinussen
Hi, I was testing the multithreaded compactions and with 2x6 cores (24 with HT) it does seem a bit crazy with 24 compactions running concurrently. It is probably not very good in terms of random I/O. As such, I think I agree with the argument in 2191 that there should be a config option for

Re: multithreaded compaction

2011-04-26 Thread Terje Marthinussen
PM, Sylvain Lebresne sylv...@datastax.com wrote: On Tue, Apr 26, 2011 at 9:01 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: Hi, I was testing the multithreaded compactions and with 2x6 cores (24 with HT) it does seem a bit crazy with 24 compactions running concurrently

Re: 0.7.4 Bad sstables?

2011-04-25 Thread Terje Marthinussen
I have been hunting similar-looking corruptions, especially in the hints column family, but I believe they occur somewhere while compacting. I looked in greater detail at one sstable and the row length was longer than the actual data in the row, and as far as I could see, either the length was

Re: 0.8 loosing nodes?

2011-04-25 Thread Terje Marthinussen
heartbeats again and other nodes log that they receive the heartbeats, but this will not get it marked as UP again until restarted. So, seems like 2 issues: - Nodes pausing (may be just node overload) - Nodes are not marked as UP unless restarted Regards, Terje On 24 Apr 2011, at 23:24, Terje

multithreaded compaction causes mutation storms?

2011-04-24 Thread Terje Marthinussen
Tested out multithreaded compaction in 0.8 last night. We had first fed some data with compaction disabled so there was 1000+ sstables on the nodes and I decided to enable multithreaded compaction on one of them to see how it performed vs. nodes that had no compaction at all. Since this was sort

0.8 loosing nodes?

2011-04-24 Thread Terje Marthinussen
World as seen from .81 in the ring below:
.81  Up    Normal  85.55 GB  8.33%  Token(bytes[30])
.82  Down  Normal  83.23 GB  8.33%  Token(bytes[313230])
.83  Up    Normal  70.43 GB  8.33%  Token(bytes[313437])
.84  Up    Normal  81.7 GB   8.33%

Re: Compacting single file forever

2011-04-22 Thread Terje Marthinussen
I think the really interesting part is how this node ended up in this state in the first place. There should be somewhere in the area of 340-500GB of data on it when everything is 100% compacted. The problem now is that it used (we wiped it last night to test some 0.8 stuff) more than 1TB. To me,

Re: Multi-DC Deployment

2011-04-20 Thread Terje Marthinussen
a result, then you use read one, if you want to get a highly available better quality result use local quorum. That is a per-query option. Adrian On Tue, Apr 19, 2011 at 6:46 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: If you have RF=3 in both datacenters, it could be discussed

Re: Multi-DC Deployment

2011-04-19 Thread Terje Marthinussen
Hmm... It seems like it could be an idea, in a case like this, to have a mode where a result is always returned (if possible), but with a flag saying whether the consistency level was met, or to what level it was met (the number of nodes answering, for instance)? Terje On Tue, Apr 19, 2011 at 1:13 AM, Jonathan
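
The suggested mode, always returning whatever data is available together with an indication of how far the requested consistency level was met, might look something like the hypothetical Python sketch below. None of these names exist in Cassandra's Thrift API; `ReadResult`, `replicas_answered`, and `best_effort_read` are illustrative only, and "newest timestamp wins" is a simplification of real read resolution.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ReadResult:
    value: Optional[object]
    replicas_answered: int
    replicas_required: int

    @property
    def consistency_met(self) -> bool:
        return self.replicas_answered >= self.replicas_required


def best_effort_read(responses: List[Tuple[int, object]], required: int) -> ReadResult:
    """Return the newest response instead of failing when too few replicas reply.

    responses: (timestamp, value) pairs from the replicas that answered.
    """
    if not responses:
        return ReadResult(None, 0, required)
    _, newest = max(responses, key=lambda r: r[0])
    return ReadResult(newest, len(responses), required)
```

The caller, not the coordinator, then decides whether a result with `consistency_met == False` is good enough, which keeps the choice per query, as with the existing consistency levels.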

Re: Multi-DC Deployment

2011-04-19 Thread Terje Marthinussen
at 11:16 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: Hum... Seems like it could be an idea in a case like this with a mode where result is always returned (if possible), but with a flag saying if the consistency level was met, or to what level it was met (number of nodes

Re: raid 0 and ssd

2011-04-14 Thread Terje Marthinussen
Hm... You should notice that unless you have TRIM, which I don't think any OS supports with any RAID functionality yet, then once you have written once to the whole SSD, it is always full! That is, when you delete a file, you don't clear the blocks on the SSD, so as far as the SSD goes, the data

Re: Timeout during stress test

2011-04-11 Thread Terje Marthinussen
I notice you have pending hinted handoffs? Look for errors related to that. We have seen occasional corruptions in the hinted handoff sstables. If you are stressing the system to its limits, you may also consider playing more with the number of read/write threads (concurrent_reads/writes)

Re: How to repair HintsColumnFamily?

2011-04-01 Thread Terje Marthinussen
Seeing similar errors on another system (0.7.4). Maybe something bogus with the hint columnfamilies. Terje On Mon, Mar 28, 2011 at 7:15 PM, Shotaro Kamio kamios...@gmail.com wrote: I see. Then, I'll remove the HintsColumnFamily. Because our cluster has a lot of data, running repair takes

Re: balance between concurrent_[reads|writes] and feeding/reading threads i clients

2011-04-01 Thread Terje Marthinussen
it unless you see the thread pools backing up and messages being dropped. Hope that helps Aaron On 28 Mar 2011, at 19:55, Terje Marthinussen wrote: Hi, I was pondering how the concurrent_read and write settings balance against the max read/write threads in clients. Let's say we have 3

secondary indexes on data imported by json2sstable

2011-03-14 Thread Terje Marthinussen
Hi, Should it be expected that secondary indexes are automatically regenerated when importing data using json2sstable? Or is there some manual procedure that needs to be done to generate them? Regards, Terje

Re: 0.7.3 nodetool scrub exceptions

2011-03-08 Thread Terje Marthinussen
I had similar errors in late 0.7.3 releases related to testing I did for the mails with subject Argh: Data Corruption (LOST DATA) (0.7.0). I do not see these corruptions or the above error anymore with 0.7.3 release as long as the dataset is created from scratch. The patch (2104) mentioned in the

Re: Argh: Data Corruption (LOST DATA) (0.7.0)

2011-03-05 Thread Terje Marthinussen
, Benjamin Coverston ben.covers...@datastax.com wrote: Hi Terje, Can you attach the portion of your logs that shows the exceptions indicating corruption? Which version are you on right now? Ben On 3/4/11 10:42 AM, Terje Marthinussen wrote: We are seeing various other messages as well

Re: Argh: Data Corruption (LOST DATA) (0.7.0)

2011-03-04 Thread Terje Marthinussen
We are seeing various other messages as well related to deserialization, so this seems to be some random corruption somewhere, but so far it seems to be limited to supercolumns. Terje On Sat, Mar 5, 2011 at 2:26 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: Hi, Did you get anywhere

Re: Fill disks more than 50%

2011-02-25 Thread Terje Marthinussen
I am suggesting that you probably want to rethink your schema design, since partitioning by year is going to give bad performance because the old servers are going to be nothing more than expensive tape drives. You fail to see the obvious. It is just the fact that most of the data is stale

Re: Fill disks more than 50%

2011-02-25 Thread Terje Marthinussen
@Thibaut Britz Caveat: using SimpleStrategy. This works because Cassandra scans data at startup and then serves what it finds. For a join, for example, you can rsync all the data from the node below/to the right of where the new node is joining, then join without bootstrap, then clean up both

Re: 2x storage

2011-02-25 Thread Terje Marthinussen
Cassandra never compacts more than one column family at a time? Regards, Terje On 26 Feb 2011, at 02:40, Robert Coli rc...@digg.com wrote: On Fri, Feb 25, 2011 at 9:22 AM, A J s5a...@gmail.com wrote: I read in some cassandra notes that each node should be allocated twice the storage

Fill disks more than 50%

2011-02-23 Thread Terje Marthinussen
Hi, Suppose you have always-increasing key values (timestamps), never delete, and hardly ever overwrite data, and you want to minimize rebalancing work by statically assigning (new) token ranges to new nodes as you add them, so that they always get the latest data. Let's say you add a new

Re: Compression in Cassandra

2011-01-20 Thread Terje Marthinussen
A 3-7x increase in data size is perfectly normal, depending on your data schema. Regards, Terje On 20 Jan 2011, at 23:17, akshatbakli...@gmail.com akshatbakli...@gmail.com wrote: I just did a du -h DataDump which showed 40G and du -h CassandraDataDump which showed 170G am i doing something

Re: Cassandra memtable and GC

2010-11-22 Thread Terje Marthinussen
Look at the graph again, especially from the first posting. The records/second read (by the client) go down as disk reads go down. Looks like something is fishy with the memtables. Terje On Tue, Nov 23, 2010 at 1:54 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Mon, Nov 22, 2010 at

Re: SSD vs. HDD

2010-11-03 Thread Terje Marthinussen
How high is high, and how much data do you have (Cassandra disk usage)? Regards, Terje On 4 Nov 2010, at 04:32, Alaa Zubaidi alaa.zuba...@pdf.com wrote: Hi, we have a continuous high throughput writes, read and delete, and we are trying to find the best hardware. Is using SSD for Cassandra

Re: about insert benchmark

2010-09-02 Thread Terje Marthinussen
1000 and 1 records take too short a time to really benchmark anything. You will spend 2 seconds just on things like TCP window sizes adjusting to the level where you get throughput. The difference between 100k and 500k is less than 10%. It could be anything: file system caches, sizes of memtables

Re: column family names

2010-08-31 Thread Terje Marthinussen
, Terje Marthinussen tmarthinus...@gmail.com wrote: Another option would of course be to store a mapping between dir/filenames and keyspaces/column families together with other info related to keyspaces and column families. Just add API/command line tools to look up the filenames and maybe

Re: column family names

2010-08-30 Thread Terje Marthinussen
class, which includes the underscore character. Aaron On 30 Aug 2010, at 21:01, Terje Marthinussen tmarthinus...@gmail.com wrote: Hi, Now that we can make column families on the fly, it gets interesting to use column families more as part of the data model (can reduce disk space quite

Re: cassandra disk usage

2010-08-30 Thread Terje Marthinussen
On Mon, Aug 30, 2010 at 10:10 PM, Jonathan Ellis jbel...@gmail.com wrote: column names are stored per cell (moving to user@) I think that is already accounted for in my numbers? What I listed was measured from the actual SSTable file (using the output from strings sstable.db), so

Re: column family names

2010-08-30 Thread Terje Marthinussen
Beyond aesthetics, specific reasons? Terje On Tue, Aug 31, 2010 at 11:54 AM, Benjamin Black b...@b3k.us wrote: URL encoding.

Re: Digg 4 Preview on TWiT

2010-07-09 Thread Terje Marthinussen
http://twitter.com/nk/status/17903187277 Another not using joke?