Journal enabled is faster for almost all operations.
Recovery here is more about saving you from waiting half an hour for a traditional
full file system check.
Feel free to wait if you want though! :)
Regards,
Terje
On 21 May 2014, at 01:11, Paulo Ricardo Motta Gomes
If this is a tombstone problem as suggested by some, and it is ok to turn off
replication as suggested by others, it may be an idea to do an optimization in
Cassandra where:
if replication_factor == 1:
    do not create tombstones
Terje
On Jul 2, 2013, at 11:11 PM, Dmitry Olshansky
Check how many concurrent real requests you have vs size of thread pools.
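A quick way to see both sides of that (just a sketch; the host is an example) is
nodetool tpstats, which shows active and pending tasks per stage:

nodetool -h localhost tpstats

If ReadStage or MutationStage keeps a long pending queue while your clients only
have a handful of requests in flight, the bottleneck is likely on the server side
rather than in the pool sizing.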
Regards,
Terje
On 30 Oct 2012, at 13:28, Peter Bailis pbai...@cs.berkeley.edu wrote:
I'm using YCSB on EC2 with one m1.large instance to drive client load
To add, I don't believe this is due to YCSB. I've done a fair
The row key is stored only once in any sstable file.
That is, in the special case where you get one sstable file per column/value, you
are correct, but normally I guess most of us are storing more per key.
Regards,
Terje
On 11 Aug 2012, at 10:34, Aaron Turner synfina...@gmail.com wrote:
Curious, but
Probably you can get an Intel 320 160GB or a Samsung 830 for the same price as
the 146GB 15k rpm drive.
Overprovision the SSD 20% and off you go.
It will beat the HDD both sequentially and randomly.
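One simple way to overprovision, as a rough sketch (device name is just an
example): partition only ~80% of the drive and leave the rest untouched, e.g.

parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary 0% 80%

The untouched 20% gives the controller spare area for wear leveling and garbage
collection.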
Terje
On Aug 8, 2012, at 11:41 PM, Amit Kumar kumaramit.ex...@gmail.com wrote:
There is a
We run some fairly large and busy Cassandra setups.
All of them without mmap.
I have yet to see a benchmark which can conclusively say mmap is better (or
worse for that matter) than standard ways of doing I/O, and we have done many of
them over the last 2 years, by different people, with different tools.
On Sun, Jan 29, 2012 at 7:26 PM, aaron morton aa...@thelastpickle.comwrote:
and compare them, but at this point I need to focus on one to get
things working, so I'm trying to make a best initial guess.
I would go for RP (RandomPartitioner) then; BOP (ByteOrderedPartitioner) may look
like less work to start with but it *will* bite you.
Please realize that I do not make any decisions here and I am not part of the
core Cassandra developer team.
What has been said before is that they will most likely go away and at least
under the hood be replaced by composite columns.
Jonathan has, however, stated that he would like the
Works if you turn off mmap?
We run without mmap and see hardly any difference in performance, but with huge
benefits in the form of memory consumption which can actually be monitored
easily, and it just seems like things are more stable this way in general.
Just turn off and see how that
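(For reference, a sketch of how to turn it off in this era of Cassandra: mmap is
controlled by disk_access_mode in cassandra.yaml, so set

disk_access_mode: standard

instead of auto/mmap/mmap_index_only and restart the node.)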
Sorry for not checking the source to see if things have changed, but I just
remembered an issue I have forgotten to make a JIRA for.
In the old days, nodes would periodically try to deliver their hint queues.
However, this was at some stage changed so delivery is only triggered when a node
is marked up.
However, you can
SSDs definitely make life simpler, as you will get a lot less trouble with
impact from things like compactions.
Just beware that Cassandra expands data a lot due to storage overhead (for
small columns), replication, and the space needed for compactions and repairs.
It is well worth doing some
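As a rough, purely illustrative back-of-the-envelope example: a 20 byte value with
a 10 byte column name carries on the order of 15-25 bytes of fixed per-column
overhead (lengths, flags, timestamp), so roughly 50 bytes on disk per 20 bytes of
raw data; multiply by RF=3 and leave headroom for compactions and repairs, and you
easily land at several times the raw data size.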
I have a patch for trunk which I just have to get time to test a bit before I
submit.
It is for super columns and will use the super column's timestamp as the base
and only store varint-encoded offsets in the underlying columns.
If a column's timestamp equals that of the SC, it will store nothing
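A small worked example of the idea (numbers purely illustrative): if the super
column timestamp is 1304500000000 and its columns have timestamps 1304500000000
and 1304500000123, the stored offsets are 0 (nothing written) and 123, which
varint-encodes to a byte or two instead of a full 8-byte long per column.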
On Jul 28, 2011, at 9:52 PM, Jonathan Ellis wrote:
This is not advisable in general, since non-mmap'd I/O is substantially
slower.
I see this again and again as a claim here, but it is actually close to 10
years since I saw mmap'd I/O have any substantial performance benefits on any
real
AM, Terje Marthinussen tmarthinus...@gmail.com wrote:
On Jul 28, 2011, at 9:52 PM, Jonathan Ellis wrote:
This is not advisable in general, since non-mmap'd I/O is substantially
slower.
I see this again and again as a claim here, but it is actually close to 10
years since I saw
Unless it is a 0.8.1 RC or beta
On Fri, Jul 1, 2011 at 12:57 PM, Jonathan Ellis jbel...@gmail.com wrote:
This isn't 2818 -- (a) the 0.8.1 protocol is identical to 0.8.0 and
(b) the whole cluster is on the same version.
On Thu, Jun 30, 2011 at 9:35 PM, aaron morton aa...@thelastpickle.com
If you have a quality HW RAID controller with proper performance (and far from
all have good performance) you can definitely benefit from a battery-backed
write cache on it, although the benefits will not be huge on RAID 0.
Unless you get a really good price on that high-performance HW RAID
That being said, we do not provide isolation, which means in particular
that
reads *can* return a state where only parts of a batch update seem applied
(and it would clearly be cool to have isolation and I'm not even
saying this will
never happen).
Out of curiosity, do you see any
Hi all!
Assuming a node ends up in GC land for a while, there is a good chance that
even though it performs terribly and the dynamic snitching will help you to
avoid it on the gossip side, it will not really help you much if thrift
still accepts requests and the thrift interface has choppy
Even if the GC call cleaned all files, it is not really acceptable on a
decent-sized cluster due to the impact full GC has on performance.
Especially unneeded ones.
The delay in file deletion can also at times make it hard to see how much
spare disk you actually have.
We easily see 100%
On Thu, Jun 16, 2011 at 12:48 AM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
Even if the gc call cleaned all files, it is not really acceptable on a
decent sized cluster due to the impact full gc has on performance.
Especially non-needed ones.
Not acceptable as running GC on every
Hi,
I was looking quickly at source code tonight.
As far as I could see from a quick code scan, hint delivery is only
triggered by the state change when a node goes from down to up?
If this is indeed the case, it would potentially explain why we sometimes
have hints on machines which
on heartbeats maybe
(potentially not all of them, but at a regular interval)?
Terje
On Thu, Jun 16, 2011 at 2:08 AM, Jonathan Ellis jbel...@gmail.com wrote:
On Wed, Jun 15, 2011 at 10:53 AM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
I was looking quickly at source code tonight.
As far
Can't help you with that.
You may have to go the json2sstable route and re-import into 0.7.3
But... why would you want to go back to 0.7.3?
Terje
On Thu, Jun 16, 2011 at 10:30 AM, Anurag Gujral anurag.guj...@gmail.comwrote:
Hi All,
I moved to cassandra 0.8.0 from cassandra-0.7.3
Watching this on a node here right now and it sort of shows how bad this can
get.
This node still has 109GB free disk by the way...
INFO [CompactionExecutor:5] 2011-06-16 09:11:59,164 StorageService.java
(line 2071) requesting GC to free disk space
INFO [CompactionExecutor:5] 2011-06-16
Hi,
I have been testing repairs a bit in different ways on 0.8.0 and I am
curious about what to really expect in terms of data transferred.
I would expect my data to be fairly consistent in this case from the start.
More than a billion supercolumns, but there were no errors in the feed and we
have seen
Ah..
I just found CASSANDRA-2698 (I thought I had seen something about this)...
I guess that means I have to see if I can find time to investigate whether I
have a reproducible case?
Terje
On Tue, Jun 14, 2011 at 4:21 PM, Terje Marthinussen tmarthinus...@gmail.com
wrote:
Hi,
I have been
That most likely happened just because after scrub you had new files and got
over the 4-file minimum limit.
https://issues.apache.org/jira/browse/CASSANDRA-2697
is the bug report.
2011/6/13 Héctor Izquierdo Seliva izquie...@strands.com
Hi All. I found a way to be able to compact. I have to
bug in the 0.8.0 release version.
Cassandra splits the sstables depending on size and tries to find (by
default) at least 4 files of similar size.
If it cannot find 4 files of similar size, it logs that message in 0.8.0.
You can try to reduce the minimum required files for compaction and it
12 sounds perfectly fine in this case:
4 buckets, 3 sstables in each bucket, and the default minimum threshold per bucket is 4.
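If you do want to lower the minimum, it can be set per column family with
nodetool, roughly like this (keyspace/CF names are just examples):

nodetool -h localhost setcompactionthreshold MyKeyspace MyCF 2 32

which sets the min and max compaction thresholds to 2 and 32.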
Terje
2011/6/10 Héctor Izquierdo Seliva izquie...@strands.com
El vie, 10-06-2011 a las 20:21 +0900, Terje Marthinussen escribió:
bug in the 0.8.0 release version.
Cassandra
will affect minor
compaction frequency, won't it?
maki
2011/6/10 Terje Marthinussen tmarthinus...@gmail.com:
bug in the 0.8.0 release version.
Cassandra splits the sstables depending on size and tries to find (by
default) at least 4 files of similar size.
If it cannot find 4 files of similar
If you run iostat with output every few seconds, is the I/O stable or do
you see very uneven I/O?
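For example, with extended stats at 5 second intervals:

iostat -x -k 5

and watch whether await and %util stay roughly level or swing between idle and
saturated.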
Regards,
Terje
On Tue, Jun 7, 2011 at 11:12 AM, aaron morton aa...@thelastpickle.comwrote:
There is a big IO queue and reads are spending a lot of time in the queue.
Some more questions:
-
(or
otherwise had the per-CF memtable settings applied?)
On Mon, Jun 6, 2011 at 12:00 AM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
0.8 under load may turn out to be more stable and well behaving than any
release so far
Been doing a few test runs stuffing more than 1 billion records
has 0 subcolumns in the first
place?
Is that expected behaviour?
Regards,
Terje
On Mon, Jun 6, 2011 at 10:09 PM, Terje Marthinussen tmarthinus...@gmail.com
wrote:
Of course I talked too soon.
I saw a corrupted commitlog some days back after killing cassandra and I
just came across
Yes, I am aware of it but it was not an alternative for this project which
will face production soon.
The patch I have is fairly non-intrusive (especially vs. 674) so I think it
can be interesting depending on how quickly 674 will be integrated into
cassandra releases.
I plan to take a closer
0.8 under load may turn out to be more stable and better behaved than any
release so far.
Been doing a few test runs stuffing more than 1 billion records into a 12
node cluster and things look better than ever.
VMs stable and nice at 11GB. No data corruptions, dead nodes, full GCs or
any of the
Out of curiosity, could you try to disable mmap as well?
I had some problems here some time back and I wanted to see better what was
going on, so I disabled mmap.
I actually don't think I have the same problem again, but I have seen JVM
sizes up at 30-40GB with a heap of just 16GB.
Haven't
Just out of curiosity is this on the receiver or sender side?
I have been wondering a bit if the hint playback could need some
adjustment.
There are potentially quite big differences in how much is sent per throttle
delay time depending on what your data looks like.
Early 0.7 releases also built
And if you have 10 nodes, do all of them happen to send hints to the two with
GC?
Terje
On Thu, May 12, 2011 at 6:10 PM, Terje Marthinussen tmarthinus...@gmail.com
wrote:
Just out of curiosity is this on the receiver or sender side?
I have been wondering a bit if the hint playback could need
Not sure I follow you. 4 sstables is the minimum compaction looks for
(by default).
If there are 30 sstables of ~20MB sitting there because compaction is
behind, you will compact those 30 sstables together (unless there is not
enough space for that, and considering you haven't changed the
On Wed, May 11, 2011 at 8:06 AM, aaron morton aa...@thelastpickle.comwrote:
For a reasonably large number of use cases (for me, 2 out of 3 at the
moment) supercolumns will be units of data where the columns (attributes)
will never change by themselves or where the data does not change anyway
Hi,
If you make a supercolumn today, what you end up with is:
- short + Super Column name
- int (local deletion time)
- long (delete time)
Byte array of columns each with:
- short + column name
- int (TTL)
- int (local deletion time)
- long (timestamp)
- int + value of column
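Adding up just the fixed-size fields above: each subcolumn carries 2 + 4 + 4 + 8 +
4 = 22 bytes of overhead on top of its name and value, and the super column itself
adds another 2 + 4 + 8 = 14 bytes plus its name.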
That
Anyway, to sum that up, expiring columns are 1 byte more and
non-expiring ones are 7 bytes
less. Not arguing, it's still fairly verbose, especially with tons of
very small columns.
Yes, you are right, sorry.
Trying to do too many things at the same time.
My brain filtered out part of the
Everyone may be well aware of that, but I'll still remark that a minor
compaction
will try to merge as many 20MB sstables as it can up to the max
compaction
threshold (which is configurable). So if you do accumulate some newly
created
sstable at some point in time, the next minor compaction
9, 2011 at 12:46 PM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
Yes, agreed.
I actually think cassandra has to.
And if you do not go down to that single file, how do you avoid getting
into a situation where you can very realistically end up with 4-5 big
sstables each having its own
Even with the current concurrent compactions, given a high speed data feed,
compactions will obviously start lagging at some stage, and once they do,
things can turn bad in terms of disk usage and read performance.
I have not read the compaction code well, but if
job.
Terje
On Sat, May 7, 2011 at 9:54 PM, Jonathan Ellis jbel...@gmail.com wrote:
On Sat, May 7, 2011 at 2:01 AM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
1. Would it make sense to make full compactions occur a bit more
aggressive.
I'd rather reduce the performance impact
.
Terje
On Wed, May 4, 2011 at 6:34 AM, Terje Marthinussen
tmarthinus...@gmail.comwrote:
Hm... peculiar.
Post flush is not involved in compactions, right?
May 2nd
01:06 - Out of disk
01:51 - Starts a mix of major and minor compactions on different column
families
It then starts a few minor
the disk seems to have been full for 35 minutes due to un-deleted
sstables.
Terje
On Wed, May 4, 2011 at 6:34 AM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
Hm... peculiar.
Post flush is not involved in compactions, right?
May 2nd
01:06 - Out of disk
01:51 - Starts a mix of major
completely run out of disk space
Regards,
Terje
On Wed, May 4, 2011 at 10:09 PM, Jonathan Ellis jbel...@gmail.com wrote:
Or we could reserve space when starting a compaction.
On Wed, May 4, 2011 at 2:32 AM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
Partially, I guess this may
Cassandra 0.8 beta trunk from about 1 week ago:
Pool Name                    Active   Pending   Completed
ReadStage                         0         0           5
RequestResponseStage              0         0       87129
MutationStage                     0         0      187298
:
... and are there any exceptions in the log?
On Tue, May 3, 2011 at 1:01 PM, Jonathan Ellis jbel...@gmail.com wrote:
Does it resolve down to 0 eventually if you stop doing writes?
On Tue, May 3, 2011 at 12:56 PM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
Cassandra 0.8 beta trunk from about 1 week
So yes, there is currently some 200GB empty disk.
On Wed, May 4, 2011 at 3:20 AM, Terje Marthinussen
tmarthinus...@gmail.comwrote:
Just some very tiny amount of writes in the background here (some hints
spooled up on another node slowly coming in).
No new data.
I thought
catastrophically fail, its corresponding
post-flush task will be stuck.
On Tue, May 3, 2011 at 1:20 PM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
Just some very tiny amount of writes in the background here (some hints
spooled up on another node slowly coming in).
No new data.
I thought
debug logging and see if I get lucky and run out of
disk again.
Terje
On Wed, May 4, 2011 at 5:06 AM, Jonathan Ellis jbel...@gmail.com wrote:
Compaction does, but flush didn't until
https://issues.apache.org/jira/browse/CASSANDRA-2404
On Tue, May 3, 2011 at 2:26 PM, Terje Marthinussen
compaction activity as well.
(Also, if each of those pending mutations is 10,000 columns, you may
be causing yourself memory pressure as well.)
On Wed, Apr 27, 2011 at 11:01 AM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
0.8 trunk:
When playing back a fairly large chunk of hints, things
Hi,
I was testing the multithreaded compactions and with 2x6 cores (24 with HT)
it does seem a bit crazy with 24 compactions running concurrently.
It is probably not very good in terms of random I/O.
As such, I think I agree with the argument in 2191 that there should be a
config option for
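(A sketch of what such a cap could look like in cassandra.yaml; the option name
concurrent_compactors is what later versions settled on, so treat it as an
assumption for this 0.8-era discussion:

concurrent_compactors: 4

i.e. cap parallel compactions well below the core count.)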
PM, Sylvain Lebresne sylv...@datastax.comwrote:
On Tue, Apr 26, 2011 at 9:01 AM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
Hi,
I was testing the multithreaded compactions and with 2x6 cores (24 with
HT)
it does seem a bit crazy with 24 compactions running concurrently
I have been hunting similar-looking corruptions, especially in the hints
column family, but I believe they occur somewhere while compacting.
I looked in greater detail at one sstable and the row length was longer than
the actual data in the row, and as far as I could see, either the length was
heartbeats again and other nodes
log that they receive the heartbeats, but this will not get it marked
as UP again until restarted.
So, seems like 2 issues:
- Nodes pausing (may be just node overload)
- Nodes are not marked as UP unless restarted
Regards,
Terje
On 24 Apr 2011, at 23:24, Terje
Tested out multithreaded compaction in 0.8 last night.
We had first fed some data with compaction disabled so there were 1000+
sstables on the nodes, and I decided to enable multithreaded compaction on
one of them to see how it performed vs. nodes that had no compaction at all.
Since this was sort
World as seen from .81 in the below ring
.81 Up Normal 85.55 GB 8.33% Token(bytes[30])
.82 Down Normal 83.23 GB 8.33% Token(bytes[313230])
.83 Up Normal 70.43 GB 8.33% Token(bytes[313437])
.84 Up Normal 81.7 GB 8.33%
I think the really interesting part is how this node ended up in this state
in the first place.
There should be somewhere in the area of 340-500GB of data on it when
everything is 100% compacted.
Problem now is that it used (we wiped it last night to test some 0.8 stuff)
more than 1TB.
To me,
a result, then you use read one; if you
want to get a highly available, better quality result, use local quorum.
That is a per-query option.
Adrian
On Tue, Apr 19, 2011 at 6:46 PM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
If you have RF=3 in both datacenters, it could be discussed
Hum...
Seems like it could be an idea in a case like this to have a mode where the result
is always returned (if possible), but with a flag saying whether the consistency
level was met, or to what level it was met (number of nodes answering, for
instance)?
Terje
On Tue, Apr 19, 2011 at 1:13 AM, Jonathan
at 11:16 PM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
Hum...
Seems like it could be an idea in a case like this with a mode where
result
is always returned (if possible), but where a flag saying if the
consistency
level was met, or to what level it was met (number of nodes
Hm...
You should notice that unless you have TRIM, which I don't think any OS
supports with any RAID functionality yet, then once you have written once to
the whole SSD, it is always full!
That is, when you delete a file, you don't clear the blocks on the SSD so
as far as the SSD goes, the data
I notice you have pending hinted handoffs?
Look for errors related to that. We have seen occasional corruptions in the
hinted handoff sstables,
If you are stressing the system to its limits, you may also consider playing
more with the number of read/write threads (concurrent_reads/writes)
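The relevant cassandra.yaml knobs, as a sketch (the values are just the usual
starting-point rules of thumb, not a recommendation for any specific hardware):

concurrent_reads: 32   # often sized around 16 * number of data disks
concurrent_writes: 32  # often sized around 8 * number of CPU cores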
Seeing similar errors on another system (0.7.4). Maybe something bogus with
the hint columnfamilies.
Terje
On Mon, Mar 28, 2011 at 7:15 PM, Shotaro Kamio kamios...@gmail.com wrote:
I see. Then, I'll remove the HintsColumnFamily.
Because our cluster has a lot of data, running repair takes
it unless you see the thread pools backing up
and messages being dropped.
Hope that helps
Aaron
On 28 Mar 2011, at 19:55, Terje Marthinussen wrote:
Hi,
I was pondering how the concurrent_read and write settings balance
against the max read/write threads in clients.
Let's say we have 3
Hi,
Should it be expected that secondary indexes are automatically regenerated
when importing data using json2sstable?
Or is there some manual procedure that needs to be done to generate them?
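For reference, the invocation looks roughly like this (keyspace, column family
and paths are made-up examples):

bin/json2sstable -K MyKeyspace -c MyCF dump.json /var/lib/cassandra/data/MyKeyspace/MyCF-f-1-Data.db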
Regards,
Terje
I had similar errors in late 0.7.3 releases related to testing I did for the
mails with the subject "Argh: Data Corruption (LOST DATA) (0.7.0)".
I do not see these corruptions or the above error anymore with the 0.7.3 release
as long as the dataset is created from scratch. The patch (2104) mentioned
in the
, Benjamin Coverston
ben.covers...@datastax.com wrote:
Hi Terje,
Can you attach the portion of your logs that shows the exceptions
indicating corruption? Which version are you on right now?
Ben
On 3/4/11 10:42 AM, Terje Marthinussen wrote:
We are seeing various other messages as well
We are seeing various other messages as well related to deserialization, so
this seems to be some random corruption somewhere, but so far it may seem to
be limited to supercolumns.
Terje
On Sat, Mar 5, 2011 at 2:26 AM, Terje Marthinussen
tmarthinus...@gmail.comwrote:
Hi,
Did you get anywhere
I am suggesting that you probably want to rethink your schema design,
since partitioning by year is going to give bad performance, since the
old servers are going to be nothing more than expensive tape drives.
You fail to see the obvious.
It is just the fact that most of the data is stale.
@Thibaut Britz
Caveat: Using simple strategy.
This works because Cassandra scans data at startup and then serves
what it finds. For a join, for example, you can rsync all the data from
the node below/to the right of where the new node is joining. Then
join without bootstrap, then run cleanup on both.
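A rough sketch of that sequence (hostnames, paths and the strategy caveat above
all apply; everything here is an example, not a recipe):

# on the new node, copy the data from the neighbour it will take load from
rsync -av oldnode:/var/lib/cassandra/data/ /var/lib/cassandra/data/
# start the new node with auto_bootstrap: false and its token set in cassandra.yaml
# then remove rows that no longer belong on either node
nodetool -h newnode cleanup
nodetool -h oldnode cleanup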
Cassandra never compacts more than one column family at a time?
Regards,
Terje
On 26 Feb 2011, at 02:40, Robert Coli rc...@digg.com wrote:
On Fri, Feb 25, 2011 at 9:22 AM, A J s5a...@gmail.com wrote:
I read in some cassandra notes that each node should be allocated
twice the storage
Hi,
Given that you have always-increasing key values (timestamps) and never
delete and hardly ever overwrite data.
If you want to minimize work on rebalancing and statically assign (new)
token ranges to new nodes as you add them so they always get the latest
data
Let's say you add a new
Perfectly normal with a 3-7x increase in data size depending on your data schema.
Regards,
Terje
On 20 Jan 2011, at 23:17, akshatbakli...@gmail.com akshatbakli...@gmail.com
wrote:
I just did a du -h DataDump which showed 40G
and du -h CassandraDataDump which showed 170G
am i doing something
Look at the graph again. Especially from the first posting.
The records/second read (by the client) goes down as disk reads go down.
Looks like something is fishy with the memtables.
Terje
On Tue, Nov 23, 2010 at 1:54 AM, Edward Capriolo edlinuxg...@gmail.comwrote:
On Mon, Nov 22, 2010 at
How high is high, and how much data do you have (Cassandra disk usage)?
Regards,
Terje
On 4 Nov 2010, at 04:32, Alaa Zubaidi alaa.zuba...@pdf.com wrote:
Hi,
we have continuous high-throughput writes, reads and deletes, and we are
trying to find the best hardware.
Is using SSD for Cassandra
1000 and 1 records take too short a time to really benchmark anything. You
will use 2 seconds just for stuff like TCP window sizes to adjust to the
level where you get throughput.
The difference between 100k and 500k is less than 10%. Could be anything:
filesystem caches, sizes of memtables
, Terje Marthinussen
tmarthinus...@gmail.com wrote:
Another option would of course be to store a mapping between dir/filenames
and keyspace/column families together with other info related to keyspaces
and column families. Just add API/command line tools to look up the
filenames and maybe
class, which includes the underscore
character.
Aaron
On 30 Aug 2010, at 21:01, Terje Marthinussen tmarthinus...@gmail.com
wrote:
Hi,
Now that we can make column families on the fly, it gets interesting to use
column families more as part of the data model (can reduce disk space quite
On Mon, Aug 30, 2010 at 10:10 PM, Jonathan Ellis jbel...@gmail.com wrote:
column names are stored per cell
(moving to user@)
I think that is already accounted for in my numbers?
What I listed was measured from the actual SSTable file (using the output
from strings sstable.db), so
Beyond aesthetics, specific reasons?
Terje
On Tue, Aug 31, 2010 at 11:54 AM, Benjamin Black b...@b3k.us wrote:
URL encoding.
http://twitter.com/nk/status/17903187277
Another not using joke?