Is reducing the number of vnodes to 64/32 likely to help our situation?
With just 3 nodes per datacenter, reduce vnodes to 1.
What options do I have for achieving this in a live cluster?
You need to remove the node, move its data to the other 2, and add it back
with a different vnode count.
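For the single-token layout described above, balanced initial_token values can be computed; a minimal sketch, assuming the RandomPartitioner (token space 0 to 2**127 - 1):

```python
# Sketch: evenly spaced initial_token values for a small cluster,
# assuming the RandomPartitioner (token space 0 .. 2**127 - 1).
def balanced_tokens(node_count: int) -> list[int]:
    return [i * (2 ** 127) // node_count for i in range(node_count)]

# e.g. a 3-node datacenter with num_tokens: 1
for node, token in enumerate(balanced_tokens(3)):
    print(f"node {node}: initial_token = {token}")
```

Each value would go into that node's cassandra.yaml as initial_token before the node rejoins the ring.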
With 2 GB RAM, be prepared for crashes, because it can hardly handle
peaks of increased memory consumption from compaction, validation, etc.
KVM works well only if you are using a recent version with virtio drivers
and the provider is not overselling memory. At a shared hosting you will not
be able
A workable configuration depends on your requirements. You need to develop
your own testing procedure:
How much data will you have?
What's the 95th percentile response time target?
Size of rows
Number of columns per row
Data growth rate
Data rewrite rate
Is TTL expiration used?
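A back-of-the-envelope sizing sketch built from the checklist above; every input value below is a hypothetical placeholder, not a recommendation:

```python
# Rough capacity projection from the sizing checklist above.
# All inputs are hypothetical example values.
def projected_size_gb(rows: int, row_bytes: int, growth_per_month: float,
                      months: int, replication_factor: int) -> float:
    raw = rows * row_bytes * (1 + growth_per_month) ** months
    return raw * replication_factor / 1e9

size = projected_size_gb(rows=50_000_000, row_bytes=1500,
                         growth_per_month=0.10, months=12,
                         replication_factor=3)
print(f"projected on-disk size: {size:.0f} GB")
```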
Never aim for the minimum. Cassandra has huge
What would be the way to do this with cassandra?
Embed the app into the server; use OSGi.
Basic OSGi integration is easy: you need to get an OSGi-compatible
container and hook it up to the Cassandra daemon. It's very easy to do -
about 5 lines.
The OSGi container can be accessed from the network; you need to deploy
your application into the container on each node and start it up. Then use
some RPC
Dne 25.7.2013 20:03, Andrew Cobley napsal(a):
Any idea on how I can go about pinpointing the problem to raise a JIRA issue ?
http://www.ehow.com/how_8705297_create-java-heap-dump.html
From my limited experience I think Cassandra is a dangerous choice for
a young, limited-funding/experience start-up expecting to scale fast.
It's not dangerous; just do not try to be clever, and follow what other big
Cassandra users like Twitter, Netflix, Facebook, etc. are using. If they
are
Cassandra 2.0b2:
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.0.0-beta2-tentative
and as a small startup time is our most valuable resource…
Use the technology you are most familiar with.
Dne 16.7.2013 20:45, Robert Coli napsal(a):
On Fri, Jul 12, 2013 at 2:28 AM, Radim Kolar h...@filez.com wrote:
with very little work (less than 10 KB of code) it is possible
to have an online sstable splitter and export this functionality
over JMX.
Are you
My understanding is that it is not possible to change the number of
tokens after the node has been initialized.
That was my conclusion too. Vnodes currently do not bring any
noticeable benefits to outweigh the trouble. Shuffle is very slow in a large
cluster. Recovery is faster with vnodes, but I
With very little work (less than 10 KB of code) it is possible to have an
online sstable splitter and export this functionality over JMX.
Is it possible to change num_tokens on a node with data?
I changed it and restarted the node, but it still shows the same count in
nodetool status.
Without manual flush the CPU goes mad after a couple of hours on each
instance.
increase heap size
OpsCenter collects anonymous usage data and reports it back to
DataStax. For example, number of nodes, keyspaces, column families,
etc. Stat reporting isn't required to run OpsCenter however. To turn
this feature off, see the docs here (stat_reporter):
You never informed users that installing
In case you do not know yet, OpsCenter sends certain data about
your Cassandra installation back to DataStax.
This fact is not visibly presented to the user; it's the same spyware crap
as EHCache.
Dne 13.6.2013 8:19, Michal Michalski napsal(a):
It could be doable to do something when they get converted to
tombstone, but I don't think it's the use case you're looking for.
actually, this would be good enough for me
Reading the changelog for Eclipse Kepler (4.3): BIRT has support for
creating reports from Cassandra.
Could this error message be pointing at a proximate cause?
no
Dne 12.5.2013 2:28, Techy Teck napsal(a):
I am running Cassandra 1.2.2 in production. What kind of problems are you
talking about? Maybe I can get at the root cause of the bad read
performance I am seeing with the Astyanax client in the production cluster.
No support for the full Cassandra 1.2 feature set.
no/bad
Dne 11.5.2013 21:36, Techy Teck napsal(a):
Is anyone using the Astyanax client in production, mainly for reads?
With Cassandra 1.1, yes; it has problems with 1.2.
Do not use Cassandra for implementing a high-throughput queueing system.
It does not scale, because of tombstone management. Use HornetQ; it's an
amazingly fast broker, but it has quite slow persistence if you want to
create queues significantly larger than your memory and use selectors for
If the dataset fits into memory, or the data used in the test almost fits
into memory, then Cassandra is slow compared to other leading NoSQL
databases; the ratio can go up to 10:1. Check the Infinispan benchmarks. A
common usage pattern is to put memcached on top of Cassandra.
Cassandra is good if you have way
http://www.slideshare.net/Couchbase/benchmarking-couchbase#btnNext
Apply the patch and recompile.
Define the max_sstable_size compaction strategy property on the CF you want
to split, then run compaction.
From time to time people ask here about splitting large sstables; here is a
patch doing that:
https://issues.apache.org/jira/browse/CASSANDRA-4897
I would be careful with the patch that was referred to above, it
hasn't been reviewed, and from a glance it appears that it will cause
an infinite compaction loop if you get more than 4 SSTables at max size.
It will; you need to set the max sstable size correctly.
Dne 8.11.2012 19:12, B. Todd Burruss napsal(a):
my question is would leveled compaction help to get rid of the
tombstoned data faster than size tiered, and therefore reduce the disk
space usage?
Leveled compaction will kill your performance. Get the patch from jira for
maximum sstable size per
Dne 29.10.2012 23:24, Stephen Pierce napsal(a):
I'm running 1.1.5; the bug says it's fixed in 1.0.9/1.1.0.
How can I check to see why it keeps running HintedHandoff?
You have a tombstone in system.HintsColumnFamily; use the list command in
cassandra-cli to check.
Is it possible to disable all sstable compaction node-wide? I can't find
anything suitable in the JMX console.
Dne 18.10.2012 20:06, Bryan Talbot napsal(a):
In a 4 node cluster running Cassandra 1.1.5 with sun jvm 1.6.0_29-b11
(64-bit), the nodes are often getting stuck in state where CMS
collections of the old space are constantly running.
You need more Java heap memory.
What if the first node in the range is down? Then -pr would be ineffective.
We have a paid tool capable of downgrading Cassandra 1.2, 1.1, 1.0, and 0.8.
Are there any tested patches around for fixing this issue in 1.0 branch?
I have to do a keyspace-wide flush every 30 seconds to survive a delete-only
workload. This is very inefficient.
https://issues.apache.org/jira/browse/CASSANDRA-3741
Repair process by itself is going well in a background, but the issue
I'm concerned is a lot of unnecessary compaction tasks
The number in the compaction tasks counter is overestimated. For example, I
have 1100 tasks left, and if I stop inserting data, all tasks will finish
within 30 minutes.
I
If you have steps to reproduce, post them here
https://issues.apache.org/jira/browse/CASSANDRA-4643
This is the first version from the 1.1 branch that I used in pre-production
stress testing, and I got a lot of the following errors: decorated key -1 != some number
INFO [CompactionExecutor:10] 2012-09-11 02:22:13,586
CompactionController.java (line 172) Compacting large row
I would migrate to 1.0, because 1.1 is highly unstable.
INFO [AntiEntropySessions:6] 2012-09-02 15:46:23,022
AntiEntropyService.java (line 663) [repair #%s] No neighbors to repair
with on range %s: session completed
you have RF=1, or too many nodes are down.
You looking for the author of Spring Data Cassandra?
https://github.com/boneill42/spring-data-cassandra
If so, I guess that is me. =)
Did you get in touch with the Spring guys? They have Cassandra support on
their Spring Data todo list. They might have some todo or feature list
they want to
Is the author of Spring Data Cassandra here? I am interested in getting this
merged into upstream Spring. They have Cassandra support on their todo
list.
Dne 25.5.2012 2:41, Edward Capriolo napsal(a):
Also it does not sound like you have run anti entropy repair. You
should do that when upping rf.
I run anti-entropy repairs and they still do not fix the counters. I have
some reports from users with the same problem, but nobody has discovered a
repeatable
I was thinking about putting both the commit log and the data
directory on a software raid partition spanning over the two disks.
Would this increase the general read performance? In theory I could
get twice the read performance, but I don't know how the commit log
will influence the read
are there ubuntu packages?
1) I assume that I have to call loadNewSSTables() on each node?
Is this the same as nodetool refresh?
Is upgradesstables required after updating a column family with
compression_options (or compaction_strategy)?
For compaction strategy, no; not sure about the other one.
Dne 19.7.2012 15:07, cbert...@libero.it napsal(a):
Hi all, I have a problem with counters I'd like to solve before going in
production.
I also have a similar problem with counters, but I do not think that
anything can be done about it. The developers are not interested in
discovering what is wrong
I do not have experience with other clients, only Hector. But timeout
management in Hector is really broken. If you expect your nodes to
time out often (for example, if you are using a WAN), better to try
something else first.
Dne 13.6.2012 11:29, Viktor Jevdokimov napsal(a):
I remember that join and decommission didn't work since streaming was
used. All the problems were due to path differences between Windows
and Linux styles.
What about using a unix-style File.separator in the streaming protocol to
make it
Do not delete empty rows. It refreshes the tombstone, and they will never expire.
Dne 26.3.2012 19:17, aaron morton napsal(a):
Can you describe the situations where counter updates are lost or go
backwards ?
Do you ever get TimedOutExceptions when performing counter updates ?
We got a few timeouts per day but not many, fewer than 10. I do not think
that timeouts are the root
Dne 19.5.2012 0:09, Gurpreet Singh napsal(a):
Thanks Radim.
Radim, actually 100 reads per second is achievable even with 2 disks.
It will become worse as the rows get fragmented.
But achieving them with a really low avg latency per key is the issue.
I am wondering if anyone has played with
To get 100 random reads per second on a large dataset (100 GB) you need
more than 2 disks in RAID 0.
It is better to add more nodes than to stick too many disks into one node.
You also need to adjust the I/O scheduler in the OS.
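Rough seek arithmetic behind that advice; the per-disk IOPS figure below is an illustrative assumption for a 7200 rpm drive, not a measurement:

```python
import math

# When the dataset is far larger than RAM, each random read costs
# roughly one seek per sstable touched. Illustrative assumption:
SEEKS_PER_DISK_PER_SEC = 100  # ~7200 rpm SATA drive

def disks_needed(reads_per_sec: int, sstables_per_read: int) -> int:
    return math.ceil(reads_per_sec * sstables_per_read / SEEKS_PER_DISK_PER_SEC)

print(disks_needed(100, 1))  # unfragmented rows
print(disks_needed(100, 3))  # rows fragmented across 3 sstables
```

Fragmentation multiplying the seeks per read is why things get worse over time for this workload.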
Try reducing memtable_total_space_in_mb config setting. If the problem
is incorrect memory metering that should help.
It does not help much, because the difference between the correct
calculation and Cassandra's assumed one is way too high. It would require
me to shrink memtables to about 10% of their correct
Here is part of the log; the actual record is 419.
ponto:(admin)log/cassandra% grep "to maximum of 64" system.log.1
WARN [MemoryMeter:1] 2012-02-03 00:00:19,444 Memtable.java (line 181)
setting live ratio to maximum of 64 instead of 64.9096047648211
WARN [MemoryMeter:1] 2012-02-08 00:00:17,379
There is 2T data on each server. Can someone give me some advice?
do not do it
The liveratio calculation logic also needs to be changed, because it is
based on the assumption that workloads do not change.
Can you give an example of the sort of workload change you are
thinking of ?
I have 3 workload types running in batch: a delete-only workload, insert-only,
and heavy update (lot
Are you experiencing memory pressure you think may be attributed to
memtables not being flushed frequently enough ?
yes
The delete workload especially is really good at OOMing Cassandra, for some reason.
Is Cassandra a fit for this use-case or should we just stick with the
oldskool MySQL and put things like votes, reviews etc in our C* store?
If all your data fits into one computer and you expect only tens of
millions of records in a table, then go for SQL. It has far more features,
and people are
The liveratio calc should do nothing if the memtable has 0 columns. I did a
manual flush before this.
WARN [MemoryMeter:1] 2012-05-10 13:21:19,430 Memtable.java (line 181)
setting live ratio to maximum of 64 instead of Infinity
INFO [MemoryMeter:1] 2012-05-10 13:21:19,431 Memtable.java (line 186)
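The clamping visible in those WARN lines can be sketched as follows; the 1.0 and 64.0 bounds are the ones the logs show, but the function is a toy model, not Cassandra's code:

```python
def clamp_live_ratio(live_bytes: float, serialized_bytes: float) -> float:
    # live/serialized ratio; an empty memtable gives 0/0 -> Infinity
    # in the real code, which is why the clamp to 64 exists
    ratio = float("inf") if serialized_bytes == 0 else live_bytes / serialized_bytes
    return min(max(ratio, 1.0), 64.0)

print(clamp_live_ratio(64.9096, 1.0))  # capped at 64.0, as in the WARN line
print(clamp_live_ratio(1.0, 0.0))      # Infinity also capped at 64.0
```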
Dne 18.4.2012 16:22, Jonathan Ellis napsal(a):
It's not that simple, unless you have an append-only workload.
I have an append-only workload, and probably most people using TTL do too.
Any compaction pass over A will first convert the TTL data into tombstones.
Then, any subsequent pass that includes A *and all other sstables
containing rows with the same key* will drop the tombstones.
That's why I proposed attaching a TTL to the entire CF. Tombstones would not
be needed
Dne 4.4.2012 6:52, Igor napsal(a):
Here is small python script I run once per day. You have to adjust
size and/or age limits in the 'if' operator. Also I use mx4j interface
for jmx calls.
forceUserDefinedCompaction would be more useful if you could run compaction
on 2 tables. If I run it on
What is the method for undoing the effect of CASSANDRA-3989 (too many
unnecessary levels)? Running major compact or cleanup does nothing.
What OS are you using
FreeBSD 8.3 64 bit PRERELEASE
Will 1500 bytes row size be large or small for Cassandra from your
understanding?
Performance degradation starts at 500 MB rows; it's very slow if you hit
this limit.
It would be really helpful if leveled compaction printed the level into the log.
Demo:
INFO [CompactionExecutor:891] 2012-04-05 22:39:27,043
CompactionTask.java (line 113) Compacting ***LEVEL 1***
[SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19690-Data.db'),
Would you please share what filesystem you are using?
zfs 28
There is a problem with the size-tiered compaction design: it compacts
together tables of similar size.
Sometimes it can happen that some sstables sit on disk forever (Feb 23)
because no other similar-sized tables were created, and probably never will
be, because a flushed sstable is
Dne 3.4.2012 23:04, i...@4friends.od.ua napsal(a):
if you know for sure that you will free a lot of space by compacting some
old table, then you can call UserDefinedCompaction for this table (you
can do this from cron). There is also a ticket in jira with discussion
on per-sstable expired column
RAID0 would help me use more efficiently the total disk space available at each
node, but tests have shown that under write load it behaves much worse than
using separate data dirs, one per disk.
There are different strategies for how RAID0 splits reads; also changing the
I/O scheduler and filesystem
Dne 27.3.2012 11:13, Ross Black napsal(a):
Any pointers on what I should be looking for in our application that
would be stopping the deletion of tombstones?
Do not delete already-deleted rows. On read, Cassandra returns deleted
rows as empty in range slices.
How can I fix this?
Add more data; 1.5M rows is not enough to get reliable reports.
Scenario 4
T1 write column
T2 flush memtable to S1
T3 delete row
T4 flush memtable to S5
T5 tombstone in S5 expires
T6 S5 is compacted, but not with S1
Result?
Dne 26.3.2012 3:39, aaron morton napsal(a):
Can you please reproduce the fault using the --debug cqlsh command
option and bug report here
https://issues.apache.org/jira/browse/CASSANDRA
https://issues.apache.org/jira/browse/CASSANDRA-4083
I was wrong, it fails on first nontombstoned row.
During compaction of selected sstables Cassandra checks the whole Column
Family for the latest timestamp of the column/row, including other
sstables and memtable.
You are explaining that if I have an expired row tombstone and there exists
a later timestamp on this row, that tombstone is not
I wonder why the memtable estimations are so bad.
1. Is it not possible to run them more often? There should be some limit -
run the live/serialized calculation at least once per hour. It takes just a
few seconds.
2. Why not use data from FlusherWriter to update the estimations? The
flusher knows the number of ops
Example:
at T1, write column
at T2, delete row
at T3, after tombstone expiration, compact (T1 + T2) and drop the expired
tombstone
Will the column from T1 be alive again?
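That timeline can be simulated with a toy model (plain Python dictionaries, not Cassandra code) showing why dropping an expired tombstone without merging the older sstable resurrects the column:

```python
TOMBSTONE = object()

# each "sstable" maps row key -> (timestamp, value); TOMBSTONE marks a deletion
s1 = {"row": (1, "value-from-T1")}   # column written at T1
s2 = {"row": (2, TOMBSTONE)}         # row deleted at T2

def compact(tables, gc_grace_expired):
    merged = {}
    for t in tables:
        for key, (ts, val) in t.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, val)
    if gc_grace_expired:
        # dropping expired tombstones is safe ONLY if every sstable
        # holding the key participated in this compaction
        merged = {k: v for k, v in merged.items() if v[1] is not TOMBSTONE}
    return merged

# T3: compact s2 alone after gc_grace; the tombstone is purged
s2 = compact([s2], gc_grace_expired=True)
# a later read (or compaction) merging s1 and s2 sees the old column again
alive = compact([s1, s2], gc_grace_expired=False)
print(alive)  # the T1 column is back
```

The final merge returns the T1 column: exactly the resurrection the question asks about.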
I suspect that running cluster-wide repair interferes with TTL-based
expiration. I am running repair every 7 days and using a TTL expiration
time of 7 days too. Data are never deleted.
Stored data in Cassandra keep growing (I have been watching them for 3
months), but they should not. If I run a manual
Dne 19.3.2012 20:28, i...@4friends.od.ua napsal(a):
Hello
Datasize should decrease during minor compactions. Check logs for
compactions results.
They do, but not as much as I expect. Look at the sizes and file dates:
-rw-r--r-- 1 root wheel 5.4G Feb 23 17:03 resultcache-hc-27045-Data.db
Dne 19.3.2012 21:46, Caleb Rackliffe napsal(a):
I've been wondering about this too, but every column has both a
timestamp /and/ a TTL. Unless the timestamp is not preserved, there
should be no need to adjust the TTL, assuming the expiration time is
determined from these two variables.
Dne 19.3.2012 23:33, ruslan usifov napsal(a):
Do you make major compaction??
No, I do cleanups only. Major compaction kills my node with OOM.
Dne 2.3.2012 9:49, Maki Watanabe napsal(a):
Fixed in 1.0?
https://issues.apache.org/jira/browse/CASSANDRA-3176
That patch tests whether the sstable is empty before continuing HH delivery,
but in my case the table is not empty - it contains one tombstoned row.
If you need to dump a lot of data consider the Hadoop integration.
http://wiki.apache.org/cassandra/HadoopSupport It can run a bit faster
than going through the thrift api.
Does the Cassandra Hadoop integration read sstables directly instead of
going via thrift?
Dne 2.3.2012 13:24, Watanabe Maki napsal(a):
How about truncating HintsColumnFamily and then executing nodetool repair as
a workaround?
I got this exception in the CLI. It's weird: all my nodes are UP and there
is no exception message in the server log.
[default@unknown] use system;
Authenticated to keyspace:
Can something be done to remove these empty delivery attempts from the log?
It's just a tombstoned row.
[default@system] list HintsColumnFamily;
Using default limit of 100
---
RowKey: 00
1 Row Returned.
Elapsed time: 234 msec(s).
INFO [HintedHandoff:1] 2012-03-02 05:44:32,359
if a node goes down, it will take longer for commitlog replay.
Commit log replay time is insignificant; most of the time during node startup
is spent on index sampling. Index sampling here runs for about 15 minutes.
Are there plans to write a partitioner based on a faster hash algorithm
instead of MD5? I profiled Cassandra, and a lot of time is spent inside the
MD5 function.
Dne 3.2.2012 17:46, Jonathan Ellis napsal(a):
You should come up with a way to reproduce so we can fix it. :)
It happens after HH delivery, when the memtable contains a lot of deletes.
but a ratio of 1 may occur
for column families with a very high update-to-insert ratio.
Better to ask why the minimum ratio is 1.0. What harm can be done by using
a 1.0 ratio?
Dne 26.1.2012 2:32, David Carlton napsal(a):
How stable is 1.0 these days?
Good, but Hector 1.0 is unstable.
Anyway, I can't find any reason to limit minimum value of
phi_convict_threshold to 5. maki
In the real world you often want 9, because Cassandra is too sensitive to
an overloaded LAN; nodes flip up/down often and create chaos in the
cluster if you have a larger number of nodes (let
Is it technically possible, without breaking the basic levelDB algorithm,
to have configurable sstable size and count on different levels?
Something like:
level 1 - 10 x 50 MB tables
level 2 - 60 x 40 MB tables
level 3 - 150 x 30 MB tables
I am interested in deeper leveldb research, because
Then for each read, Cassandra will go through all the SSTables (or
one SSTable in each level for the leveled compaction strategy)? How to deal
with this
problem?
Bloom filters can pick the right sstables to read, with a false-positive
probability of about 0.1%. In reality, even if you are using size-based
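The 0.1% figure matches the standard bloom-filter sizing formulas; a quick check:

```python
import math

def bloom_bits_per_key(p: float) -> float:
    # optimal bits per element for false-positive probability p
    return -math.log(p) / (math.log(2) ** 2)

def bloom_hash_count(p: float) -> float:
    # optimal number of hash functions for the same p
    return -math.log(p) / math.log(2)

p = 0.001  # the ~0.1% false-positive rate mentioned above
print(f"{bloom_bits_per_key(p):.1f} bits/key, {bloom_hash_count(p):.1f} hashes")
```

About 14.4 bits per key and roughly 10 hash functions for p = 0.001, which is why the Filter.db files in listings like the one below are small relative to the data files.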
Currently o.a.c.io.sstable.IndexSummary is implemented as an ArrayList of
KeyPosition (RowPosition key, long offset).
I propose to change it to:
RowPosition keys[]
long offsets[]
This will lower the number of Java objects used per entry from 2
(KeyPosition + RowPosition) to 1.
For building these
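The parallel-array layout still supports the binary search the index summary needs; a sketch in Python (the real structure is Java, and the keys and offsets below are illustrative values):

```python
import bisect

# Parallel arrays instead of a list of (key, offset) objects:
# one sorted key array plus one offset array halves per-entry objects.
keys = ["apple", "cherry", "mango", "pear"]   # sampled row keys, sorted
offsets = [0, 4096, 9216, 15360]              # index-file positions

def summary_lookup(key: str) -> int:
    # offset of the last sampled key <= the search key
    i = bisect.bisect_right(keys, key) - 1
    return offsets[max(i, 0)]

print(summary_lookup("banana"))  # falls between "apple" and "cherry"
```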
I don't know what you are basing that on. It seems unlikely to me that
the working set of a compaction is 600 MB. However, it may very well
be that the allocation rate is such that it contributes to an
additional 600 MB average heap usage after a CMS phase has completed.
I will investigate
That is a good reason for both to be configurable IMO.
Index sampling is currently configurable only per node; it would be
better to have it per keyspace, because we are using OLTP-like and OLAP
keyspaces in the same cluster. OLAP keyspaces have about 1000x more rows.
But it's difficult to estimate
makes me feel disappointed about consistency in Cassandra, but I wonder is
there is a way to work around it.
Cassandra is not suitable for this kind of program. CouchDB is slightly
better; it has transactions but no locking, and I am not sure if
transaction isolation is supported now. mongodb
But is there any way of implementing minimum required ACID subset on
top of Cassandra?
Try this; it's NoSQL ACID compliant. I haven't tested it; it will most
likely have pretty slow writes and a lot of bugs, like any other Oracle
application.
But just to be extra clear: Data will not actually be removed once the
row in question participates in compaction. Compactions will not be
actively triggered by Cassandra for tombstone processing reasons.
Leveled compaction is really good for this, because it compacts often.
Demo; it will be in Cassandra 1.0.7:
Standard Cassandra bloom filter:
-rw-r--r-- 1 root wheel 19307376721 Dec 27 20:06 sipdb-hc-4634-Data.db
-rw-r--r-- 1 root wheel 63 Dec 27 20:06
sipdb-hc-4634-Digest.sha1
-rw-r--r-- 1 root wheel 770714896 Dec 27 20:06 sipdb-hc-4634-Filter.db