Hi,
I have a 15-node cluster where each node has 4GB RAM and 80GB disk. There are
three CFs, of which only two contain data. In total, each CF contains about 2
billion columns. I have a replication factor of 2. All CFs are compressed with
SnappyCompressor. This is on Cassandra 1.0.2.
I was
Hi, first of all, let me say thank you for the amazing product :-)
So, I have a couple of questions about the internal physical data layout.
Suppose I have the following data schema:
Reports:{
1:{
1:{value1:some val, value2:some val},
2:{value1:some val, value2:some val}
That's why disabling gossip + flush is better than drain. We should
probably remove it.
Drain could be useful if there were a way to undrain a node, i.e. to switch it
back into read/write mode.
Implement a nodetool shutdown command that works the way we are trying to do it:
first stop gossip, then wait for other nodes to see it
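The proposed shutdown sequence (a proposal in this thread, not an existing nodetool command) can be sketched by driving nodetool subcommands in order. This is a minimal sketch, assuming `nodetool` is on the PATH and that the `disablegossip` and `flush` subcommands exist in the Cassandra version in use. The fixed `settle_seconds` wait is an assumed stand-in for "wait for other nodes to see it". Stopping the process afterwards is left out.

```python
import subprocess
import time
from typing import List


def graceful_stop_plan(host: str) -> List[List[str]]:
    """nodetool invocations, in order: stop gossip first, then flush memtables."""
    return [
        ["nodetool", "-h", host, "disablegossip"],
        ["nodetool", "-h", host, "flush"],
    ]


def run_graceful_stop(host: str, settle_seconds: float = 30.0,
                      dry_run: bool = False) -> List[List[str]]:
    """Run the plan against a node; with dry_run=True only return the commands."""
    disable_gossip, flush = graceful_stop_plan(host)
    if not dry_run:
        subprocess.run(disable_gossip, check=True)
        # Assumed fixed delay so other nodes can mark this node down;
        # a real tool would check ring state instead of sleeping blindly.
        time.sleep(settle_seconds)
        subprocess.run(flush, check=True)
    return [disable_gossip, flush]
```

Unlike drain, this leaves the node able to resume normal operation (re-enabling gossip), which is the "undrain" property asked for above.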
I found in stress tests that the default setting of 32 is way too high.
The Hadoop folks use a value of 10 during merge sorts so as not to stress IO
too much. I also discovered that filesystems like ZFS use a default IO
queue size of 10 per drive.
I tried running tests with 10, 15 and 32, and there is
I consulted a Hadoop expert, and he told me that he uses a value of 100
for merging segments. I will rerun the tests with 100 to check.
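The trade-off can be made concrete if the setting is read as a merge fan-in, i.e. how many sorted inputs are merged at once, which is what Hadoop's merge factor controls. The thread does not name the setting, so that reading is an assumption. A lower fan-in keeps fewer input streams open at once but may need more passes over the data. A minimal sketch:

```python
import heapq
import math
from typing import List


def merge_passes(num_runs: int, fan_in: int) -> int:
    """Passes needed to reduce num_runs sorted runs to one, merging at most fan_in at a time."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    passes = 0
    while num_runs > 1:
        num_runs = math.ceil(num_runs / fan_in)
        passes += 1
    return passes


def multi_pass_merge(runs: List[List[int]], fan_in: int) -> List[int]:
    """Merge sorted runs into one sorted list, never opening more than fan_in inputs at once."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not runs:
        return []
    while len(runs) > 1:
        # One pass: merge consecutive groups of at most fan_in runs.
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
    return runs[0]
```

For 1000 runs, `merge_passes(1000, 10)` returns 3, while `merge_passes(1000, 32)` and `merge_passes(1000, 100)` both return 2. A larger fan-in saves passes only at some input counts, while each pass keeps more streams open.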
Hi Todd,
Entity Groups: https://issues.apache.org/jira/browse/CASSANDRA-1684
-Jake
On Wed, Nov 9, 2011 at 6:44 AM, Todd Burruss bburr...@expedia.com wrote:
I believe I heard someone talk at the Cassandra SF conference about creating a
partitioner that was a derivation of RandomPartitioner. It
Hi,
I wonder whether you have already discussed ByteBuffer.allocateDirect as an
alternative to JNA memory allocation?
If so, would someone mind sending me a pointer?
Thanks !
Benoit.
On Wed, Nov 9, 2011 at 1:28 AM, Patrik Modesto patrik.mode...@gmail.com wrote:
Hi,
On our production cluster of 8 nodes, which is running Cassandra 0.8.7,
we still see the 9th node we added for testing in the
org.apache.cassandra.db:type=StorageService.LoadMap MBean in the JMX
management console
Not sure this is the standard approach, probably more what we came up
with. ;)
We plan to deploy Cassandra behind a firewall denying all traffic on all
ports other than 8080. Access from applications will be limited to the
REST/HTTP layer, which we'll lock down with standard HTTP authentication
Firewall with appropriate rules.
On Tue, Nov 8, 2011 at 6:30 PM, Guy Incognito dnd1...@gmail.com wrote:
Hi,
is there a standard approach to securing Cassandra, e.g. within a corporate
network? At the moment, in our dev environment, anybody with network
connectivity to the cluster can connect
We lock down SSH as root from any network. We also provide individual
logins, including for sysadmins, and they go through LDAP authentication.
Anyone who does sudo su as root gets logged and alerted via trapsend.
We use firewalls and also have a separate VLAN for datastore servers.
We then open only
allocateDirect is broken for this purpose, but we removed the JNA
dependency using sun.misc.Unsafe instead:
https://issues.apache.org/jira/browse/CASSANDRA-3271
On Wed, Nov 9, 2011 at 5:54 AM, Benoit Perroud ben...@noisette.ch wrote:
Hi,
I wonder if you have already discussed about
I think this was already asked for, but you can add my vote for TTL
support for Counters.
On Tue, Nov 1, 2011 at 3:59 PM, Jonathan Ellis jbel...@gmail.com wrote:
Hi all,
Two years ago I asked for Cassandra use cases and feature requests.
[1] The results [2] have been extremely useful in
Thx Jake for the JIRA, but there was someone at the conference who had already
implemented what I mentioned. It didn't offer any atomicity, just co-location of
a family of data on the same node.
From: Jake Luciani jak...@gmail.com
Reply-To:
I assume that Reports is the super column family; that the first 1: is the
report id and, in the Cassandra topology, the row key; that the second 1: is
the report line and, in the Cassandra topology, the super column; and that
value1 is the column name. If this is not the case, maybe explain the
topology better.
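The interpretation above can be pictured as three levels of nested maps: row key, then super column, then column. This is a minimal sketch, assuming string names compared lexicographically. A real column family sorts by its configured comparator and subcomparator, which the thread does not state. All report lines of report 1 live in one row, so the row key alone decides which nodes hold them.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory picture of the Reports super column family:
# row key -> super column name -> column name -> value
SuperColumnFamily = Dict[str, Dict[str, Dict[str, str]]]

reports: SuperColumnFamily = {
    "1": {  # row key: the report id
        "1": {"value1": "some val", "value2": "some val"},  # super column: report line 1
        "2": {"value1": "some val", "value2": "some val"},  # super column: report line 2
    },
}


def flatten_row(cf: SuperColumnFamily, row_key: str) -> List[Tuple[str, str, str]]:
    """List (super column, column, value) for one row, sorted by name at each level."""
    row = cf[row_key]
    return [
        (sc_name, col_name, row[sc_name][col_name])
        for sc_name in sorted(row)
        for col_name in sorted(row[sc_name])
    ]
```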
Solandra does this
https://github.com/tjake/Solandra/blob/solandra/src/lucandra/dht/RandomPartitioner.java
But Row Groups is going to be the official way.
-Jake
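The co-location idea can be sketched as a partitioner that hashes only a group prefix of the row key. Every key in a group then gets the same token and therefore the same replicas. This is a minimal sketch, assuming (as I understand RandomPartitioner) that the token is the absolute value of the key's MD5 digest read as a signed big-endian integer. The `~` separator and `grouped_token` are hypothetical. This is not how Solandra or the planned row groups are implemented.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """MD5 token in the RandomPartitioner style: abs() of the digest as a signed integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def grouped_token(key: bytes, sep: bytes = b"~") -> int:
    """Hypothetical variant: hash only the part of the key before sep.

    Keys such as b"user42~a" and b"user42~b" share a token and so land on the same
    node; a key without sep hashes exactly like RandomPartitioner would hash it.
    """
    group = key.split(sep, 1)[0]
    return random_partitioner_token(group)
```

The cost of this design is balance: one very large group becomes a hot spot, because all of its rows sit on the same replicas.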
On Wed, Nov 9, 2011 at 5:53 PM, Todd Burruss bburr...@expedia.com wrote:
Thx jake for the JIRA, but there was someone at the
[Changed subject to leave survey behind]
Cool, thx. Holding my breath for row groups :O
I see it is targeted for 1.1; is this realistic?
From: Jake Luciani jak...@gmail.com
Reply-To: user@cassandra.apache.org
My wish list:
1) Conditional updates: if a column has a given value, then atomically put the
column in the column family; otherwise fail.
2) getAndSet on counters, as a separate API.
3) Revert the count when the client disconnects or receives an exception (so
it can safely retry).
4) Something like a freeze API for
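Items 1 and 2 describe compare-and-set and get-and-set semantics. Item 1 is read here as "apply the put only if the column currently holds an expected value". The class below is a hypothetical single-process model of those semantics, not a Cassandra API. A lock provides the atomicity here, where a real cluster would need coordination across replicas.

```python
import threading
from typing import Dict, Optional, Tuple


class ColumnStore:
    """Toy model of the wished-for operations; None means 'column absent'."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], str] = {}
        self._lock = threading.Lock()

    def check_and_put(self, row: str, column: str,
                      expected: Optional[str], new_value: str) -> bool:
        """Write new_value only if the column currently equals expected; else fail."""
        with self._lock:
            if self._data.get((row, column)) != expected:
                return False
            self._data[(row, column)] = new_value
            return True

    def get_and_set(self, row: str, column: str, new_value: str) -> Optional[str]:
        """Atomically replace the value and return the previous one."""
        with self._lock:
            old = self._data.get((row, column))
            self._data[(row, column)] = new_value
            return old
```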
ok, thx for the input!
On 09/11/2011 15:19, Mohit Anchlia wrote:
We lockdown ssh to root from any network. We also provide individual
logins including sysadmin and they go through LDAP authentication.
Anyone who does sudo su as root gets logged and alerted via trapsend.
We use firewalls and
Thanks for the explanation, Konstantin. I'm a novice with Cassandra
and not so familiar with the terminology.
You understood the topology well.
I had a quick look at the Cassandra source code and found that my
query from Hector is translated into a list of read commands (inside
CassandraServer).
Hi, Brian --
A little late to reply, but I'm slowly catching up.
You're going to be better off, IMHO, pulling the data out of Cassandra with a
tool like Pig (probably with a bit of aggregation and filtering) and then
operating on it in R as a static delimited file. If you need additional
When monitoring the JMX metrics of Cassandra 0.8.7 under a write-only test
load, I observe significant read activity on the column family I write to.
This seems strange to me, since I expected no read activity under a write-only
load. The read activity is caused by the writes: when I stop the write
test, reads
Indexed columns cause read before write so that the index can be updated
if the column already exists.
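A toy model shows why an indexed column turns a write into a read plus a write. To keep the index correct, the old value has to be looked up so that its stale index entry can be removed. This is a single-process sketch with hypothetical names, not Cassandra's implementation. The `reads` counter makes the extra read activity visible.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set


class IndexedColumnFamily:
    """Rows of columns plus a secondary index on one column (value -> row keys)."""

    def __init__(self, indexed_column: str) -> None:
        self.indexed_column = indexed_column
        self.rows: Dict[str, Dict[str, str]] = {}
        self.index: DefaultDict[str, Set[str]] = defaultdict(set)
        self.reads = 0  # reads triggered by writes

    def insert(self, row_key: str, column: str, value: str) -> None:
        row = self.rows.setdefault(row_key, {})
        if column == self.indexed_column:
            # Read before write: fetch the old value so its index entry can be dropped.
            self.reads += 1
            old = row.get(column)
            if old is not None:
                self.index[old].discard(row_key)
            self.index[value].add(row_key)
        # Writes to non-indexed columns need no read.
        row[column] = value
```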
On 11/09/2011 02:46 PM, Oleg Tsernetsov wrote:
When monitoring JMX metrics of cassandra 0.8.7 loaded by write-only
test I observe significant read activity on column family where I
write to.
I am trying to install and run Cassandra 1.0 as a Windows service on Windows
Server 2003 R2 x64. The installation seems to go OK, but when I try to start
the service, I get the error: Windows could not start the cassandra on Local
Computer. For more information, review the System Event Log. If
Compacted row maximum size: 36904729268
So 36 gigs. As long as you're sure each column is only about 1k, the
total row size should not be a problem.
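At roughly 1 KB per column, a 36,904,729,268-byte row holds on the order of 36 million columns, so it has to be read in slices rather than all at once. Below is a minimal sketch of the usual slice-paging loop. It assumes a hypothetical `fetch(start, count)` that returns up to `count` (name, value) pairs with names >= `start` in comparator order, where an empty start means the beginning of the row. Each page after the first starts with the previous page's last column, which is skipped.

```python
from typing import Callable, Iterator, List, Tuple

Column = Tuple[str, bytes]
FetchSlice = Callable[[str, int], List[Column]]  # hypothetical slice reader


def page_row(fetch: FetchSlice, page_size: int) -> Iterator[Column]:
    """Yield every column of a row exactly once, page_size columns per request."""
    if page_size < 2:
        # With a page size of 1, every later page would only repeat the start column.
        raise ValueError("page_size must be at least 2")
    start = ""
    skip_first = False
    while True:
        page = fetch(start, page_size)
        for column in (page[1:] if skip_first else page):
            yield column
        if len(page) < page_size:
            return  # a short page means the end of the row
        start = page[-1][0]
        skip_first = True
```

Keeping the page size moderate bounds how much of the row is materialized per request, which also limits young-generation garbage per page.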
While I don't see OOMs when I use only a single thread to page the row, there
are lots of ParNew collections that take about
Ah, you have two CF:s. And my mistake was that I accidentally treated
bits as bytes ;)
My calc is that the bloom filter sizes per node for you should be
about 1.8-1.9 GB. If you haven't touched heap size, IIRC the default
is still going to be 2GB for your 4 GB machine (not sure, please
confirm if
(You might be helped by
http://wiki.apache.org/cassandra/LargeDataSetConsiderations btw - it's
not entirely up to date by now... I'll try to remember to update
it.)
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
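The shape of a per-node bloom filter estimate is: keys stored per node (total keys × RF ÷ nodes) times bits per key, divided by 8 to get bytes. That division is where the bits-versus-bytes slip above happened. The bits-per-key figure is an assumption here. The sketch uses the textbook optimum for a target false-positive rate p, -ln(p) / (ln 2)^2, which is not necessarily what Cassandra 1.0 allocates. It therefore does not try to reproduce the 1.8-1.9 GB figure.

```python
import math


def bits_per_key(fp_rate: float) -> float:
    """Optimal bloom filter bits per key for a target false-positive rate."""
    return -math.log(fp_rate) / (math.log(2) ** 2)


def keys_per_node(total_keys: float, replication_factor: int, nodes: int) -> float:
    """Keys each node stores, assuming data is spread evenly across the ring."""
    return total_keys * replication_factor / nodes


def bloom_bytes_per_node(total_keys: float, replication_factor: int,
                         nodes: int, bits: float) -> float:
    """Filter memory per node in bytes: bits are divided by 8 here, not left as bytes."""
    return keys_per_node(total_keys, replication_factor, nodes) * bits / 8
```

For example, `keys_per_node(4_000_000_000, 2, 15)` is about 533 million. This assumes the two data-bearing CFs from the first message (about 2 billion entries each) are what the filters index.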
handful of nodes that I write to with a CL of QUORUM (or thereabouts).
If your goal is to service reads w/o waiting for remote servers, you
probably would want to use LOCAL_QUORUM (quorum within a data center)
or ONE for reads. That, however, assumes an RF of >= 3 in each data
center (which means
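The difference between QUORUM and LOCAL_QUORUM is simple arithmetic. A quorum is a majority, floor(n / 2) + 1. QUORUM counts replicas in all data centers together, while LOCAL_QUORUM counts only replicas in the coordinator's data center. This is a minimal sketch. The data center names and the dict layout are illustrative assumptions.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """Majority of a replica set: floor(n / 2) + 1."""
    return replicas // 2 + 1


def required_replies(level: str, rf_per_dc: Dict[str, int], local_dc: str) -> int:
    """Replies the coordinator waits for at a given consistency level (simplified)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # Majority over every replica in every data center.
        return quorum(sum(rf_per_dc.values()))
    if level == "LOCAL_QUORUM":
        # Majority over the local data center's replicas only.
        return quorum(rf_per_dc[local_dc])
    raise ValueError(f"unsupported level: {level}")
```

With RF 3 in each of two data centers, QUORUM needs 4 replies but only 3 replicas are local, so at least one reply must come from the remote data center. LOCAL_QUORUM needs 2 of the 3 local replicas and tolerates one of them being down. With RF 2 per data center, LOCAL_QUORUM would need every local replica.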
If you run it as a Windows service, it runs with system rights (instead of the
rights of the logged-in user).
IMHO this could help:
- Right-click the service (Administrative Tools > Services) and apply the user
rights you want
- Maybe it helps if you allow exchange/interaction with
Hello,
I am going to need to move some nodes to rebalance my cluster. How safe is it
to do this on a cluster with ongoing writes and reads?
Thanks
Hello, I'd like to get some ideas on how to model counting uniques with
Cassandra.
My use-case is that I have various counters that I increment based on data
received from multiple devices. I'd like to be able to know if at least X
unique devices contributed to a counter value.
I've thought of the
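One possible approach is to keep, next to each counter, a row whose column names are device IDs. Writing the same column name twice overwrites instead of adding a duplicate, so that row's column count is the number of distinct contributing devices. The class below is a hypothetical in-memory model of this idea. A Python set plays the role of the row, and the names are illustrative.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class UniqueDeviceCounter:
    """A counter value plus the set of device ids that contributed to it."""

    def __init__(self) -> None:
        self.values: DefaultDict[str, int] = defaultdict(int)
        self.devices: DefaultDict[str, Set[str]] = defaultdict(set)

    def increment(self, counter: str, device_id: str, delta: int = 1) -> None:
        self.values[counter] += delta
        # Idempotent per device, like re-writing the same column name.
        self.devices[counter].add(device_id)

    def has_min_unique_devices(self, counter: str, x: int) -> bool:
        """True if at least x distinct devices contributed to the counter."""
        return len(self.devices[counter]) >= x
```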
Hi, sorry to ask again, but I'm having trouble getting to the bottom of this...
Does anyone else see this? When the dynamic snitch is turned off, the performance
of LOCAL_QUORUM operations is as bad as QUORUM. The property file snitch
appears to be properly configured. Any suggestions on how I can
I did some tests on that in the past few months, and it's safe if you have a
high replication factor on the cluster and a high read consistency level on the
clients.
But if you have a large amount of data, it will take a long time to rebalance
the nodes.
On Wed, Nov 9, 2011 at 9:07 PM, Philippe
2. With the same setup, after each period as defined by
dynamic_snitch_reset_interval_in_ms, the LOCAL_QUORUM performance greatly
degrades before drastically improving again within a minute.
This part sounds to me like one or more nodes in the cluster are
either broken and not responding at
I missed the news.
How does nodetool move work in recent versions (0.8.x or later)?
Does it just stream the appropriate range of data between nodes?
2011/11/10 Peter Schuller peter.schul...@infidyne.com:
Keep in mind that if you're using an older version of Cassandra a move
is actually a decommission