I am not sure if the new default is to use compression, but I do not
believe compression is a good default. I find compression is better for
larger column families that are sparsely read. For high-throughput CFs I
feel that decompressing larger blocks hurts performance more than
compression adds.
compression here is that my data lends itself
well to it due to the composite columns. My current compression ratio is
30.5%. Not sure it matters, but my BF false positive ratio is 0.048.
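For anyone wanting to tune this per column family, compression is settable with CQL 3; a hedged sketch (table name and chunk size are illustrative, option names are from the 1.1/1.2-era syntax):

```sql
-- smaller chunks cost more metadata but decompress less per read
ALTER TABLE hot_cf
  WITH compression = {'sstable_compression': 'SnappyCompressor',
                      'chunk_length_kb': 4};

-- turn compression off entirely for a high-throughput CF
ALTER TABLE hot_cf WITH compression = {'sstable_compression': ''};
```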
From: Edward Capriolo edlinuxg...@gmail.com
Reply-To: user@cassandra.apache.org user
I was going to say something similar. I feel like the SSD drives read much
more than the standard drives. Read-ahead/large sectors could and probably
does explain it.
On Thu, May 16, 2013 at 3:43 PM, Bryan Talbot btal...@aeriagames.com wrote:
512 sectors for read-ahead. Are your new fancy SSD
This makes sense. Unless you are running major compaction, a delete could
only happen if the bloom filters confirmed the row was not in the sstables
not being compacted. If your rows are wide, the odds are that they are in
most/all sstables, and finally removing them would be tricky.
On Thu,
on 2048 sector boundary
5. use ext4 with noatime,nodiratime,discard mount options
On 05/16/2013 10:48 PM, Edward Capriolo wrote:
I was going to say something similar. I feel like the SSD drives read much
more than the standard drives. Read-ahead/large sectors could and probably
does explain
Please give an example of the code you are trying to execute.
On Thu, May 16, 2013 at 6:26 PM, Everton Lima peitin.inu...@gmail.com wrote:
But the problem is that I would like to use Cassandra embedded. Is this
not possible any more?
2013/5/15 Edward Capriolo edlinuxg...@gmail.com
You
I have actually tested repair in many interesting scenarios:
Once I joined a node and forgot autobootstrap=true
So the data looked like this in the ring
left node
8GB
new node
0GB
right node
8GB
After repair
left node
10 GB
new node
13 GB
right node
12 GB
We do not run repair at all. It is
If you are using hector it can setup the embedded server properly.
When using the server directly inside cassandra I have run into a similar
problem:
https://github.com/edwardcapriolo/cassandra/blob/range-tombstone-thrift/test/unit/org/apache/cassandra/thrift/EndToEndTest.java
@BeforeClass
http://basho.com/introducing-riak-1-3/
Introduced Active Anti-Entropy. Riak now has active anti-entropy. In
distributed systems, inconsistencies can arise between replicas due to
failure modes, concurrent updates, and physical data loss or corruption.
Pre-1.3 Riak already had several features for
:
But using this code:
ThriftSessionManager.instance.setCurrentSocket(new
InetSocketAddress(9160));
Will I need to execute this line every time that I need to do something
in Cassandra? Like update a column family.
Thanks for reply.
2013/5/15 Edward Capriolo edlinuxg...@gmail.com
You're not supposed to skip minor versions on upgrade.
1.0 -> 1.2 = bad
1.0 -> 1.1 -> 1.2 = good
On Tue, May 14, 2013 at 3:07 AM, Roshan codeva...@gmail.com wrote:
While upgrading from 1.0.11 to 1.2.4, I saw this exception in my log.
2013-05-14 10:50:10,291 ERROR [CassandraDaemon] Exception in
protocol
but Thrift, doesn't it? In that case there may be other limits.
T#
On Mon, May 13, 2013 at 3:26 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
2 billion is the maximum theoretical limit of columns under a row. It
is NOT the maximum limit of a CQL collection. The design of CQL
with
it that will finish in less than the rpc_timeout. In a single request I
personally would not try more than 10K columns.
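The "never more than ~10K columns per request" advice amounts to slicing a wide row in bounded pages. A minimal client-side sketch of the chunking logic (the fetch function stands in for a thrift get_slice; everything here is illustrative):

```python
def page_columns(fetch_slice, start=b"", page_size=10000):
    """Iterate all columns of a wide row in bounded slices.

    fetch_slice(start, count) must return up to `count` (name, value)
    pairs sorted by name, beginning at `start` (inclusive).
    """
    last = None
    while True:
        chunk = fetch_slice(start, page_size)
        if last is not None and chunk and chunk[0][0] == last:
            chunk = chunk[1:]  # drop the column we already yielded
        if not chunk:
            return
        for name, value in chunk:
            yield name, value
        last = chunk[-1][0]
        start = last  # next slice starts at the last column seen

# fake in-memory backend with 25,000 columns, just to exercise the loop
data = [(b"%08d" % i, b"v") for i in range(25000)]
def fake_fetch(start, count):
    return [c for c in data if c[0] >= start][:count]

print(sum(1 for _ in page_columns(fake_fetch)))  # 25000, no duplicates
```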
On Mon, May 13, 2013 at 2:08 PM, Robert Coli rc...@eventbrite.com wrote:
On Sun, May 12, 2013 at 6:26 PM, Edward Capriolo edlinuxg...@gmail.com
wrote:
2 billion
Being token aware makes a big performance difference. We do that internally
with our client and it means a lot for 95th percentile time. If Astyanax is
not vnode token aware and you're using them, you could see worse performance.
A long time beef with the client libraries is that they are always
2 billion is the maximum theoretical limit of columns under a row. It is
NOT the maximum limit of a CQL collection. The design of CQL collections
currently requires retrieving the entire collection on read.
On Sun, May 12, 2013 at 11:13 AM, Robert Wille rwi...@footnote.com wrote:
I designed a
If you use up your off-heap memory, Linux has an OOM killer that will kill a
random task.
On Fri, May 10, 2013 at 11:34 AM, Bryan Talbot btal...@aeriagames.com wrote:
If off-heap memory (for index samples, bloom filters, row caches, key
caches, etc) is exhausted, will cassandra experience a
http://www.datastax.com/docs/1.1/references/nodetool#nodetool-getendpoints
This tells you where a key lives. (you need to hex encode the key)
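Hex-encoding the key for getendpoints is a one-liner; a small sketch (the keyspace/CF in the comment are placeholders):

```python
import binascii

def key_to_hex(key: str) -> str:
    """Hex-encode a row key for: nodetool getendpoints <ks> <cf> <hexkey>"""
    return binascii.hexlify(key.encode("utf-8")).decode("ascii")

print(key_to_hex("mykey"))  # 6d796b6579
```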
On Wed, May 8, 2013 at 5:14 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
nodetool describering {keyspace}
From: Kanwar Sangha
I am aware of no benchmark that shows the binary driver to be faster than
thrift. Yes, theoretically a driver that can multiplex *should be* faster
in *some* cases. However I have never seen any evidence to back up this
theory, anecdotal or otherwise.
In fact
I did not know the system tables were compressed. That would seem like an
odd decision; you would think that the system tables are small and would not
benefit from compression much. Is it a static object that
requires initialization even though it is not used?
On Fri, May 3, 2013 at
Out of curiosity, why did you decide to set it to 0 rather than 9? Does
any documentation anywhere say that setting it to 0 disables the feature? I
have set stream throughput higher and seen node join improvements. The
features do work; however, they are probably not your limiting factor.
Remember
I have noticed the same. I think in the real world your compaction
throughput is limited by other things. If I had to speculate I would say
that compaction can remove expired tombstones, however doing this requires
bloom filter checks, etc.
I think that setting is more important with multi
Thrift has a prepare_cql call which returns an ID. Then it has an
execute_cql call which takes the ID and a map of variable bindings.
On Tue, Apr 23, 2013 at 10:29 AM, Stuart Broad stu...@moogsoft.com wrote:
Hi all,
I just realised that the binary protocol is the low-level thrift api that
at 4:05 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
Thrift has a prepare_cql call which returns an ID. Then it has an
execute_cql call which takes the ID and a map of variable bindings.
On Tue, Apr 23, 2013 at 10:29 AM, Stuart Broad stu...@moogsoft.com wrote:
Hi all,
I just realised
). I don't understand what switching to binary protocol
but not using thrift means. Can you point me to any code examples?
Regards,
Stuart
On Tue, Apr 23, 2013 at 4:21 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
Having to catch the exception and parse it is a bit ugly, however
You can get the topology info from thrift's describe_ring. But you can not
get host/gossip status through thrift. It does make sense as something to
add. In the native protocol and with the fat client (storage proxy) you can
hook into these events.
An example of this is here (fat client):
three things:
1) compaction throughput is fairly low (yaml, nodetool)
2) concurrent compactions is fairly low (yaml)
3) multithreaded compaction might be off in your version
Try raising these things. Otherwise consider option 4.
4) $$$ RAID, RAM, CPU $$$
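The three knobs, as they appear in a 1.2-era cassandra.yaml (the values here are illustrative, not recommendations):

```yaml
compaction_throughput_mb_per_sec: 64   # default 16; 0 disables throttling
concurrent_compactors: 4               # defaults to the number of cores
multithreaded_compaction: true         # off by default
```

Throughput can also be changed live with `nodetool setcompactionthroughput 64`.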
On Wed, Apr
If you are using a two node cassandra cluster locally, use ccm; it builds
all the configuration files for you.
https://github.com/pcmanus/ccm
On Tue, Apr 16, 2013 at 11:06 AM, Alicia Leong lccali...@gmail.com wrote:
*Node2(ip2):*
initial_token: 85070591730234615865843651857942052864
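That initial_token is just the midpoint of the RandomPartitioner ring; for a cluster of N nodes the conventional formula is token_i = i * 2**127 / N. A quick sketch:

```python
def initial_tokens(node_count):
    """Evenly spaced RandomPartitioner tokens on a 2**127 ring."""
    return [i * 2**127 // node_count for i in range(node_count)]

print(initial_tokens(2)[1])  # Node2's token above
```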
So cassandra does inter-node compression. I have not checked, but this might
be accidentally getting turned on by default. The storage port is
typically 7000; I am not sure why you are allowing 7100. In any case, try
allowing 7000, or try it with internode compression off.
On Tue, Apr 16, 2013 at 6:42
You can 'list' or 'select *' the column family and you get them in a pseudo
random order. When you say subset it implies you might want a specific
range which is something this schema can not do.
On Sat, Apr 13, 2013 at 2:05 AM, Gareth Collins
gareth.o.coll...@gmail.com wrote:
Hello,
If I
Your best bet is to switch to RandomPartitioner. Otherwise you have to
patch or wait until Astyanax catches up.
On Fri, Apr 12, 2013 at 9:42 AM, Keith Wright kwri...@nanigans.com wrote:
Hi all,
I am trying to use Astyanax 1.56.37 to connect to C* 1.2.3 using
murmur3 and Vnodes and I am
You are correct. In CQL the timestamps come from the server unless
specified. In thrift the user must supply one; otherwise it is always 0.
On Fri, Apr 12, 2013 at 9:20 AM, Michael Theroux mthero...@yahoo.com wrote:
Hello,
We are having an odd sporadic issue that I believe maybe due to time
The YCSB client is not very advanced. Hector, Astyanax, or the native driver
will work better. There are a few YCSB forks, as each NoSQL person usually
needs to fork YCSB to get the most out of it; check github.
On Thu, Apr 11, 2013 at 6:28 PM, Rodrigo Felix
rodrigofelixdealme...@gmail.com wrote:
This issue describes the design of the arena allocation of memtables.
https://issues.apache.org/jira/browse/CASSANDRA-2252
On Fri, Apr 12, 2013 at 1:35 AM, Viktor Jevdokimov
viktor.jevdoki...@adform.com wrote:
Memtables resides in heap, write rate impacts GC, more writes - more
frequent and
Dual core is not the greatest; you might run into GC issues before you run
out of IO from your SSD devices. Also cassandra has other concurrency settings
that are tuned roughly around the number of processors/cores. It is not
uncommon to see 4-6 cores of CPU (600% in top) dealing with young gen
If you do not have JNA, truncate has to fork an 'ln -s' command for the
snapshots. I think that makes it unpredictable. Truncate has its own
timeout value now (separate from the other timeouts). If possible I think
it is better to make each test use its own CF and avoid truncate entirely.
On
With that much data per node you have to raise the index_interval and adjust
the bloom filter settings. Although the bloom filters are off heap now,
having that much data can put a strain on physical memory.
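In a 1.2-era cassandra.yaml that knob looks like (value illustrative):

```yaml
index_interval: 512   # default 128; larger values keep fewer index samples in memory
```

The bloom filter side can be loosened per CF with `ALTER TABLE big_cf WITH bloom_filter_fp_chance = 0.1;` in CQL 3 (table name and value are illustrative).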
On Thu, Apr 11, 2013 at 4:26 PM, aaron morton aa...@thelastpickle.com wrote:
The data
Sstableloader was slow in 1.0.x; I had better luck with rsync. It was not
fixed in the 1.0.x series.
On Wednesday, April 10, 2013, Viktor Jevdokimov
viktor.jevdoki...@adform.com wrote:
Found https://issues.apache.org/jira/browse/CASSANDRA-3668
Weird.
Best regards / Pagarbiai
Viktor
:
Rsync is not for our case.
Is sstableloader for 1.2.x faster?
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Wednesday, April 10, 2013 15:52
To: user@cassandra.apache.org
Subject: Re: sstableloader throughput
Sstableloader was slow in 1.0.x; I had better luck with rsync
Maybe you should enable the wide row support that uses get_paged_slice
instead of get_range_slice and possibly will not have the same issue.
On Wed, Apr 10, 2013 at 7:29 PM, Lanny Ripple la...@spotright.com wrote:
We are using Astyanax in production but I cut back to just Hadoop and
Cassandra
is
always up or down for less than max_hint_window_in_ms, right ?
--
Cyril SCETBON
On Apr 5, 2013, at 11:59 PM, Edward Capriolo edlinuxg...@gmail.com
wrote:
There are a series of edge cases that dictate the need for repair. The
largest cases are 1) lost deletes 2) random disk corruptions
been
affected.
On Sun, Apr 7, 2013 at 4:56 AM, Arya Goudarzi gouda...@gmail.com wrote:
Yes, I know blowing them away would fix it and that is what I did, but I
want to understand why this happens in first place. I was upgrading from
1.1.10 to 1.2.3
On Fri, Apr 5, 2013 at 2:53 PM, Edward
I am not familiar with shuffle, but if you attempt a shuffle and it fails,
it would be a good idea to let compaction die down, or even trigger major
compaction on the nodes where the size grew. The reason is because once the
data files are on disk, even if they are duplicates, cassandra does not
For #2
There are two mutate calls in thrift: batch_mutate and atomic_batch_mutate. The
atomic version was just added. If you care more about performance, do
not use the atomic version.
On Sat, Apr 6, 2013 at 12:03 AM, Matt K infinitelimittes...@gmail.com wrote:
Hi,
I have an application that
On Apr 4, 2013, at 4:20 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
Your reverse index of which rows contain a column named X will have
very wide rows. You could look at cassandra's secondary indexing, or
possibly look at a solandra/solr approach. Another option is you can shift
the problem
This has happened before; the saved caches files were not compatible between
0.6 and 0.7. I have run into this a couple of other times before. The good
news is the saved key cache is just an optimization; you can blow it away
and it is not usually a big deal.
On Fri, Apr 5, 2013 at 2:55 PM, Arya
There are a series of edge cases that dictate the need for repair. The
largest cases are 1) lost deletes 2) random disk corruptions
In our use case we only delete entire row keys, and if the row key comes
back it is not actually a problem, because our software will find it and
delete it again. In
One would think, but remember only like-sized sstables compact. You want
more files roughly the same size rather than a few big ones in most cases,
but there are no hard and fast rules.
On Thu, Apr 4, 2013 at 11:36 AM, Peter Haggerty
peter.hagge...@librato.com wrote:
The default minthreshold for
You can not get only the column name (which you are calling a key); you can
use get_range_slice, which returns all the columns. When you specify an
empty byte array (new byte[0]) as the start and finish you get back all
the columns. From there you can return only the columns to the user in a
, Edward Capriolo edlinuxg...@gmail.com
wrote:
You can not get only the column name (which you are calling a key); you can
use get_range_slice, which returns all the columns. When you specify an
empty byte array (new byte[0]) as the start and finish you get back all
the columns. From there you
Counters are currently read before write, some collection operations on
List are read before write.
On Wed, Apr 3, 2013 at 9:59 PM, aaron morton aa...@thelastpickle.com wrote:
I would guess not.
I know this goes against keeping updates idempotent,
There are also issues with consistency.
, Feb 6, 2013 at 12:56 PM, Wei Zhu wz1...@yahoo.com wrote:
Anyone has first hand experience with Zing JVM which is claimed to be
pauseless? How do they charge, per CPU?
Thanks
-Wei
--
*From:* Edward Capriolo edlinuxg...@gmail.com
*To:* user@cassandra.apache.org
Settings do not make compactions go away. If your compactions are out of
control it usually means one of these things:
1) you have a corrupt table that the compaction never finishes on;
sstable count keeps growing
2) you do not have enough hardware to handle your write load
On Tue, Apr 2, 2013
Technically it should work to mix hsha and the other option. I tried a
mix/match and I noticed some clients were not happy and some other odd
stuff, but I could not tie it down to the setting because thrift from the
cli was working for me.
On Sun, Mar 31, 2013 at 6:30 AM, aaron morton
Every map reduce task typically has a minimum Xmx of 256MB memory. See
mapred.child.java.opts...
So if you have a 10 node cluster with 256 vnodes... You will need to spawn
2,560 map tasks to complete a job.
And a 10 node hadoop cluster with 5 map slots a node... You have 50 map
slots.
Wouldn't it
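The arithmetic above, as a quick sketch (ceiling division because a final partial wave still takes a full pass):

```python
def mapper_waves(nodes, vnodes_per_node, map_slots_per_node):
    """How many waves of mappers a split-per-vnode job needs."""
    splits = nodes * vnodes_per_node      # one input split per vnode
    slots = nodes * map_slots_per_node    # concurrent map capacity
    return splits, -(-splits // slots)    # ceiling division

splits, waves = mapper_waves(10, 256, 5)
print(splits, waves)  # 2560 splits -> 52 waves on 50 slots
```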
This is the second person on the list who has mentioned that hadoop
performance has tanked after switching to vnodes.
On Fri, Mar 29, 2013 at 10:42 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
Every map reduce task typically has a minimum Xmx of 256MB memory. See
mapred.child.java.opts...
So
, Mar 29, 2013 at 2:17 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
Yes, but my point is: with 50 map slots you can only be processing 50 at
once. So it will take 1000/50 waves of mappers to complete the job.
On Fri, Mar 29, 2013 at 11:46 AM, Jonathan Ellis jbel...@gmail.com wrote:
My point
You can use the output of describe_ring along with partitioner information
to determine which nodes data lives on.
On Fri, Mar 29, 2013 at 12:33 PM, Alicia Leong lccali...@gmail.com wrote:
Hi All
I’m thinking to do it in this way.
1) get_slice ( MMDDHH ) from Index Table.
2)
Yes. The input format is making a split per vnode; it can likely be optimized.
On Thu, Mar 28, 2013 at 9:30 AM, Alicia Leong lccali...@gmail.com wrote:
Hi All,
I have 3 nodes of Cassandra 1.2.3 edited the cassandra.yaml for vnodes.
When I execute a M/R job .. the console showed HUNDRED of
The sticks were given to me in a bag at 9:00 AM. I would be
highly impressed if datastax found a way to retroactively put my 2:00 PM
presentation on that USB stick.
http://www.imdb.com/title/tt0088763/
:)
On Mon, Mar 25, 2013 at 11:49 AM, Brian O'Neill b...@alumni.brown.edu wrote:
The value that sorts higher wins; this way it is deterministic.
On Sat, Mar 23, 2013 at 12:12 PM, dong.yajun dongt...@gmail.com wrote:
Hello,
I would like to know which write wins in case of two updates with the
same client timestamp in Cassandra.
Initial data: KeyA: { col1:val AA, col2:val BB,
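A toy model of the tie-break rule being described (the reconcile signature is illustrative, not Cassandra's actual internal API): the newer timestamp wins, and on equal timestamps the lexically greater value wins, so every replica converges to the same answer.

```python
def reconcile(a, b):
    """a, b are (timestamp, value) pairs. Newer timestamp wins;
    on a tie the greater value wins, so the result is deterministic."""
    if a[0] != b[0]:
        return a if a[0] > b[0] else b
    return a if a[1] >= b[1] else b

print(reconcile((100, b"val AA"), (100, b"val CC")))  # (100, b'val CC')
```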
Imho it is probably more efficient for wide rows. When you decompress 8k blocks
to get at a 200 byte row you create overhead, particularly in young gen.
On Monday, March 18, 2013, Sylvain Lebresne sylv...@datastax.com wrote:
The way compression is implemented, it is oblivious to the CF being
wide-row or
/2012/08/C2012-Hastur-NoahGibbs.pdf
-- Drew
On Mar 18, 2013, at 6:14 AM, Edward Capriolo edlinuxg...@gmail.com
wrote:
Imho it is probably more efficient for wide rows. When you decompress 8k blocks
to get at a 200 byte row you create overhead, particularly in young gen.
On Monday, March 18, 2013
You really can not control what the OS swaps out. Java has other memory
usage outside the heap, and native memory. Best to turn swap off. Swap is
kinda old school anyway at this point. It made sense when machines had 32MB
RAM.
Keeping your read 95th percentile low is mostly about removing
Yes, LCS has its own compacting thing. It does not honor min compaction or max
compaction, and it no-ops major compaction. The issue is that the moment
you change, your system moves all your size-tiered data to L0 and then starts a
huge compaction grind to level it.
It would be great to just make this
Caused by: java.io.FileNotFoundException: /var/lib/cassandra/commitlog/
CommitLog-2-1363017061553.log (Permission denied)
^ It seems like you're running cassandra as a user that does not have access
to this directory. Possibly you ran something as root at one point and now
the files are root-owned.
-calculate.
Use the org.apache.cassandra.db:type=DynamicEndpointSnitch MBean to see
what scores it has given the nodes.
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 8/03/2013, at 11:40 AM, Edward Capriolo
In some cases you can do this using the broadcast address, which is
different from the listen and rpc address. But if nothing is routable, i.e.
NAT, I do not think it is possible.
On Sun, Mar 10, 2013 at 10:05 AM, Илья Шипицин chipits...@gmail.com wrote:
Hello!
Is it possible to run cluster in
dynamic_snitch=true is the default, so it is usually on, wrapping other
snitches. I have found several scenarios where it does not work exactly as
you would expect.
On Fri, Mar 8, 2013 at 2:26 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
Our test setup
4 nodes, RF=3, reads at CL=QUOROM and we
It was found out that having no bloom filter is a bad idea because it
causes issues where deleted rows are never removed from disk. Newer
versions have fixed this. You should adjust your bloom filter settings to
be 0 sized.
On Thu, Mar 7, 2013 at 4:18 PM, Michael Theroux mthero...@yahoo.com
I read that the change was made because Cassandra does not work well when
they are off. This makes sense because cassandra uses bloom filters to
decide if a row can be deleted without major compaction. However, since LCS
does not major compact, without bloom filters you can end up in cases where
http://www.slideshare.net/edwardcapriolo/cassandra-as-memcache
Read at ONE.
READ_REPAIR_CHANCE as low as possible.
Use short TTL and short GC_GRACE.
Make the in memory memtable size as high as possible to avoid flushing and
compacting.
Optionally turn off commit log.
You can use cassandra
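Several of these knobs are per-table CQL 3 properties; a hedged sketch of a cache-like table (names and numbers are illustrative):

```sql
CREATE TABLE cache_entries (
    k text PRIMARY KEY,
    v text
) WITH gc_grace_seconds = 3600
  AND read_repair_chance = 0.0;

-- short TTL so entries expire like a memcache slot
INSERT INTO cache_entries (k, v) VALUES ('key1', 'val1') USING TTL 300;
```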
, are
you guys still using C* as in-memory store?
On Mar 6, 2013, at 7:44 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
http://www.slideshare.net/edwardcapriolo/cassandra-as-memcache
Read at ONE.
READ_REPAIR_CHANCE as low as possible.
Use short TTL and short GC_GRACE.
Make the in memory
There is no exact spec on the timestamp; the convention is micros from epoch,
but you are free to use anything you want. To update a column you only need a
timestamp higher than the original.
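The micros-from-epoch convention, as a client-side sketch:

```python
import time

def now_micros():
    """Conventional Cassandra client timestamp: microseconds since epoch."""
    return int(time.time() * 1_000_000)

first = now_micros()
# any later update just needs a strictly larger timestamp to win
assert now_micros() >= first
```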
On Tue, Mar 5, 2013 at 1:55 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
Yes, clients can write timestamps in
Your other option is to create tables 'WITH COMPACT STORAGE'. Basically,
you use COMPACT STORAGE and create tables as you did before.
https://issues.apache.org/jira/browse/CASSANDRA-2995
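For reference, the syntax being suggested looks like this in CQL 3 (table and column names are illustrative):

```sql
-- a sparse, wide row stored the pre-CQL way
CREATE TABLE events (
    key text,
    column1 timestamp,
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE;
```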
From an application standpoint, if you can't do sparse, wide rows, you
break compatibility with 90% of
Cassandra's data files are write once. Deletes are another write. Until
compaction they all live on disk. Making really big rows has these problems.
On Sat, Mar 2, 2013 at 1:42 PM, Michael Kjellman mkjell...@barracuda.com wrote:
What is your gc_grace set to? Sounds like as the number of tombstones
Don't delete them either!
On Friday, March 1, 2013, Alain RODRIGUEZ arodr...@gmail.com wrote:
DO *NOT* USE THAT!!!
Crystal clear ;-). Thanks for the warning.
Alain
2013/3/1 Sylvain Lebresne sylv...@datastax.com
On C* 1.2.1 I see that the following query works:
update counters set
Pseudo code:
GregorianCalendar gc = new GregorianCalendar();
DateFormat df = new SimpleDateFormat("MMddhhmm");
String reversekey = df.format(gc.getTime());
set mycolumnfamily['myrow']['mycolumn'] = 'myvalue';
set myreverseindex[reversekey]['myrow'] = '';
Under rapid insertion this makes hot-spots.
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/whytf_would_i_need_with
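A runnable sketch of why this reverse index hot-spots under rapid insertion (the key format mirrors the pseudo code above, extended to a full year for clarity): every write in the same minute targets the same reverse-index row key, so one replica set absorbs the whole burst.

```python
from datetime import datetime, timedelta

def reverse_index_key(ts):
    # minute-granularity key, like the SimpleDateFormat pattern above
    return ts.strftime("%Y%m%d%H%M")

base = datetime(2013, 3, 1, 12, 0, 0)
keys = {reverse_index_key(base + timedelta(seconds=s)) for s in range(60)}
print(len(keys))  # 60 inserts, 1 row key -> a hot spot
```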
On Tue, Feb 26, 2013 at 12:28 PM, Javier Sotelo
javier.a.sot...@gmail.com wrote:
Thanks Dean, very helpful info.
Javier
On Tue, Feb 26, 2013 at 7:33 AM, Hiller, Dean dean.hil...@nrel.gov wrote:
Oh, and 50
The theoretical maximum of 10G is not even close to what you actually get.
Write once and compact is generally a bad fit for very large datasets.
It is like being able to jump 60 feet in the air, but your legs can
not withstand 10-foot drops.
http://wiki.apache.org/cassandra/LargeDataSetConsiderations
On Wed, Feb 20, 2013 at 3:33 PM, Bryan Talbot
The 40 TB use case you heard about is probably one 40TB mysql machine
that someone migrated to mongo so it would be web scale. Cassandra is
NOT good with drives that big; get a blade center or a high density
chassis.
On Mon, Feb 18, 2013 at 8:00 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
I
Based on your mount/selinux settings, sometimes the OS is unwilling to
tolerate .so files outside certain directories.
Edward
On Tue, Feb 19, 2013 at 10:13 AM, Tim Dunphy bluethu...@gmail.com wrote:
Hey Guys,
I just wanted to follow up on this thread on how I got JNA to work with the
cassandra
These issues are more cloud specific than they are cassandra specific.
Cloud executives tell me in white papers that cloud is awesome and you
can fire all your sysadmins and network people and save money.
This is what happens when you believe cloud executives and their white
papers, you spend 10+
This is a bad example to follow. This is the internal client the
Cassandra nodes use to talk to each other (fat client) usually you do
not use this unless you want to write some embedded code on the
Cassandra server.
Typically clients use thrift/native transport. But you are likely
getting the
Here is the deal.
http://wiki.apache.org/hadoop/Defining%20Hadoop
INAPPROPRIATE: Automotive Joe's Crankshaft: 100% compatible with Hadoop
Bad, because 100% compatible is a meaningless statement. Even Apache
releases have regressions; cases where versions are incompatible *even
when the Java
. Just saying I do not know what .0 and .1 releases are.
They just seem like extended betas to me.
Edward
On Fri, Feb 15, 2013 at 11:10 PM, Eric Evans eev...@acunu.com wrote:
On Fri, Feb 15, 2013 at 7:01 PM, Edward Capriolo edlinuxg...@gmail.com
wrote:
Seems like the hadoop Input format
Asking the question three times will not help getting it answered
faster. Furthermore, I believe no one has answered it because no one
understands what you're asking.
Here is something with tests and conclusions and it is not written by
datastax or part of a book on cassandra.
amount of data?
(Default split size is 64k rows.)
On Fri, Feb 15, 2013 at 7:01 PM, Edward Capriolo edlinuxg...@gmail.com
wrote:
Seems like the hadoop Input format should combine the splits that are
on the same node into the same map task, like Hadoop's
CombinedInputFormat can. I am not sure
It is not going to be true for long that LCS does not require bloom filters.
https://issues.apache.org/jira/browse/CASSANDRA-5029
Apparently, without bloom filters there are issues.
On Fri, Feb 15, 2013 at 7:29 AM, Blake Manders bl...@crosspixel.net wrote:
You probably want to look at your
With hyperthreading a core can show up as two or maybe even four
logical system processors; this is something the kernel does.
On Fri, Feb 15, 2013 at 11:41 AM, Hiller, Dean dean.hil...@nrel.gov wrote:
We ran into an issue today where our website became around 10 times slower. We
found out node
Seems like the hadoop Input format should combine the splits that are
on the same node into the same map task, like Hadoop's
CombinedInputFormat can. I am not sure who recommends vnodes as the
default, because this is now the second problem (that I know of) of
this class where vnodes has extra
The equivalent of multiget_slice is
select * from table where primary_key in ('that', 'this', 'the other thing')
Not sure if you can count these in a way that makes sense since you
can not group.
On Thu, Feb 14, 2013 at 9:17 PM, Michael Kjellman
mkjell...@barracuda.com wrote:
I'm confused what
Just an FYI. More appropriate for the client-dev list.
On Wed, Feb 13, 2013 at 10:37 AM, Gabriel Ciuloaica
gciuloa...@gmail.com wrote:
Code has good documentation, and also the example module has enough sample
code to get you started.
--Gabi
On 2/13/13 5:31 PM, Shahryar Sedghi wrote:
for
DataStax's java-driver.
-- Drew
On Feb 13, 2013, at 8:06 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
Just an FYI. More appropriate for the client-dev list.
On Wed, Feb 13, 2013 at 10:37 AM, Gabriel Ciuloaica
gciuloa...@gmail.com wrote:
Code has good documentation and also the example
Your use case is 100% on the money for Cassandra. But let me take a
chance to slam the other NoSQLs. (not really slam but you know)
Riak is a key-value store. It is not a column family store where a
rowkey has a map of sorted values. This makes the time series more
awkward as the time series has
Yes. You need to run nodetool repair on each node. Repair calculates and
transmits the differences.
On Tuesday, February 12, 2013, S C as...@outlook.com wrote:
I have some data in my keyspaces. When I increase replication factor of a
existing keyspaces say from 2 to 3, will a nodetool repair
It can also happen if you have an older/non-Sun JVM.
On Tuesday, February 12, 2013, aaron morton aa...@thelastpickle.com wrote:
This looks like a bug in 1.2 beta
https://issues.apache.org/jira/browse/CASSANDRA-4553
Can you confirm you are running 1.2.1 and if you can re-create this with
a clean
You always want to run upgradesstables, ASAP. Sooner or later, as the tables
compact, they will upgrade. I hit a bug once with large bloom filters where,
after an upgrade, reading old bloom filters caused 0 rows returned when
there was data.
Once bitten, twice shy. Upgrade one node, disable gossip,
Are vnodes on by default? It seems that many on the list are using this
feature with small clusters.
I know these days anything named virtual is sexy, but they are not useful
for small clusters, are they? I do not see why people are using them.
On Monday, February 11, 2013, aaron morton
I take that back. vnodes are useful for any size cluster, but I do not see
them as a day one requirement. It seems like many people are stumbling over
this.
On Tuesday, February 12, 2013, Edward Capriolo edlinuxg...@gmail.com
wrote:
Are vnodes on by default? It seems that many on the list are using