slow read performance with leveldb compactor

2011-10-04 Thread Radim Kolar
Let's say I have this: { generations : [ { generation : 0, members : [ 650, 651, 652, 653, 654 ] }, { generation : 1, members : [ 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648 ] }, { generation : 2, members : [ 566, 575, 576, 578, 579, 580, 582, 583, 584,

Better CF stats

2011-10-06 Thread Radim Kolar
I need more detailed CF stats. Currently Cassandra supports read/write stats and cache hit ratio. I am interested in: 1. key not found, like get cf['non-existent-key']; 2. hits to a tombstone, where the row existed but is tombstoned now. Is this easy enough to implement?

Re: [VOTE] Release Apache Cassandra 1.0.3

2011-11-12 Thread Radim Kolar
-1 fix for (CASSANDRA-3466) is included (no exception this time) but hints are not delivered to other node: INFO [GossipTasks:1] 2011-11-12 15:05:35,001 Gossiper.java (line 759) InetAddress /**.99.40 is now dead. WARN [pool-1-thread-1] 2011-11-12 15:06:11,514 Memtable.java (line 169)

Re: [VOTE] Release Apache Cassandra 1.0.3

2011-11-14 Thread Radim Kolar
-1. fix for (CASSANDRA-3466) is included (no exception this time) but hints are not delivered to other node: Has anybody tested this? Did the included fix work for you?

Re: [VOTE] Release Apache Cassandra 1.0.3

2011-11-14 Thread Radim Kolar
I'm not sure why hints are not working for you. You might have hit some other issue. Some suggestions: 1. Verify that HintsColumnFamily actually contains some data with cassandra-cli and list HintsColumnFamily yes 2. Try restarting the node containing the hints to check if that gets your

hintedhandoff in 1.0.3

2011-11-15 Thread Radim Kolar
I suspect these partial/invalid hints are left over from a failed hints delivery from before you upgraded to 1.0.3 and not something created by 1.0.3. Try to clear HintsColumnFamily (by removing the sstables for example) first and then see if you still can reproduce this issue afterwards. it

Re: hintedhandoff in 1.0.3

2011-11-15 Thread Radim Kolar
Same problem on other node: 2 keys in HintsColumnFamily. One delivered, one left. INFO [HintedHandoff:1] 2011-11-15 10:31:53,181 HintedHandOffManager.java (line 268) Started hinted handoff for token: 99070591730234615865843651857942052864 INFO [HintedHandoff:1] 2011-11-15 10:32:49,385

Re: How is Cassandra being used?

2011-11-15 Thread Radim Kolar
People hate EHCache and Quartz for doing this.

Re: How is Cassandra being used?

2011-11-17 Thread Radim Kolar
On 16.11.2011 23:58, Bill wrote: We'll turn this off, and would possibly patch it out of the code. That's not to say it wouldn't be useful to others. We patch the spyware out of the code in ehcache and quartz too. This is the only way to be sure that it will not be enabled by a configuration mistake.

Re: hintedhandoff in 1.0.3

2011-11-17 Thread Radim Kolar
On 16.11.2011 18:17, Jonathan Ellis wrote: Keys in HCF are nodes it has hints for. Because it is a 2-node cluster, it must write HH to itself, and that explains why, after the second node comes back, the HH for it are delivered and cleaned up but the HH with the second key are never delivered.

Re: 1.1 freeze approaching

2011-12-19 Thread Radim Kolar
Just a reminder that for us to meet our four-month major release schedule (i.e., 1.1 = Feb 18), Can you make the release cycle slower? It's better to have more new features and do major upgrades less often. It saves the time needed for testing and migrations.

major version release schedule

2011-12-20 Thread Radim Kolar
http://www.mail-archive.com/dev@cassandra.apache.org/msg01549.html I read it, but things are different now because the magic 1.0 is out. If you implement 1.0 and put it into production, you really do not want to retest the app on a new version every 4 months, and it's unlikely that you will get migration

Re: major version release schedule

2011-12-20 Thread Radim Kolar
Nobody's forcing you to upgrade. If you want twice as much time between upgrading, just wait for 1.2. Currently the 1.0 branch is still less stable than 0.8; I still get OOM on some nodes. Adding the 1.1 feature set on top will make it less stable. It's also worth noting that waiting for 2x as many

Re: RFC: Cassandra Virtual Nodes

2012-03-17 Thread Radim Kolar
I don't like that every node will have the same portion of data. 1. We are using nodes with different HW sizes (number of disks). 2. Especially with the ordered partitioner there tend to be hotspots, and you must assign a smaller portion of data to nodes holding hotspots.

Re: RFC: Cassandra Virtual Nodes

2012-03-19 Thread Radim Kolar
Hi Radim, The number of virtual nodes for each host would be configurable by the user, in much the same way that initial_token is configurable now. A host taking a larger number of virtual nodes (tokens) would have proportionately more of the data. This is how we anticipate support for

maven 3 build system

2012-04-27 Thread Radim Kolar
In general, maintaining the pom is something that can fall off the C* devs Maven is a really easy tool once you get it going and gain the necessary knowledge. It is really well integrated in Eclipse and in Jenkins, there are plugins for nearly anything, writing your own plugins is easy, and you can

Re: [VOTE] Release Mojo's Cassandra Maven Plugin 1.0.0-1

2012-05-03 Thread Radim Kolar
I'd like to release version 1.1.0-1 of Mojo's Cassandra Maven Plugin What is this plugin supposed to do?

make default download cassandra 1.0

2012-05-19 Thread Radim Kolar
Because cassandra 1.0 is not sufficiently stable, what about making cassandra 1.0 the default download and adding a line at the bottom: cassandra 1.0 is also available. I have seen this in other projects.

Re: make default download cassandra 1.0

2012-05-19 Thread Radim Kolar
The message was wrong, it should be cass 1.1 vs 1.0. Cassandra 1.1 needs some time to stabilize. It took months to get cassandra 1.0 stable after it was released. The reworked schema changes in cass 1.1 produce some really weird bugs, like an entire keyspace disappearing (the data are still there). I think

Re: Cassandra in memory key index

2012-06-08 Thread Radim Kolar
If you are interested I can help, I used the FST on a Hadoop project to implement a fast map side range join. Create a JIRA item with the patch attached; I will test it.

Re: Cassandra in memory key index

2012-06-08 Thread Radim Kolar
On 8.6.2012 21:19, Jason Rutherglen wrote: Ok looks like the IndexSummary encapsulates everything, I can start with hacking that. Do the memory part first. I want to test it on existing serialized index data.

findbugs

2012-07-22 Thread Radim Kolar
I used findbugs on cassandra and it returns 69 possible errors. The most problematic part of the code is CQL, with a lot of null pointer problems there. Some interesting errors: C:/apache-nutch/eclipse/cassandra/src/java/org/apache/cassandra/service/AntiEntropyService.java:916 Condition.await() not in loop
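
For readers unfamiliar with the "Condition.await() not in loop" warning: await() may return without a matching signal (a spurious wakeup), so the usual advice is to re-check the guarded predicate in a while loop. The following is a minimal sketch of that general pattern using java.util.concurrent.locks; the class OneShotLatch and its methods are hypothetical names for illustration and are not the code from AntiEntropyService.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical one-shot latch used only to illustrate the pattern. */
final class OneShotLatch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition done = lock.newCondition();
    private boolean finished = false; // predicate guarded by the lock

    /** Marks the latch as finished and wakes every waiter. */
    void finish() {
        lock.lock();
        try {
            finished = true;
            done.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Blocks until finish() has been called. */
    void awaitFinished() throws InterruptedException {
        lock.lock();
        try {
            // Loop instead of a single await(): a spurious wakeup
            // simply re-checks the predicate and waits again.
            while (!finished) {
                done.await();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Reads the predicate under the lock. */
    boolean isFinished() {
        lock.lock();
        try {
            return finished;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OneShotLatch latch = new OneShotLatch();
        new Thread(latch::finish).start();
        latch.awaitFinished();
        System.out.println("finished = " + latch.isFinished());
    }
}
```

The loop also makes the call safe when finish() runs before awaitFinished(), because the waiter never blocks once the predicate is already true.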

Re: findbugs

2012-07-23 Thread Radim Kolar
The line numbers here don't appear to match with trunk. You are right, it was from an old trunk, 415 commits old. It was just a demo of findbugs; for serious use, developers should install the findbugs maven plugin or the eclipse plugin (preferred).

Re: findbugs

2012-07-23 Thread Radim Kolar
On 23.7.2012 16:34, Zoltan Farkas wrote: In general, I prefer integrating findbugs into the build process and fail the build if issues are found. I am a strong believer in this approach, increases the quality of the project significantly. That's true, I am currently in the process of fixing

Re: findbugs

2012-07-30 Thread Radim Kolar
I am using maven to build cassandra. I didn't have in mind to contribute a build system, because you are not interested in maven. In maven you just call the findbugs plugin, there is nothing special to contribute. I had in mind a patch fixing various problems discovered by FB, but because it's difficult to post it

Re: findbugs

2012-07-30 Thread Radim Kolar
On 30.7.2012 16:52, Jonathan Ellis wrote: Is Jenkins smart enough to be able to say, I know we had X findbugs warnings previously, which are known to be false positives, but now there are X+1 ? Yes. Look at the hadoop project's pre-commit check builds.

customizable size tiered compaction

2012-09-22 Thread Radim Kolar
I am interested in experiments with size tiered compaction, because I get sstables which are never compacted since no other sstable is close to their size. I plan to experiment with the bucket ratio, which is currently 50-150 percent, to make it 33-200 percent. It's all about changing
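
To make the effect of the bucket ratio concrete, here is a simplified sketch of size-based bucketing: sstables are sorted by size, and each one joins the first bucket whose running average size puts it inside the [low, high] window. This is an illustration under stated assumptions, not Cassandra's actual compaction code; the class SizeTierBuckets and the sample sizes are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of size-tiered bucketing. */
final class SizeTierBuckets {
    /**
     * Groups sizes into buckets. A size joins a bucket when it lies within
     * [low * avg, high * avg] of that bucket's current average size;
     * otherwise it starts a new bucket.
     */
    static List<List<Long>> bucket(long[] sizes, double low, double high) {
        long[] sorted = sizes.clone();
        Arrays.sort(sorted);
        List<List<Long>> buckets = new ArrayList<>();
        List<Double> averages = new ArrayList<>();
        for (long size : sorted) {
            boolean placed = false;
            for (int i = 0; i < buckets.size() && !placed; i++) {
                double avg = averages.get(i);
                if (size >= avg * low && size <= avg * high) {
                    List<Long> b = buckets.get(i);
                    // Update the running average before adding the member.
                    averages.set(i, (avg * b.size() + size) / (b.size() + 1));
                    b.add(size);
                    placed = true;
                }
            }
            if (!placed) {
                List<Long> b = new ArrayList<>();
                b.add(size);
                buckets.add(b);
                averages.add((double) size);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        long[] sizesMb = {100, 180, 300}; // made-up sstable sizes in MB
        System.out.println("50-150%: " + bucket(sizesMb, 0.5, 1.5));
        System.out.println("33-200%: " + bucket(sizesMb, 0.33, 2.0));
    }
}
```

With these sample sizes the 50-150 percent window leaves every sstable alone in its own bucket, while the 33-200 percent window groups the two smaller ones, which is the kind of "never compacted" case the message describes.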

maximum sstable size

2012-10-29 Thread Radim Kolar
Is it possible to implement a maximum sstable size for tieredcompactionpolicy without many code changes? I am using it in lucene with a really good performance effect; the max size is 4 GB, the total dataset size is 30 GB. It prevents lucene from creating too big a segment, which takes too long to be merged

Re: maximum sstable size

2012-11-03 Thread Radim Kolar
done https://issues.apache.org/jira/browse/CASSANDRA-4897

Re: maximum sstable size

2012-11-03 Thread Radim Kolar
On 4.11.2012 1:24, Edward Capriolo wrote: I have another ticket open for this. Which one?

Re: findbugs

2012-11-05 Thread Radim Kolar
On 30.7.2012 16:47, Edward Capriolo wrote: I am sure no one would have an issue with an optional findbugs target. https://issues.apache.org/jira/browse/CASSANDRA-4891 Here you have an optional findbugs target.

slf4j

2012-11-22 Thread Radim Kolar
Instead of this: if (logger.isDebugEnabled()) logger.debug("INDEX LOAD TIME for " + descriptor + ": " + (System.currentTimeMillis() - start) + " ms."); do this: logger.debug("INDEX LOAD TIME for {}: {} ms.", descriptor, (System.currentTimeMillis() - start)); easier
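
The point of the {} form is that the message is only assembled when debug logging is enabled, so the isDebugEnabled() guard becomes unnecessary for cheap arguments. The sketch below shows that deferral with a hypothetical TinyLogger standing in for an SLF4J logger (the real library is not used, so the block stays self-contained); Descriptor and its counter are also made up for the demonstration.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo; TinyLogger only imitates SLF4J-style {} placeholders. */
final class LazyLogDemo {
    static final class TinyLogger {
        private final boolean debugEnabled;
        final StringBuilder out = new StringBuilder(); // captured log output

        TinyLogger(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        void debug(String format, Object... args) {
            if (!debugEnabled) {
                return; // arguments are never converted to strings
            }
            out.append(render(format, args)).append('\n');
        }

        /** Replaces each {} with the next argument, left to right. */
        static String render(String format, Object... args) {
            StringBuilder sb = new StringBuilder();
            int from = 0;
            for (Object arg : args) {
                int at = format.indexOf("{}", from);
                if (at < 0) {
                    break;
                }
                sb.append(format, from, at).append(arg); // toString() runs here
                from = at + 2;
            }
            return sb.append(format.substring(from)).toString();
        }
    }

    /** Counts toString() calls so the saved work is observable. */
    static final class Descriptor {
        static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();

        @Override
        public String toString() {
            TO_STRING_CALLS.incrementAndGet();
            return "ks-cf-1";
        }
    }

    public static void main(String[] args) {
        TinyLogger off = new TinyLogger(false);
        off.debug("INDEX LOAD TIME for {}: {} ms.", new Descriptor(), 12L);
        System.out.println("toString calls with debug off: " + Descriptor.TO_STRING_CALLS.get());
    }
}
```

Note that the arguments themselves are still evaluated at the call site, so an expensive expression passed as an argument would still benefit from an explicit guard.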

Re: Hector 0.8.0-2 update fails with : All host pools marked down. Retry burden pushed out to client.

2012-12-04 Thread Radim Kolar
On 3.12.2012 9:15, Bisht, Jaikrit wrote: Hi there, What have been the problems with Hector? Problems with improper detection of down nodes; problems with improper detection of timeouts; some lost updates due to bad timestamp generation (splitting into more mutators helped); lack of support

real leveldb vs cassandra leveldb

2013-02-13 Thread Radim Kolar
Real leveldb is better in a lot of areas: L0 sstables are 1/10 of the L1 sstable size; tables can be promoted to upper levels if no merging is needed (there is a hole); variable number of sstables per level, but it tries to keep 1:10:100 sstable ratios, which is not a hard requirement; very important: a better hash function.

Re: real leveldb vs cassandra leveldb

2013-02-22 Thread Radim Kolar
On 13.2.2013 16:32, Jonathan Ellis wrote: The only point here that would make a difference in practice is leveldb using a worse hash function. How do you know that it would not make a difference in practice? I have implemented some optimizations from leveldb in cassandra - different L0

Re: Time to roll 1.1.12?

2013-05-21 Thread Radim Kolar
* fsync leveled manifest to avoid corruption (CASSANDRA-5535) Are you sure that this does not have a performance impact? Most filesystems sync all their data, not just one file. Write to a .new file and then do a rename.
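
The ".new file, then rename" suggestion can be sketched with java.nio.file: the new contents are written under a temporary sibling name and then moved over the target in one step, so a reader never sees a half-written file under the final name. This is a minimal sketch, not the CASSANDRA-5535 patch; the class name, the file name cf.json and the contents are hypothetical, and how well the rename survives a power loss without any fsync depends on the filesystem.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper illustrating write-to-.new-then-rename. */
final class AtomicManifestWrite {
    /** Writes contents to "<target>.new" and renames it over target. */
    static void replace(Path target, String contents) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".new");
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
        // ATOMIC_MOVE asks for a single atomic rename; whether an existing
        // target is replaced is implementation specific per the Javadoc.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("manifest-demo");
        Path manifest = dir.resolve("cf.json"); // made-up file name
        replace(manifest, "{ \"generations\" : [] }");
        System.out.println(new String(Files.readAllBytes(manifest), StandardCharsets.UTF_8));
    }
}
```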

cassie

2013-05-21 Thread Radim Kolar
http://wiki.apache.org/cassandra/ClientOptionsThrift add cassie - https://github.com/twitter/cassie

EOL info

2013-05-23 Thread Radim Kolar
On 22.5.2013 18:22, Brandon Williams wrote: I don't see a 1.1.13 ever happening Do you have some page with EOL information, something like http://www.freebsd.org/security/ ? Is the lifetime about 6 months per major release?

manifest fsync

2013-05-27 Thread Radim Kolar
should be fine, the manifest is not rewritten that often It's rewritten after each sstable flush? Databases should do fsync only at a checkpoint. The fsync scenario in WAFL is that a hard checkpoint is done after a predefined number of log segments. On checkpoint, fsync everything and then write

cassandra vnodes

2013-07-07 Thread Radim Kolar
Are cassandra vnodes just an implementation of consistent hashing, or are there some improvements to make similarly sized splits? I decided to implement it in my cassandra too, but I am using zookeeper for cluster management.

wiki bootstrap doc

2013-07-12 Thread Radim Kolar
http://wiki.apache.org/cassandra/Operations To bootstrap a node, turn _AutoBootstrap_ on in the configuration file, and start it. It's called auto_bootstrap in the config file.

Re: cassandra vnodes

2013-07-13 Thread Radim Kolar
I decided to implement it in my cassandra too, but i am using zookeeper for cluster management. I scrapped the idea of consistent hashing with random tokens. There is too much variance in the effective ranges allocated to nodes in a large cluster; you need to have a lot of ranges, which is quite large
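
The variance complaint can be explored with a small simulation: each node picks some number of random tokens on a unit ring, each token owns the arc back to its predecessor, and a node's share is the sum of its arcs. This is a sketch under those assumptions only; the class name, the unit-ring model and the seed are hypothetical and do not reflect Cassandra's token allocation code.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical simulation of ring ownership with random tokens. */
final class RandomTokenVariance {
    /**
     * Returns each node's share of a unit ring when every node picks
     * tokensPerNode uniformly random tokens. A token owns the arc between
     * its predecessor (exclusive) and itself, wrapping around at 1.0.
     */
    static double[] shares(int nodes, int tokensPerNode, long seed) {
        Random rnd = new Random(seed);
        int n = nodes * tokensPerNode;
        double[][] tokens = new double[n][2]; // {position, owner index}
        for (int i = 0; i < n; i++) {
            tokens[i][0] = rnd.nextDouble();
            tokens[i][1] = i % nodes;
        }
        Arrays.sort(tokens, (a, b) -> Double.compare(a[0], b[0]));
        double[] share = new double[nodes];
        for (int i = 0; i < n; i++) {
            // The first token's predecessor is the last token, one lap back.
            double prev = (i == 0) ? tokens[n - 1][0] - 1.0 : tokens[i - 1][0];
            share[(int) tokens[i][1]] += tokens[i][0] - prev;
        }
        return share;
    }

    public static void main(String[] args) {
        for (int tokensPerNode : new int[] {1, 16, 256}) {
            double[] s = shares(100, tokensPerNode, 42L);
            double min = Arrays.stream(s).min().getAsDouble();
            double max = Arrays.stream(s).max().getAsDouble();
            System.out.printf("tokens/node=%d min=%.5f max=%.5f%n", tokensPerNode, min, max);
        }
    }
}
```

Running main with different tokens-per-node values shows how the spread between the smallest and largest share changes as the number of ranges grows.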

Apache Cassandra 2.0.0 rc1

2013-08-10 Thread Radim Kolar
Did you package it correctly? Something seems to be missing. ERROR 20:27:11,578 Internal error processing batch_mutate java.lang.NoClassDefFoundError: Could not initialize class org.apache.cassandra.triggers.TriggerExecutor at

Re: Apache Cassandra 2.0.0 rc1

2013-08-11 Thread Radim Kolar
On 10.8.2013 21:30, Brandon Williams wrote: Make a conf/triggers directory and that will fix it. We fixed this in trunk already. Yes, that fixed it. 2.0 is considerably slower than 1.2 for cpu bound tasks, average throughput is -15% at 50 threads. 2.0 with 20 threads burst throughput with

Re: Apache Cassandra 2.0.0 rc1

2013-08-12 Thread Radim Kolar
thrift, sync

Re: [VOTE] Release Apache Cassandra 2.0.0-rc2

2013-08-20 Thread Radim Kolar
what about failing shuffle? CASSANDRA-5876 https://issues.apache.org/jira/browse/CASSANDRA-5876 https://issues.apache.org/jira/browse/CASSANDRA-5873