Re: Regarding JIRA
JIRA should be reserved for issues that you have some confidence are bugs in Cassandra, or items you want as feature requests. For general questions, try the Cassandra mailing list user@cassandra.apache.org (to subscribe, mail user-subscr...@cassandra.apache.org) or use IRC: #cassandra on freenode.

On 2015-06-01 15:31, Kiran mk wrote:
Hi, I am using the Apache Cassandra Community Edition for learning and practice. Can I raise doubts, issues, and requests for clarification as JIRA tickets against Cassandra? Will there be any charges for that? As far as I know we can create a free JIRA account. Can anyone advise me on this?
--
Best Regards,
Kiran.M.K.
Re: Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper
This is what I meant by 'initial cause':

Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.mapper.ColumnMapper

So it is in fact a classpath problem. Here is the class in question:
https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/mapper/ColumnMapper.scala

Maybe it would be worthwhile to put this at the top of your main method and show what it prints:

System.out.println(System.getProperty("java.class.path"));

Which version of Cassandra and which version of the cassandra-spark connector are you using, by the way?

On 04/02/2015 11:16 PM, Tiwari, Tarun wrote:
Sorry, I was unable to reply for a couple of days. I checked the error again and can't see any other initial cause. Here is the full error that is coming:

Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper
at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.mapper.ColumnMapper
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

From: Dave Brosius [mailto:dbros...@mebigfatguy.com]
Sent: Tuesday, March 31, 2015 8:46 PM
To: user@cassandra.apache.org
Subject: Re: Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper

Is there an 'initial cause' listed under that exception you gave? NoClassDefFoundError is not exactly the same as ClassNotFoundException: it means that ColumnMapper couldn't run its static initializer. That could be because some other class couldn't be found, or it could be some other, non-classloader-related error.

On 2015-03-31 10:42, Tiwari, Tarun wrote:
Hi Experts,
I am getting java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper while running an app to load data to a Cassandra table using the DataStax Spark connector. Is there something else I need to import in the program or dependencies?
RUNTIME ERROR:

Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper
at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Below is my Scala program:

/*** ld_Cassandra_Table.scala ***/
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import com.datastax.spark.connector
import com.datastax.spark.connector._

object ldCassandraTable {
  def main(args: Array[String]) {
    val fileName = args(0)
    val tblName = args(1)
    val conf = new SparkConf(true).set("spark.cassandra.connection.host", "MASTER HOST")
      .setMaster("MASTER URL")
      .setAppName("LoadCassandraTableApp")
    val sc = new SparkContext(conf)
    sc.addJar("/home/analytics/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar")
    val normalfill = sc.textFile(fileName).map(line => line.split('|'))
    normalfill.map(line => (line(0), line(1), line(2), line(3), line(4), line(5), line(6),
      line(7), line(8), line(9), line(10), line(11), line(12), line(13), line(14), line(15),
      line(16), line(17), line(18), line(19), line(20), line(21))).saveToCassandra("keyspace", tblName,
      SomeColumns("wfctotalid", "timesheetitemid", "employeeid", "durationsecsqty", "wageamt",
        "moneyamt", "applydtm", "laboracctid", "paycodeid", "startdtm", "stimezoneid",
        "adjstartdtm", "adjapplydtm", "enddtm", "homeaccountsw", "notpaidsw", "wfcjoborgid",
        "unapprovedsw", "durationdaysqty", "updatedtm", "totaledversion", "acctapprovalnum"))
    println("Records Loaded to %s".format(tblName))
    Thread.sleep(500)
    sc.stop()
  }
}

Below is the sbt file:

name := "POC"
version := "0.0.1"
scalaVersion := "2.10.4"
// additional libraries
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.1.1" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.1" % "provided"
)
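For reference, here is a runnable version of the two checks suggested in the reply above (print the JVM classpath, and ask the classloader where, or whether, it can find the class that fails to load). This is an illustrative sketch, not part of the original thread:

import java.net.URL;

public class ClasspathCheck {
    public static void main(String[] args) {
        // the classpath the JVM actually launched with
        System.out.println(System.getProperty("java.class.path"));
        // null here means no jar on the classpath provides the connector class
        URL u = ClasspathCheck.class.getResource(
                "/com/datastax/spark/connector/mapper/ColumnMapper.class");
        System.out.println(u);
    }
}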
Re: Getting NoClassDefFoundError for com/datastax/spark/connector/mapper/ColumnMapper
Is there an 'initial cause' listed under that exception you gave? NoClassDefFoundError is not exactly the same as ClassNotFoundException: it means that ColumnMapper couldn't run its static initializer. That could be because some other class couldn't be found, or it could be some other, non-classloader-related error.

On 2015-03-31 10:42, Tiwari, Tarun wrote:
Hi Experts,
I am getting java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper while running an app to load data to a Cassandra table using the DataStax Spark connector. Is there something else I need to import in the program or dependencies?

RUNTIME ERROR:
Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/mapper/ColumnMapper
at ldCassandraTable.main(ld_Cassandra_tbl_Job.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Below is my Scala program:

/*** ld_Cassandra_Table.scala ***/
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import com.datastax.spark.connector
import com.datastax.spark.connector._

object ldCassandraTable {
  def main(args: Array[String]) {
    val fileName = args(0)
    val tblName = args(1)
    val conf = new SparkConf(true).set("spark.cassandra.connection.host", "MASTER HOST")
      .setMaster("MASTER URL")
      .setAppName("LoadCassandraTableApp")
    val sc = new SparkContext(conf)
    sc.addJar("/home/analytics/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar")
    val normalfill = sc.textFile(fileName).map(line => line.split('|'))
    normalfill.map(line => (line(0), line(1), line(2), line(3), line(4), line(5), line(6),
      line(7), line(8), line(9), line(10), line(11), line(12), line(13), line(14), line(15),
      line(16), line(17), line(18), line(19), line(20), line(21))).saveToCassandra("keyspace", tblName,
      SomeColumns("wfctotalid", "timesheetitemid", "employeeid", "durationsecsqty", "wageamt",
        "moneyamt", "applydtm", "laboracctid", "paycodeid", "startdtm", "stimezoneid",
        "adjstartdtm", "adjapplydtm", "enddtm", "homeaccountsw", "notpaidsw", "wfcjoborgid",
        "unapprovedsw", "durationdaysqty", "updatedtm", "totaledversion", "acctapprovalnum"))
    println("Records Loaded to %s".format(tblName))
    Thread.sleep(500)
    sc.stop()
  }
}

Below is the sbt file:

name := "POC"
version := "0.0.1"
scalaVersion := "2.10.4"
// additional libraries
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.1" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.1.1" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.1" % "provided"
)

Regards,
Tarun Tiwari | Workforce Analytics-ETL | Kronos India
M: +91 9540 28 27 77 | Tel: +91 120 4015200
Kronos | Time & Attendance * Scheduling * Absence Management * HR & Payroll * Hiring * Labor Analytics
Join Kronos on: kronos.com [1] | Facebook [2] | Twitter [3] | LinkedIn [4] | YouTube [5]

Links:
[1] http://www.kronos.com/
[2] http://www.kronos.com/facebook
[3] http://www.kronos.com/twitter
[4] http://www.kronos.com/linkedin
[5] http://www.kronos.com/youtube
Re: Storing bi-temporal data in Cassandra
As you point out, there's not really a node-based problem with your query from a performance point of view. This is a limitation of CQL, in that CQL wants to slice one contiguous section of a partition's row (no matter how big that section is). In your case, you are asking it to slice multiple sections of a partition's row, which currently isn't supported. It seems silly perhaps, as in your example it would be useful and not too difficult, but the problem is that you can wind up with n-depth slicing of that partitioned row given an arbitrary query, if range queries were allowed on clustering keys anywhere in the chain.

At present, you can either duplicate the data using the other clustering key (transaction_time) as the primary clusterer for this use case, or omit the third criterion (transaction_time < '...') from the query, get all the range-query results, and filter on the client.

hth,
dave

On 02/14/2015 06:05 PM, Raj N wrote:
I don't think that solves my problem. The question really is: why can't we use ranges for both time columns when they are part of the primary key? They are in one row, after all. Is this just a CQL limitation?
-Raj

On Sat, Feb 14, 2015 at 3:35 AM, DuyHai Doan doanduy...@gmail.com wrote:
"I am trying to get the state as of a particular transaction_time" -- in that case you should probably define your primary key with the clustering columns in another order:

PRIMARY KEY (weatherstation_id, transaction_time, event_time)

Then:

select * from temperatures where weatherstation_id = 'foo' and event_time >= '2015-01-01 00:00:00' and event_time < '2015-01-02 00:00:00' and transaction_time = '';

On Sat, Feb 14, 2015 at 3:06 AM, Raj N raj.cassan...@gmail.com wrote:
Has anyone designed a bi-temporal table in Cassandra? It doesn't look like I can do this using CQL for now. Taking the time-series example from the well-known Cassandra modeling tutorials:

CREATE TABLE temperatures (
  weatherstation_id text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY (weatherstation_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

If I add another column, transaction_time:

CREATE TABLE temperatures (
  weatherstation_id text,
  event_time timestamp,
  transaction_time timestamp,
  temperature text,
  PRIMARY KEY (weatherstation_id, event_time, transaction_time)
) WITH CLUSTERING ORDER BY (event_time DESC, transaction_time DESC);

If I try to run a query using the following CQL, it throws an error:

select * from temperatures where weatherstation_id = 'foo' and event_time >= '2015-01-01 00:00:00' and event_time < '2015-01-02 00:00:00' and transaction_time < '2015-01-02 00:00:00';

It works if I use an equals clause for event_time. I am trying to get the state as of a particular transaction_time.
-Raj
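As a minimal sketch of the duplication option described above, shown with the DataStax java-driver for illustration (the second table and its name are assumptions, not something from the original posts): keep a copy of the data clustered by transaction_time first, so each access pattern becomes a single-section slice that CQL can serve.

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import java.util.Date;

public class BiTemporalWrites {

    public static void createSchema(Session session) {
        // same data as 'temperatures', but transaction_time clusters first
        session.execute("CREATE TABLE temperatures_by_txn (" +
                "weatherstation_id text, transaction_time timestamp, " +
                "event_time timestamp, temperature text, " +
                "PRIMARY KEY (weatherstation_id, transaction_time, event_time))");
    }

    public static void write(Session session, String station, Date txnTime,
                             Date eventTime, String temperature) {
        // write every reading to both tables; reads pick whichever table
        // matches the range they need
        for (String table : new String[] { "temperatures", "temperatures_by_txn" }) {
            PreparedStatement ps = session.prepare("INSERT INTO " + table +
                    " (weatherstation_id, transaction_time, event_time, temperature)" +
                    " VALUES (?, ?, ?, ?)");
            session.execute(ps.bind(station, txnTime, eventTime, temperature));
        }
    }
}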
Re: Cassandra 2.1.2, Pig 0.14, Hadoop 2.6.0 does not work together
The method com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set; should be available in Guava from 15.0 on, so guava-16.0 should be fine. Is it possible Guava is being picked up from somewhere else? Do you have a global classpath variable? To see where you are loading Guava from, you might want to do:

URL u = YourClass.class.getResource("/com/google/common/collect/Sets.class");
System.out.println(u);

On 01/22/2015 04:12 AM, Pinak Pani wrote:
I am using Pig with Cassandra (Cassandra 2.1.2, Pig 0.14, Hadoop 2.6.0 combo). When I use CqlStorage() I get:

org.apache.pig.backend.executionengine.ExecException: ERROR 2118: org.apache.cassandra.exceptions.ConfigurationException: Unable to find inputformat class 'org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat'

When I use CqlNativeStorage() I get:

java.lang.NoSuchMethodError: com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;

The Pig classpath looks like this:

» echo $PIG_CLASSPATH
/home/naishe/apps/apache-cassandra-2.1.2/lib/airline-0.6.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/antlr-runtime-3.5.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/apache-cassandra-2.1.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/apache-cassandra-clientutil-2.1.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/apache-cassandra-thrift-2.1.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/commons-cli-1.1.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/commons-codec-1.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/commons-lang3-3.1.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/commons-math3-3.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/compress-lzf-0.8.4.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/concurrentlinkedhashmap-lru-1.4.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/disruptor-3.0.1.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/guava-16.0.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/high-scale-lib-1.0.6.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/jackson-core-asl-1.9.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/jackson-mapper-asl-1.9.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/jamm-0.2.8.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/javax.inject.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/jbcrypt-0.3m.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/jline-1.0.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/jna-4.0.0.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/json-simple-1.1.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/libthrift-0.9.1.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/logback-classic-1.1.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/logback-core-1.1.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/lz4-1.2.0.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/metrics-core-2.2.0.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/netty-all-4.0.23.Final.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/reporter-config-2.1.0.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/slf4j-api-1.7.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/snakeyaml-1.11.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/snappy-java-1.0.5.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/stream-2.5.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/stringtemplate-4.0.2.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/super-csv-2.1.0.jar:/home/naishe/apps/apache-cassandra-2.1.2/lib/thrift-server-0.3.7.jar::/home/naishe/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.1.2/cassandra-driver-core-2.1.2.jar:/home/naishe/.m2/repository/org/apache/cassandra/cassandra-all/2.1.2/cassandra-all-2.1.2.jar

I have read somewhere that this is due to a version conflict with the Guava library, so I tried using Guava 11.0.2; that did not help. (http://stackoverflow.com/questions/27089126/nosuchmethoderror-sets-newconcurrenthashset-while-running-jar-using-hadoop#comment42687234_27089126)

Here is the Pig Latin that I was trying to execute:

grunt> alice = LOAD 'cql://hadoop_test/lines' USING CqlNativeStorage();
2015-01-22 09:28:54,133 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> B = foreach alice generate flatten(TOKENIZE((chararray)$0)) as word;
grunt> C = group B by word;
grunt> D = foreach C generate COUNT(B) as word_count, group as word;
grunt> dump D;
2015-01-22 09:29:06,808 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
[ -- snip -- ]
2015-01-22 09:29:11,254 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2015-01-22 09:29:11,588 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2015-01-22 09:29:11,600 [Thread-22] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2015-01-22 09:29:11,620 [Thread-22] WARN
Re: Cassandra Wiki Immutable?
Added, thanks.

On 08/18/2014 06:15 AM, Otis Gospodnetic wrote:
Hi,
What is the state of the Cassandra wiki -- http://wiki.apache.org/cassandra ? I tried to update a few pages, but it looks like pages are immutable. Do I need to have my wiki username (OtisGospodnetic) added to some ACL?
Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
Re: Why is the cassandra documentation such poor quality?
We had a massive spam problem before we locked down the wiki, so unfortunately that was the choice we had to make. But as stated, we can add you to the contributors list. What is your wiki user name?

On 2014-07-23 07:33, Peter Lin wrote:
I've tried to contribute docs to the Cassandra wiki in the past, but there's an obstacle: currently wiki.apache.org/cassandra [1] is locked down, so only committers can edit it. I really wish that wasn't the case, since it wastes time; the committers are busy writing code. Having to email a committer and ask them to update it feels silly to me and kind of goes against openness. Back when I was active with JMeter, we decided to leave it open so that anyone could edit the docs. I can't be the only one who wants to help make the docs better but gets frustrated with the wiki being closed.

On Wed, Jul 23, 2014 at 4:25 AM, spa...@gmail.com wrote:
I would like to help out with the documentation of C*. How do I start?

On Wed, Jul 23, 2014 at 12:46 PM, Robert Stupp sn...@snazy.de wrote:
Just a note: if you have suggestions for how to improve the documentation on the DataStax website, write them an email to d...@datastax.com. They appreciate proposals :)

On 23.07.2014 at 09:10, Mark Reddy mark.re...@boxever.com wrote:
Hi Kevin,
The difference here is that the Apache Cassandra site is maintained by the community, whereas the DataStax site is maintained by paid employees with a vested interest in producing documentation. With DataStax having some comprehensive docs, I guess the desire to maintain the Apache site has dwindled. However, if you are interested in contributing to it and bringing it back up to standard, you can; such is the freedom of open source.
Mark

On Wed, Jul 23, 2014 at 2:54 AM, Kevin Burton bur...@spinn3r.com wrote:
This document, https://wiki.apache.org/cassandra/Operations [2], for example, is extremely outdated and certainly does not reflect 2.x releases. It mentions commands that have long since been removed or deprecated. Instead of giving bad documentation, maybe remove this and mark it as obsolete. The DataStax documentation is... acceptable, I guess. My main criticism there is that a lot of it is in their blog.
Kevin
--
Founder/CEO Spinn3r.com [3]
Location: San Francisco, CA
blog: http://burtonator.wordpress.com [4]
... or check out my Google+ profile [5]
--
http://spawgi.wordpress.com [6]
We can do it and do it better.

Links:
[1] http://wiki.apache.org/cassandra
[2] https://wiki.apache.org/cassandra/Operations
[3] http://spinn3r.com/
[4] http://burtonator.wordpress.com/
[5] https://plus.google.com/102718274791889610666/posts
[6] http://spawgi.wordpress.com
Re: What % of cassandra developers are employed by Datastax?
The question assumes that it's likely that DataStax employees become committers. Actually, it's more likely that committers become DataStax employees. So the underlying tone, that DataStax only really 'wants' DataStax employees to be Cassandra committers, is really misleading. Why wouldn't a company want to hire people who have shown a desire and aptitude to work on products they care about? It's just rational, and damn genius, actually. I'm sure they'd be happy to have an influx of non-DataStax committers. Patches welcome.
dave

On 05/17/2014 08:28 AM, Peter Lin wrote:
If you look at the new committers since 2012, they are mostly DataStax.

On Fri, May 16, 2014 at 9:14 PM, Kevin Burton bur...@spinn3r.com wrote:
So 30%... according to that data.

On Thu, May 15, 2014 at 4:59 PM, Michael Shuler mich...@pbandjelly.org wrote:
On 05/14/2014 03:39 PM, Kevin Burton wrote:
I'm curious what % of Cassandra developers are employed by DataStax?

http://wiki.apache.org/cassandra/Committers
--
Kind regards,
Michael

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
... or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: initial token crashes cassandra
What Colin is saying is that the tool you used to create the token is not creating tokens usable for the Murmur3Partitioner. That tool is probably generating tokens for the (original) RandomPartitioner, which has a different range.

On 05/17/2014 07:20 PM, Tim Dunphy wrote:
Hi, and thanks for your response. The puzzling thing is that yes, I am using the Murmur3 partitioner, yet I am still getting the error I just told you guys about:

[root@beta:/etc/alternatives/cassandrahome] #grep -i partition conf/cassandra.yaml | grep -v '#'
partitioner: org.apache.cassandra.dht.Murmur3Partitioner

Thanks
Tim

On Sat, May 17, 2014 at 3:23 PM, Colin colpcl...@gmail.com wrote:
You may have used the old RandomPartitioner token generator. Use the Murmur3 partitioner token generator instead.
--
Colin
320-221-9531

On May 17, 2014, at 1:15 PM, Tim Dunphy bluethu...@gmail.com wrote:
Hey all,
I've set my initial_token in Cassandra 2.0.7 using a Python script I found at the DataStax wiki. I've set the value like this:

initial_token: 85070591730234615865843651857942052864

And Cassandra crashes when I try to start it:

[root@beta:/etc/alternatives/cassandrahome] #./bin/cassandra -f
INFO 18:14:38,511 Logging initialized
INFO 18:14:38,560 Loading settings from file:/usr/local/apache-cassandra-2.0.7/conf/cassandra.yaml
INFO 18:14:39,151 Data files directories: [/var/lib/cassandra/data]
INFO 18:14:39,152 Commit log directory: /var/lib/cassandra/commitlog
INFO 18:14:39,153 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 18:14:39,153 disk_failure_policy is stop
INFO 18:14:39,153 commit_failure_policy is stop
INFO 18:14:39,161 Global memtable threshold is enabled at 251MB
INFO 18:14:39,362 Not using multi-threaded compaction
ERROR 18:14:39,365 Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: For input string: 85070591730234615865843651857942052864
at org.apache.cassandra.dht.Murmur3Partitioner$1.validate(Murmur3Partitioner.java:178)
at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:440)
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:111)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:153)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:471)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:560)
For input string: 85070591730234615865843651857942052864
Fatal configuration error; unable to start. See log for stacktrace.

I really need to get replication going between 2 nodes. Can someone clue me in to why this may be crashing?
Thanks!
Tim
--
GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
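For reference: Murmur3Partitioner tokens are signed 64-bit longs in the range [-2^63, 2^63 - 1], while RandomPartitioner tokens run from 0 to 2^127 - 1, which is why a value like 85070591730234615865843651857942052864 cannot be parsed as a Murmur3 token. A minimal sketch of a generator for evenly spaced Murmur3 tokens follows (the two-node count is an assumption matching this thread):

import java.math.BigInteger;

public class Murmur3InitialTokens {
    public static void main(String[] args) {
        int nodes = 2;
        BigInteger min = BigInteger.valueOf(2).pow(63).negate(); // -2^63
        BigInteger range = BigInteger.valueOf(2).pow(64);        // 2^64 tokens total
        for (int i = 0; i < nodes; i++) {
            // token_i = -2^63 + i * (2^64 / nodes)
            BigInteger token = min.add(range.multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(nodes)));
            System.out.println("node " + i + " initial_token: " + token);
        }
    }
}

For two nodes this prints -9223372036854775808 and 0.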
Re: Failed to mkdirs $HOME/.cassandra
For now, you can edit the nodetool script itself by adding -Duser.home=/tmp, as in:

$JAVA $JAVA_AGENT -cp $CLASSPATH -Xmx32m -Duser.home=/tmp -Dlogback.configurationFile=logback-tools.xml -Dstorage-config=$CASSANDRA_CONF org.apache.cassandra.tools.NodeTool -p $JMX_PORT $ARGS

If you like, you can file an issue in JIRA.

On 2014-05-09 18:42, Bryan Talbot wrote:
How should a nodetool command be run as the user "nobody"? The nodetool command fails with an exception if it cannot create a .cassandra directory in the current user's home directory. I'd like to schedule some nodetool commands to run with least privilege as cron jobs. I'd like to run them as the "nobody" user -- which typically has / as the home directory -- since that's what the user is typically used for (minimum privileges). None of the methods described in this JIRA actually seem to work (with 2.0.7 anyway): https://issues.apache.org/jira/browse/CASSANDRA-6475 [1]

Testing as a normal user with no write permissions to the home directory (to simulate the "nobody" user):

[vagrant@local-dev ~]$ nodetool version
ReleaseVersion: 2.0.7
[vagrant@local-dev ~]$ rm -rf .cassandra/
[vagrant@local-dev ~]$ chmod a-w .
[vagrant@local-dev ~]$ nodetool flush my_ks my_cf
Exception in thread "main" FSWriteError in /home/vagrant/.cassandra
at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:305)
at org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:690)
at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1504)
at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1204)
Caused by: java.io.IOException: Failed to mkdirs /home/vagrant/.cassandra
... 4 more
[vagrant@local-dev ~]$ HOME=/tmp nodetool flush my_ks my_cf
Exception in thread "main" FSWriteError in /home/vagrant/.cassandra
[snip -- same stack trace]
[vagrant@local-dev ~]$ env HOME=/tmp nodetool flush my_ks my_cf
Exception in thread "main" FSWriteError in /home/vagrant/.cassandra
[snip -- same stack trace]
[vagrant@local-dev ~]$ env user.home=/tmp nodetool flush my_ks my_cf
Exception in thread "main" FSWriteError in /home/vagrant/.cassandra
[snip -- same stack trace]
[vagrant@local-dev ~]$ nodetool -Duser.home=/tmp flush my_ks my_cf
Unrecognized option: -Duser.home=/tmp
usage: java org.apache.cassandra.tools.NodeCmd --host <arg> <command> ...

Links:
[1] https://issues.apache.org/jira/browse/CASSANDRA-6475
Re: java.lang.StackOverflowError with big IN list
In the meantime, you can try upping the value of your -Xss setting in cassandra-env.sh to see if just a little push will make the problem go away.

On 01/10/2014 10:18 AM, Дмитрий Шохов wrote:
https://issues.apache.org/jira/browse/CASSANDRA-6567
Thank you!

2014/1/10 Benedict Elliott Smith belliottsm...@datastax.com:
It must be a very large IN clause, which is probably not advisable. But it shouldn't cause this error, and since it's an easy fix to prevent it, if you file a JIRA I'll post a patch.

On 10 January 2014 13:08, Дмитрий Шохов sho...@gmail.com wrote:
Hello, I'm getting a stack overflow when running prepared queries with an IN parameter and binding a big list in it. Is this a known limitation, so I must implement manual paging or change the logic to get around it, or is it maybe a bug?

java.lang.StackOverflowError
at org.apache.cassandra.utils.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo(FastByteComparisons.java:110)
at org.apache.cassandra.utils.FastByteComparisons.compareTo(FastByteComparisons.java:41)
at org.apache.cassandra.utils.FBUtilities.compareUnsigned(FBUtilities.java:216)
at org.apache.cassandra.utils.ByteBufferUtil.compareUnsigned(ByteBufferUtil.java:89)
at org.apache.cassandra.db.marshal.LongType.compareLongs(LongType.java:54)
at org.apache.cassandra.db.marshal.LongType.compare(LongType.java:36)
at org.apache.cassandra.db.marshal.LongType.compare(LongType.java:28)
at org.apache.cassandra.db.ArrayBackedSortedColumns.binarySearch(ArrayBackedSortedColumns.java:170)
at org.apache.cassandra.db.ArrayBackedSortedColumns.binarySearch(ArrayBackedSortedColumns.java:152)
at org.apache.cassandra.db.ArrayBackedSortedColumns.getColumn(ArrayBackedSortedColumns.java:89)
at org.apache.cassandra.cql3.statements.SelectStatement$1$1.computeNext(SelectStatement.java:825)
at org.apache.cassandra.cql3.statements.SelectStatement$1$1.computeNext(SelectStatement.java:826)
at org.apache.cassandra.cql3.statements.SelectStatement$1$1.computeNext(SelectStatement.java:826)
at org.apache.cassandra.cql3.statements.SelectStatement$1$1.computeNext(SelectStatement.java:826)
at (many more of the same stack element)

Cassandra 2.0.4, Java driver 2.0 rc2
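A sketch of the manual-paging workaround mentioned in the question: bind the IN list in modest chunks instead of all at once. The keyspace, table name, key type, and chunk size here are hypothetical.

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.util.ArrayList;
import java.util.List;

public class ChunkedInQuery {
    private static final int CHUNK = 500; // keep each IN list small

    public static List<Row> fetchAll(Session session, List<Long> keys) {
        PreparedStatement ps =
                session.prepare("SELECT * FROM my_ks.my_table WHERE id IN ?");
        List<Row> rows = new ArrayList<Row>();
        for (int i = 0; i < keys.size(); i += CHUNK) {
            List<Long> part = keys.subList(i, Math.min(i + CHUNK, keys.size()));
            ResultSet rs = session.execute(ps.bind(part)); // one small query per chunk
            rows.addAll(rs.all());
        }
        return rows;
    }
}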
Re: unsubscribe
Just send that email to user-unsubscribe@cassandra.apache.org. If still confused, check here: http://hadonejob.com/img/full/12598654.jpg

----- Original Message -----
From: "Earl Ruby" <er...@webcdr.com>
Re: Why was Thrift defined obsolete?
Realize that there will be more and more new features that come along as Cassandra matures. It is an overwhelming certainty that these features will be available through the new native interface & CQL. The same level of certainty can't be given to Thrift. Certainly, if you have existing applications running against Thrift, then there is no need to worry that Thrift will break or not perform optimally in the future. But going forward, there will be things that you won't be able to use through Thrift that may solve problems for you. If you are starting now, the recommendation is to use the new native interface and CQL. Just saying...

----- Original Message -----
From: "Peter Lin" <wool...@gmail.com>
Re: Parse xml and store data in Map using xom parser
Not really a Cassandra question, but it would seem your XML file isn't particularly well designed. It would seem you need to qualify your test entries with indices when putting them in the map, such as:

put("test.1.C", "0");
put("test.2.C", "50");

Before figuring out the Cassandra angle, I'd rethink how that XML is designed, if that's within your control.

On 12/08/2013 10:10 AM, Santosh Shet wrote:
Hi,
I am trying to parse the XML file shown below using the xom parser in Java and put each key/value pair into a Map. Later I am trying to insert this Map into Cassandra using a Mutator object. My XML file looks like this:

<sample>
  <Max>0.25000</Max>
  <test>
    <A>Percentage</A>
    <B>1</B>
    <C>0</C>
    <D>20</D>
    <E>0.25</E>
  </test>
  <test>
    <A>Percentage</A>
    <B>1</B>
    <C>50</C>
    <D>75</D>
    <E>0.15</E>
  </test>
</sample>

Currently I have a HashMap<String, String> xmlData to hold the elements of the XML. I am traversing the child elements using getChildElements() and then retrieving each element name and element value and storing them in the HashMap. But I am facing a problem with the second child element (the second <test>, marked in green), because it overwrites the values of the child elements that I traversed in my last iteration (the first <test>, marked in blue). Could somebody provide thoughts on how to store data in Cassandra in the above situation? Is there a better way to do it, or do I need to append counter+xpath; for example, if there are 2 child elements, append test+1+A for element A inside the first test and test+2+A for the child of the other test element?
Thanks in advance.
Best,
Santosh Shet
Software Engineer | VistaOne Solutions
Direct India: +91 80 30273829 | Mobile India: +91 8105720582
Skype: santushet
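A minimal sketch of the index-qualified keys suggested in the reply above, using xom; the file name is hypothetical, and the layout follows the XML from the question:

import java.util.HashMap;
import java.util.Map;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Elements;

public class XmlToMap {
    public static void main(String[] args) throws Exception {
        Document doc = new Builder().build("sample.xml");
        Element root = doc.getRootElement();
        Map<String, String> xmlData = new HashMap<String, String>();
        Elements tests = root.getChildElements("test");
        for (int i = 0; i < tests.size(); i++) {
            Elements children = tests.get(i).getChildElements();
            for (int j = 0; j < children.size(); j++) {
                Element child = children.get(j);
                // qualify the key with the <test> index so the second <test>
                // no longer overwrites the first
                xmlData.put("test." + (i + 1) + "." + child.getLocalName(),
                        child.getValue());
            }
        }
        System.out.println(xmlData); // e.g. {test.1.C=0, test.2.C=50, ...}
    }
}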
Re: com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 1 of CQL type text, expecting class java.lang.String but class [Ljava.lang.Object; provided
BoundStatement query = prBatchInsert.bind(userId, attributes.values().toArray(new String[attributes.size()]));

On 12/07/2013 03:59 PM, Techy Teck wrote:
I am trying to insert into a Cassandra database using the DataStax Java driver, but every time I get the exception below at the `prBatchInsert.bind` line:

com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 1 of CQL type text, expecting class java.lang.String but class [Ljava.lang.Object; provided

Below is my method, which accepts `userId` as the input and `attributes` as the Map whose keys are my column names and whose values are the actual values of those columns:

public void upsertAttributes(final String userId, final Map<String, String> attributes, final String columnFamily) {
    try {
        Set<String> keys = attributes.keySet();
        StringBuilder sqlPart1 = new StringBuilder(); // StringBuilder.append() is faster than concatenating Strings in a loop
        StringBuilder sqlPart2 = new StringBuilder();

        sqlPart1.append("INSERT INTO " + columnFamily + " (USER_ID");
        sqlPart2.append(") VALUES ( ?");

        for (String k : keys) {
            sqlPart1.append(", " + k); // append each key
            sqlPart2.append(", ?");    // append an unknown value for each key
        }
        sqlPart2.append(") ");         // last parenthesis (and space?)
        String sql = sqlPart1.toString() + sqlPart2.toString();

        CassandraDatastaxConnection.getInstance();
        PreparedStatement prBatchInsert = CassandraDatastaxConnection.getSession().prepare(sql);
        prBatchInsert.setConsistencyLevel(ConsistencyLevel.ONE);

        // this line is giving me an exception
        BoundStatement query = prBatchInsert.bind(userId, attributes.values().toArray(new Object[attributes.size()])); // vararg methods can take an array (might need to cast it to String[]?)

        CassandraDatastaxConnection.getSession().executeAsync(query);
    } catch (InvalidQueryException e) {
        LOG.error("Invalid Query Exception in CassandraDatastaxClient::upsertAttributes " + e);
    } catch (Exception e) {
        LOG.error("Exception in CassandraDatastaxClient::upsertAttributes " + e);
    }
}

What am I doing wrong here? Any thoughts?
Re: unsubscribe
Please send that same riveting text to user-unsubscr...@cassandra.apache.org

http://tinyurl.com/kdrwyrc

On 10/30/2013 02:49 PM, Leonid Ilyevsky wrote:
Unsubscribe
Re: Writing same key on two nodes using ONE consistency
Each node would forward the write request to the node responsible for holding that key (determined by the hash function).

On 10/26/2013 09:25 PM, Mohammad Hajjat wrote:
Hi,
Quick question about Cassandra. If I write the same key (with two different values) to two different nodes with consistency ONE, assuming 'SimpleStrategy' and no replication: would each node receiving the request write that key in its local storage and return success (so we end up with the same key having two different values on the two nodes)? Or would each node forward the write request to the node responsible for holding that key (determined by the hash function)?
Thanks!
--
Mohammad Hajjat
Ph.D. Student
Electrical and Computer Engineering
Purdue University
Re: Cassandra book/tuturial
Unfortunately, as tech books tend to be, it's quite a bit out of date at this point.

On 10/27/2013 09:54 PM, Mohan L wrote:
On Sun, Oct 27, 2013 at 9:57 PM, Erwin Karbasi er...@optinity.com wrote:
Hey guys,
What is the best book to learn Cassandra from scratch?
Thanks in advance,
Erwin

Hi,
Buy "Cassandra: The Definitive Guide" by Eben Hewitt: http://shop.oreilly.com/product/0636920010852.do
Thanks
Mohan L
Re: Composite keys and composite columns
The explanation of composite columns is muddied by verbiage, depending on whether you are talking about the Thrift interface, which tends to talk about things in low-level terms, or CQL, which tends to talk about things in higher-level terms.

At the Thrift/low level, a composite column (really now called a composite cell) is just a cell that has a name containing multiple parts packed into a ByteBuffer. These multiple parts are understood by Cassandra for validation, sorting, and slicing purposes.

At the CQL level, there are really just compound keys. The first part of a compound key is the partition key; it alone decides where the data lives (what node). The rest of the keys are clustering keys, and they affect CQL row sorting. In CQL, then, columns that come after the clustering-key columns are grouped by those clustering keys. Under the covers, these extra columns have names that are prefixed by the multipart clustering name.

As for using column names as data, again it depends on the interface (Thrift/CQL) as to how to look at it. For instance, with Thrift you can slice columns that start from some value and end with some value, and find the column names in between; what shows up as column names probably means something to you.

HTH,
dave

On 10/17/2013 07:51 PM, Hartzman, Leslie wrote:
Hi,
I'm looking for clarification on composite keys and composite columns. From what I've read with regard to composite keys, you have a collection of columns where, of 'n' columns, the first n-1 form the composite primary key and the last column is the data for that composite key. Do I have this right?

What I've just read about composite columns is that there are static and dynamic composite column names, but dynamic should be avoided. If the column names can be created programmatically, what does the schema definition look like for this, or is it omitted since they're programmatically created? I'm assuming that these are the dynamic composite columns. So how are the static composite columns defined in the schema?

Also, if a column name is used as the value as well (composite or non-composite columns), how do you query that? If the value is empty and the column name IS the value, is the knowledge of what you're querying in the business logic, due to the construct of that particular column family?

Thanks.
Les
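As a minimal illustration of the CQL-level view described above, shown with the DataStax java-driver (the contact point, keyspace, and all table and column names are hypothetical):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CompoundKeyExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo");
        // sensor_id is the partition key: it alone decides which node owns the data.
        // reading_time is a clustering key: it drives sort order within the partition,
        // and under the covers each 'value' cell name is prefixed by it.
        session.execute("CREATE TABLE readings (" +
                "sensor_id text, reading_time timestamp, value double, " +
                "PRIMARY KEY (sensor_id, reading_time))");
        cluster.shutdown();
    }
}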
Re: Unsupported major.minor version 51.0
Cassandra 2.0 needs to run on JDK 7. (Class file major version 51.0 is Java 7; your java -version output below shows 1.6.0_24.)

On 09/17/2013 11:21 PM, Gary Zhao wrote:
Hello,
I just saw this error. Anyone know how to fix it?

[root@gary-vm1 apache-cassandra-2.0.0]# bin/cassandra -f
xss = -ea -javaagent:bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4014M -Xmx4014M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/cassandra/service/CassandraDaemon : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
Could not find the main class: org.apache.cassandra.service.CassandraDaemon. Program will exit.
[root@gary-vm1 apache-cassandra-2.0.0]# java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

Thanks
Gary
Re: Custom data type is not work at C* 2.0
I think your class is missing the required

public TypeSerializer<Void> getSerializer() { ... }

method. This is what you need to derive from:
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob;f=src/java/org/apache/cassandra/db/marshal/AbstractType.java;h=74fe446319c199433b47d3ae60fc4d644e86b653;hb=03045ca22b11b0e5fc85c4fabd83ce6121b5709b

On 09/04/2013 09:14 AM, Katsutoshi wrote:

package my.marshal;

import java.nio.ByteBuffer;
import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.db.marshal.MarshalException;
import org.apache.cassandra.utils.ByteBufferUtil;

public class DummyType extends AbstractType<Void> {

    public static final DummyType instance = new DummyType();

    private DummyType() {
    }

    public Void compose(ByteBuffer bytes) {
        return null;
    }

    public ByteBuffer decompose(Void value) {
        return ByteBufferUtil.EMPTY_BYTE_BUFFER;
    }

    public int compare(ByteBuffer o1, ByteBuffer o2) {
        return 0;
    }

    public String getString(ByteBuffer bytes) {
        return "";
    }

    public ByteBuffer fromString(String source) throws MarshalException {
        if (!source.isEmpty())
            throw new MarshalException(String.format("'%s' is not empty", source));
        return ByteBufferUtil.EMPTY_BYTE_BUFFER;
    }

    public void validate(ByteBuffer bytes) throws MarshalException {
    }
}
Re: AbstractCassandraDaemon.java (line 134) Exception in thread
What is your -Xss set to? If it's below 256k, set it to that, and see if you still have the issues.

----- Original Message -----
From: "Julio Quierati" <julio.quier...@gmail.com>
Re: Custom 1.2 Authentication plugin will not work unless user is in system_auth.users column family
It seems to me that isExistingUser should be pushed down to the IAuthenticator implementation. Perhaps you should add a ticket at https://issues.apache.org/jira/browse/CASSANDRA

On 06/17/2013 05:12 PM, Bao Le wrote:
Hi,
We have a custom authenticator that works well with Cassandra 1.1.5. When upgrading to C* 1.2.5, authentication failed. It turns out that in ClientState.login, we make a call to Auth.isExistingUser(user.getName()) if the AuthenticatedUser is not the anonymous user. This isExistingUser method does a query on system_auth.users and, if it cannot find the name there, throws an exception.

If our authentication model involves exchanging data on the fly and not relying on pre-created users, how do we bypass this check? Should we add a method on IAuthenticator to specify whether user lookup is needed or not?

Bao
Re: Unsubscribe?
You sent an email to user-unsubscr...@cassandra.apache.org from the email address you used to subscribe, and it didn't unsubscribe you? Did you get the 'are you sure' email? Did you check your spam folder? See http://cassandra.apache.org/

http://hadonejob.com/img/70907344.jpg

On 06/10/2013 10:46 AM, Fatih P. wrote:
I tried the same and am still receiving mails.

On Mon, Jun 10, 2013 at 5:34 PM, Luke Hospadaruk luke.hospada...@ithaka.org wrote:
Hi,
I hate to be a clod, but I'd really like to unsubscribe from this list. I've tried every permutation I can think of to do it the right way, and all of the styles in the help message. If there's a moderator reading this, could you please take me off the list?
Thanks,
Luke
Re:
What version of Netty is on your classpath?

On 05/16/2013 07:33 PM, aaron morton wrote:
Try the IRC room for the Java driver, or submit a ticket on the JIRA system; see the links here: https://github.com/datastax/java-driver

Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 15/05/2013, at 5:50 PM, bjbylh bjb...@me.com wrote:
Hello all:
I use the DataStax java-driver to connect to C*. When the program calls cluster.shutdown(), it prints out: java.lang.NoSuchMethodError: org.jboss.netty.channel.ChannelFactory.shutdown()V, but I do not know why...
C* is 1.2.4, java-driver is 1.0.0.
Thank you.
Sent from Samsung Mobile
Re: Lookup table structuring advice
If you want to store all the roles in one row, you can do:

create table roles (synthetic_key int, name text, primary key (synthetic_key, name)) with compact storage;

When inserting roles, just use the same key:

insert into roles (synthetic_key, name) values (0, 'Programmer');
insert into roles (synthetic_key, name) values (0, 'Tester');

and use:

select * from roles where synthetic_key = 0;

(or some arbitrary key value you decide to use). That data is then stored on one node (and its replicas). Of course, if the number of roles grows to be large, you lose most of the value in having a cluster.

On 05/04/2013 12:09 PM, Jabbar Azam wrote:
Hello,
I want to create a simple table holding user roles, e.g.:

create table roles (
  name text,
  primary key (name)
);

If I want to get a list of roles for some admin tool, I can use the following CQL3:

select * from roles;

When a new name is added it will be stored on a different host, and doing a select * is going to be inefficient because the table will be stored across the cluster and each node will respond. The number of roles may be less than or just greater than a dozen. I'm not sure if I'm storing the roles correctly. The other thing I'm thinking is that once I've read the roles, I can cache them.

Thanks
Jabbar Azam
Re: Lookup table structuring advice
I just used 'synthetic key' as it's a term used with standard RDBMSs to mean a key that means nothing in the model and is often a sequence or such. There's nothing specific to Cassandra in that term; I just thought it would be familiar to someone who understands RDBMSs.

On 05/04/2013 02:44 PM, Jabbar Azam wrote:
I never thought about using a synthetic key, but in this instance, with about a dozen rows, it's probably OK. Thanks for your great idea. Where did you read about the synthetic key idea? I've not come across it before.

Thanks
Jabbar Azam

On 4 May 2013 19:30, Dave Brosius dbros...@mebigfatguy.com wrote:
[snip -- the single-row roles table suggestion quoted in full in the previous message]
Re: Retrieve data from Cassandra database using Datastax java driver
getColumnDefinitions only returns metadata. To get the data, use the iterator to navigate the rows:

Iterator<Row> it = result.iterator();
while (it.hasNext()) {
    Row r = it.next();
    // do stuff with row
}

On 04/21/2013 12:02 AM, Techy Teck wrote:
I am working with the DataStax java-driver, and I am trying to retrieve a few columns from the database based on the input that is passed to the method below:

public Map<String, String> getAttributes(final String userId, final Collection<String> attributeNames) {
    String query = "SELECT " + attributeNames.toString().substring(1, attributeNames.toString().length() - 1)
            + " from profile where id = '" + userId + "';";
    CassandraDatastaxConnection.getInstance();
    ResultSet result = CassandraDatastaxConnection.getSession().execute(query);
    Map<String, String> attributes = new ConcurrentHashMap<String, String>();
    for (Definition def : result.getColumnDefinitions()) {
        // not sure how to put the column name and column value that came back from the database
        attributes.put("column name", "column value");
    }
    return attributes;
}

Now I got the result back from the database in result. How do I put the column name and column value that came back from the database into a map? I am not able to understand how to retrieve the column value for a particular column with the DataStax Java driver. Any thoughts will be of great help.
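A sketch completing the loop above: pull each column's value out of every row. The use of getString is an assumption based on the question fetching text columns from the profile table.

import com.datastax.driver.core.ColumnDefinitions;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import java.util.HashMap;
import java.util.Map;

public class RowsToMap {
    public static Map<String, String> toMap(ResultSet result) {
        Map<String, String> attributes = new HashMap<String, String>();
        for (Row row : result) {                      // ResultSet is Iterable<Row>
            for (ColumnDefinitions.Definition def : row.getColumnDefinitions()) {
                String name = def.getName();
                if (!row.isNull(name)) {              // skip null cells
                    attributes.put(name, row.getString(name));
                }
            }
        }
        return attributes;
    }
}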
Re: Quorum read after quorum write guarantee
Is the read and the write happening on the same thread?

On 03/10/2013 12:00 PM, André Cruz wrote:
Hello. In my application it sometimes happens that I execute a multiget (I use pycassa) to fetch data that I have just inserted. I use quorum writes and reads, and my RF is 3. I've noticed that sometimes (1 in 1000, perhaps) an insert followed (300 ms after) by a multiget will not find the just-inserted data. Is this normal, or is something wrong? Can there be some delay in reading the inserted data, even with quorum?

Best regards,
André
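For reference, the overlap arithmetic behind the question: with RF = 3, QUORUM is floor(3/2) + 1 = 2, so a QUORUM write commits to at least 2 replicas and a QUORUM read consults at least 2, and since 2 + 2 > 3 the two sets must share at least one replica. Read-your-writes should therefore hold once the write has returned success, which is why it matters whether the read is issued on the same thread after the write completes, or concurrently from another one.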
Re: unsubscribe
On 02/17/2013 01:26 PM, puneet loya wrote:
unsubscribe me please. Thank you.

If only directions were followed: http://hadonejob.com/images/full/102.jpg

Send to user-unsubscr...@cassandra.apache.org
Re: Cassandra 1.20 with Cloudera Hadoop (CDH4) Compatibility Issue
See https://issues.apache.org/jira/browse/CASSANDRA-5201

On 02/15/2013 10:05 PM, Yang Song wrote:
Hi,
Does anyone use CDH4's Hadoop with Cassandra? The goal is simply to read/write to Cassandra from Hadoop directly using ColumnFamilyInput(Output)Format, but there seems to be a compatibility issue. There are two Java exceptions:

1. java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

This shows when I run a Hadoop jar file to read directly from Cassandra. It seems there was a change in Hadoop where JobContext went from a class to an interface. Has anyone had a similar issue? Does it mean the Hadoop version in CDH4 is old?

2. The other error is java.lang.NoSuchMethodError: org.apache.cassandra.hadoop.ConfigHelper.setRpcPort(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;)V

This shows when the jar file contains an RPC port for a remote Cassandra cluster. Does anyone have similar experience? Any comments are welcome. Thanks!
Re: Cassandra/cqlsh Error: TSocket read 0 bytes
An exception occurred on the server. Check the logs for the details of what happened, and post back here.

On 02/07/2013 11:04 PM, Adam Venturella wrote:
Has anyone encountered this before? What did I most likely break, or how do I fix it?
RE: cassandra cqlsh error
xss = -ea -javaagent:./../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1005M -Xmx1005M -Xmn200M -XX:+HeapDumpOnOutOfMemoryError -Xss180k

That is not an error; that is just 'debugging' information output to the command line.

----- Original Message -----
From: "Kumar, Anjani" <anjani.ku...@infogroup.com>
RE: cassandra cqlsh error
This part:

ERROR 13:39:24,456 Cannot open /var/lib/cassandra/data/system/Schema/system-Schema-hd-5; partitioner org.apache.cassandra.dht.RandomPartitioner does not match system partitioner org.apache.cassandra.dht.Murmur3Partitioner. Note that the default partitioner starting with Cassandra 1.2 is Murmur3Partitioner, so you will need to edit that to match your old partitioner if upgrading.

is a problem. In 1.2 the default partitioner was changed, so if you are using 1.2 against old files, you will need to edit cassandra.yaml to specify org.apache.cassandra.dht.RandomPartitioner as the partitioner.

----- Original Message -----
From: "Kumar, Anjani" <anjani.ku...@infogroup.com>
Re: CQL : Request did not complete within rpc_timeout
If querying by a date inequality is an important access pattern, you probably want a column that represents some time bucket (a month?) and have that column be part of the CQL primary key. When a query comes in, you can then make C* happy by specifying a date bucket to pick the C* row and the date inequality to slice the CQL rows/columns. Of course this adds work for the client when dates span multiple buckets, but an open-ended date inequality is probably troublesome for massive datasets anyway. (A sketch of this layout follows at the end of this thread.)

On 02/03/2013 03:42 PM, Paul van Hoven wrote:
Thanks for the answer. Can anybody else answer my other two questions, because my problem is not solved yet?

2013/2/3 Edward Capriolo edlinuxg...@gmail.com:
This was the issue that prompted ALLOW FILTERING: https://issues.apache.org/jira/browse/CASSANDRA-4915
Cassandra's storage system can only optimize certain queries.

On Sun, Feb 3, 2013 at 2:07 PM, Paul van Hoven paul.van.ho...@googlemail.com wrote:
I'm not sure I understood your answer.

"When you have GB or TB of data, any query that adds ALLOW FILTERING will not work at scale."
1. You mean any query that requires filtering is slow?

"Secondary indexes need at least one equality. If you want to do this at scale you might need a different design."
2. And what design would be recommendable then?
3. How should the query look such that it would scale?

2013/2/3 Edward Capriolo edlinuxg...@gmail.com:
Secondary indexes need at least one equality. If you want to do this at scale you might need a different design. Using ALLOW FILTERING and LIMIT 10 simply grabs the first few random rows that match your criteria. When you have GB or TB of data, any query that adds ALLOW FILTERING will not work at scale. This is why it was added to the language: CQL lets you do some queries that seem fast when you're developing with 10 rows; without this clause you would not know if a query is fast because it hits a Cassandra index, or just fast because the results were found in the first 10 rows.
Edward

On Sun, Feb 3, 2013 at 10:56 AM, Paul van Hoven paul.van.ho...@googlemail.com wrote:
Okay, here is the schema (it is actually in German, but I translated the column names so that it is easier to read for an international audience):

cqlsh:demodb> describe table offerten_log_archiv;

CREATE TABLE offerten_log_archiv (
  offerte_id int PRIMARY KEY,
  aktionen int,
  angezeigt bigint,
  datum timestamp,
  gutschrift bigint,
  kampagne_id int,
  klicks int,
  klicks_ungueltig int,
  kosten bigint,
  statistik_id bigint,
  stunden int,
  werbeflaeche_id int,
  werbemittel_id int
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  compaction={'class': 'SizeTieredCompactionStrategy'};

CREATE INDEX datum_key ON offerten_log_archiv (datum);
CREATE INDEX stunden_key ON offerten_log_archiv (stunden);

This is the query I'm trying to perform:

cqlsh:demodb> select * from ola where date > '2013-01-01' and hour = 0 limit 10 allow filtering;
Request did not complete within rpc_timeout.

ola = offerten_log_archiv (table name)
hour = stunden (column name)
date = datum (column name)

I hope this information makes my problem clearer.

2013/2/3 Edward Capriolo edlinuxg...@gmail.com:
Without seeing your schema it is hard to say, but in some cases ALLOW FILTERING might be considered "EXPECT THIS COULD BE SLOW."
It could mean the query is not hitting an index and is going to page through large amounts of data. On Sun, Feb 3, 2013 at 9:42 AM, Paul van Hoven <paul.van.ho...@googlemail.com> wrote: After figuring out how to use the > operator on a secondary index, I noticed that in a column family of about 5.5 million datasets I get an rpc_timeout when trying to read data from this table. In the concrete situation I want to request data newer than January 1, 2013. The number of rows that should be affected is about 1 million. When doing the request I get a timeout error: cqlsh:demodb select * from ola where date > '2013-01-01' and hour = 0 limit 10 allow filtering; Request did not complete within rpc_timeout. Actually I find this very confusing, since I would expect an exceptional performance gain in comparison to a similar SQL query. Therefore, I think the query I'm performing is not appropriate for Cassandra, although I would do a query like this on a SQL database. So my question now is: how should I perform this query on Cassandra?
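To make the bucketing advice above concrete, here is a minimal Java sketch (the class and method names are hypothetical, not from the thread) that derives month buckets from dates; the bucket becomes part of the primary key, and a date-inequality query is issued once per bucket the range spans.

    import java.time.LocalDate;
    import java.time.YearMonth;
    import java.util.ArrayList;
    import java.util.List;

    public class MonthBuckets {
        // The bucket a date falls into, e.g. "2013-01"; stored as part of the key.
        static String bucketFor(LocalDate date) {
            return YearMonth.from(date).toString();
        }

        // All month buckets a date range touches; the client issues one
        // key-restricted query per bucket and merges the results.
        static List<String> bucketsBetween(LocalDate start, LocalDate end) {
            List<String> buckets = new ArrayList<>();
            YearMonth cursor = YearMonth.from(start);
            YearMonth last = YearMonth.from(end);
            while (!cursor.isAfter(last)) {
                buckets.add(cursor.toString());
                cursor = cursor.plusMonths(1);
            }
            return buckets;
        }

        public static void main(String[] args) {
            // A query for dates after 2013-01-01 up to mid-April spans these buckets:
            System.out.println(bucketsBetween(LocalDate.of(2013, 1, 1),
                                              LocalDate.of(2013, 4, 15)));
            // [2013-01, 2013-02, 2013-03, 2013-04]
        }
    }

The bucket granularity is a trade-off: finer buckets mean narrower rows but more per-query fan-out when a range spans many buckets.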
Re: error when creating column family using cql3 and persisting data using thrift
The statements used to create and populate the data might be mildly useful for those trying to help. ----- Original Message ----- From: "Kuldeep Mishra" <kuld.cs.mis...@gmail.com>
Re: Create Keyspace failing in 1.2rc2 with syntax error?
The format has changed; check the help in cqlsh: CREATE KEYSPACE Test WITH replication = {'class':'SimpleStrategy', 'replication_factor':1}; On 12/29/2012 04:27 PM, Adam Venturella wrote: When I create a keyspace with a SimpleStrategy as outlined here: https://cassandra.apache.org/doc/cql3/CQL.html#createKeyspaceStmt CREATE KEYSPACE Test WITH strategy_class = SimpleStrategy AND strategy_options:replication_factor = 1; I receive the following error: Bad Request: line 3:20 mismatched input ':' expecting '=' I'm running the following cqlsh: Connected to Test Cluster at localhost:9160. [cqlsh 2.3.0 | Cassandra 1.2.0~rc2 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Re: Using Cassandra BulkOuputFormat With newer versions of Hadoop (.23+)
I swapped in hadoop-core-1.0.3.jar and rebuilt cassandra without issues. What problems were you having? On 09/21/2012 07:40 PM, Juan Valencia wrote: I can't seem to get bulk loading to work in newer versions of Hadoop. Since they switched JobContext from a class to an interface, you lose binary backward compatibility: Exception in thread main java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected at org.apache.cassandra.hadoop.BulkOutputFormat.checkOutputSpecs(BulkOutputFormat.java:42) I tried recompiling against the newer Hadoop, but things got messy fast. Has anyone done this?
Re: anyone know how to lookup non-contiguous columns BUT for prefixes?
You'd need to make n queries, or do a superset query from min;-
Re: Why Cassandra secondary indexes are so slow on just 350k rows?
If I understand you correctly, you are only ever querying for the rows where is_exported = false, and turning them into trues. What this means is that eventually you will have 1 row in the secondary index table with 350K columns that you will never look at. It seems to me that perhaps you should just hold your own manual index CF that points to non-exported rows, and just delete those columns when they are exported. On 08/28/2012 05:23 PM, Edward Kibardin wrote: I have a column family with a secondary index. The secondary index is basically a binary field, but I'm using a string for it. The field is called *is_exported* and can be *'true'* or *'false'*. After a request, all loaded rows are updated with *is_exported = 'false'*. I'm polling this column family every ten minutes and exporting new rows as they appear. But here is the problem: I'm seeing that the time for this query grows pretty linearly with the amount of data in the column family, and currently it takes *from 12 to 20 seconds (!!!) to find 5000 rows*. From my understanding, an indexed request should not depend on the number of rows in the CF but on the number of rows per index value (cardinality), as it's just another hidden CF like: true : rowKey1 rowKey2 rowKey3 ... false: rowKey1 rowKey2 rowKey3 ... I'm using Pycassa to query the data; here is the code I'm using: column_family = pycassa.ColumnFamily(cassandra_pool, column_family_name, read_consistency_level=2) is_exported_expr = create_index_expression('is_exported', 'false') clause = create_index_clause([is_exported_expr], count = 5000) column_family.get_indexed_slices(clause) Am I doing something wrong? I expect this operation to work MUCH faster. Any ideas or suggestions? Some config info: - Cassandra 1.1.0 - RandomPartitioner - I have 2 nodes and replication_factor = 2 (each server has a full data copy) - Using AWS EC2, large instances - Software raid0 on ephemeral drives Thanks in advance!
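A minimal Java sketch of that manual index idea (the ColumnClient interface is a hypothetical stand-in for whatever client library is in use, such as Pycassa or Hector; this shows the pattern, not a drop-in implementation): one index row lists the keys of rows not yet exported, and each column is deleted once its row is exported, so the index never accumulates hundreds of thousands of dead columns.

    import java.util.List;

    public class ManualExportIndex {
        // Hypothetical stand-in for a real Cassandra client.
        interface ColumnClient {
            void insert(String cf, String rowKey, String colName, String colValue);
            void remove(String cf, String rowKey, String colName);
            List<String> columnNames(String cf, String rowKey, int limit);
        }

        static final String INDEX_CF = "export_index";
        static final String PENDING_ROW = "pending";

        // On write: record the new row's key in the pending-index row.
        static void markPending(ColumnClient client, String dataRowKey) {
            client.insert(INDEX_CF, PENDING_ROW, dataRowKey, "");
        }

        // On export: read a batch of pending keys, export them,
        // then delete their index columns so the index row stays small.
        static void exportBatch(ColumnClient client, int batchSize) {
            for (String key : client.columnNames(INDEX_CF, PENDING_ROW, batchSize)) {
                // ... export the data row identified by key ...
                client.remove(INDEX_CF, PENDING_ROW, key);
            }
        }
    }

Unlike the built-in index, this row only ever holds the not-yet-exported keys, so its size tracks the backlog rather than the whole dataset.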
Re: Why so slow?
Are you using multiple client threads? You might want to try the stress tool in the distribution. On 08/19/2012 02:09 PM, Peter Morris wrote: Hi all I have a Windows 7 machine (64 bit) with DataStax community server installed. Running a benchmark app on the server gives me 7000 inserts per second. Running the same app on a networked client gives me only 5 inserts per second. The two computers are connected directly via a crossover cable, and the network properties tell me that it is a 1Gbps connection. Is the Windows community edition crippled for network use perhaps, or could the problem be something else? Pete Pinging 10.0.0.2 with 32 bytes of data: Reply from 10.0.0.2: bytes=32 time=1ms TTL=128 Reply from 10.0.0.2: bytes=32 time<1ms TTL=128 Reply from 10.0.0.2: bytes=32 time<1ms TTL=128 Reply from 10.0.0.2: bytes=32 time<1ms TTL=128
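To rule out a single-threaded, latency-bound client, a sketch like the following drives inserts from several threads and reports aggregate throughput (doInsert is a hypothetical stand-in for the client library's insert call; thread and batch counts are arbitrary):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class InsertBench {
        // Hypothetical stand-in for the real client insert call.
        static void doInsert(long i) { /* client.insert(...) */ }

        public static void main(String[] args) throws Exception {
            int threads = 16;
            long perThread = 10_000;
            AtomicLong done = new AtomicLong();
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                pool.submit(() -> {
                    for (long i = 0; i < perThread; i++) {
                        doInsert(i);
                        done.incrementAndGet();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d inserts in %.1fs (%.0f/s)%n",
                    done.get(), secs, done.get() / secs);
        }
    }

If multi-threaded throughput is still pathologically low, the problem is likely in the network path rather than in request latency alone.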
Re: Loading data on-demand in Cassandra
When data is first written it does remain in memory until that memory is flushed. After the data is only on disk, it remains there until a read for that row-key/column is requested, so in essence it's always load-on-demand. Currently there is no support for async notifications of changes. On 08/12/2012 03:24 PM, Oliver Plohmann wrote: Hello, I'm looking a bit into Cassandra to see whether it would be something to go with for my company. I searched through the Internet, looked through the FAQs, etc., but there are still a few open questions. Hope I don't bother anybody with the usual beginner questions ... Is there a way to do load-on-demand of data in Cassandra? For the time being, we cannot afford to build up a cluster that holds our 700 GB SQL database in RAM. So we need to be able to load data on demand from our relational database. Can this be done in Cassandra? Then there also needs to be a way to unload data in order to reclaim RAM space. Would be nice if it were possible to register for an asynchronous notification in case some value was changed. Can this be done? Thanks for any answers. Regards, Oliver
Re: Secondary index impact on write performance
There is a second (system-managed) column family for each secondary index, so any write to a field that is indexed causes two writes: one to the main column family, and another to the index column family. In the index column family the key is the value of the indexed column, and the value is the key of the original row. On 08/04/2012 11:40 AM, David McNelis wrote: Morning, I was reading up on secondary indexes, and the Datastax post about them mentions the additional management overhead, and also that if you alter an existing column family, that data will be updated in the background. But how do secondary indexes affect write performance? If the answer is that they don't, then how do brand-new records get located by a subsequent indexed query? If someone has a link to a post with some of this info, that would be awesome. David
Re: increased RF and repair, not working?
Quorum is defined as (replication_factor / 2) + 1; therefore quorum when RF = 2 is 2! So in your case, both nodes must be up. Really, using QUORUM only starts making sense as a 'quorum' when RF = 3. On 07/26/2012 10:38 PM, Yan Chunlu wrote: I am using Cassandra 1.0.2 and have a 3-node cluster. The consistency levels of reads and writes are both QUORUM. At first the RF = 1, and I figured that one node down would make the cluster unusable. So I changed RF to 2, and ran nodetool repair on every node (actually I did it twice). After the operation I think my data should be on at least two nodes, and it would be okay if one of them is down. But when I tried to simulate the failure, by disablegossip of one node, the cluster knows this node is down. Then accessing data from the cluster returned MaximumRetryException (pycassa). In my experience this is caused by UnavailableException, which means the data being requested is on a node which is down. So I wonder whether my data might not be replicated right; what should I do? Thanks for the help! Here is the keyspace info: Keyspace: comments: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:2] The schema version is okay: [default@unknown] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.SimpleSnitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: f67d0d50-b923-11e1--4f7cf9240aef: [192.168.1.129, 192.168.1.40, 192.168.1.50] The loads are as below: nodetool -h localhost ring Address DC Rack Status State Load Owns Token 113427455640312821154458202477256070484 192.168.1.50 datacenter1 rack1 Up Normal 28.77 GB 33.33% 0 192.168.1.40 datacenter1 rack1 Up Normal 26.67 GB 33.33% 56713727820156410577229101238628035242 192.168.1.129 datacenter1 rack1 Up Normal 33.25 GB 33.33% 113427455640312821154458202477256070484
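A one-liner makes the arithmetic concrete (integer division, exactly as in the definition above):

    public class Quorum {
        static int quorum(int replicationFactor) {
            // Integer division: quorum = floor(RF / 2) + 1
            return replicationFactor / 2 + 1;
        }

        public static void main(String[] args) {
            for (int rf = 1; rf <= 5; rf++) {
                System.out.println("RF=" + rf + " -> quorum=" + quorum(rf));
            }
            // RF=1 -> 1, RF=2 -> 2, RF=3 -> 2, RF=4 -> 3, RF=5 -> 3
            // With RF=2, quorum reads/writes need BOTH replicas up.
        }
    }

This is why RF=3 is the usual floor for QUORUM deployments: it is the smallest RF where quorum operations survive a single node failure.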
Re: increased RF and repair, not working?
You have RF=2, CL=QUORUM, but 3 nodes. So each row is represented on 2 of the 3 nodes. If you take a node down, one of two things can happen when you attempt to read a row. The row lives on the two nodes that are still up; in this case you will successfully read the data. Or the row lives on one node that is up and one node that is down; in this case the read will fail because you haven't fulfilled the quorum (2 nodes in agreement) requirement. ----- Original Message ----- From: "Riyad Kalla" <rka...@gmail.com>
Re: Batch update efficiency with composite key
Cassandra doesn't do reads before writes. It just places the updates in memtables; in effect updates are the same as inserts. Batches certainly help with network latency, and with some minor amount of code repetition on the server side. ----- Original Message ----- From: "Leonid Ilyevsky" <lilyev...@mooncapital.com>
Re: SSTable format
On 07/13/2012 08:00 PM, Michael Theroux wrote: Hello, I've been trying to understand in greater detail how SStables are stored, and how information is transferred between Cassandra nodes, especially when a new node is joining a cluster. Specifically: is information stored in SStables ordered by row keys? Some of the articles I've read suggest this is the case (although it's a little vague whether they actually mean that the columns are stored in order, not the row keys). However, if data is stored in row-key order, how is this achieved, as sstables are immutable? Thanks for any insights, -Mike It depends on what partitioner you use. You should be using the RandomPartitioner, and if so, the rows are sorted by the hash of the row key. There are partitioners that sort based on the raw key value, but these shouldn't be used, as they have problems due to uneven partitioning of data. As for how this is done, remember an sstable doesn't hold all the data for a column family. Not only does the data for a column family exist on multiple servers; there are usually multiple sstable files on disk that represent data from one column family on one machine. So at the time an sstable is written, the rows that are to be put in it are sorted and written in sorted order. In fact the same row key may be written to multiple sstables, one sstable having one set of columns for the key, another having other columns for the same key. On a query for some row by key, cassandra is responsible for finding which sstables (potentially several) hold columns for that key and merging the results.
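A small Java sketch of what "sorted by the hash of the row key" means. RandomPartitioner derives tokens from the MD5 of the key; this is a simplified illustration of the idea, not Cassandra's actual token code (which also folds the digest into the 0..2**127 range):

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.TreeMap;

    public class TokenOrder {
        // RandomPartitioner-style token: MD5 of the key, as a non-negative integer.
        static BigInteger token(String key) throws Exception {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        }

        public static void main(String[] args) throws Exception {
            // SSTables store rows in token order, not raw key order:
            TreeMap<BigInteger, String> byToken = new TreeMap<>();
            for (String k : new String[] {"apple", "banana", "cherry"}) {
                byToken.put(token(k), k);
            }
            byToken.forEach((t, k) -> System.out.println(t + "  " + k));
        }
    }

Because each flush sorts only the rows in that memtable, every sstable is internally ordered while the same key can still appear in several sstables, which is why reads merge across files.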
Re: SSTable format
While in memory Cassandra calls it a MemTable, but yes, sstables are write-once, and later combined with others into new ones through compaction. On 07/13/2012 09:54 PM, Michael Theroux wrote: Thanks for the information. So is the SStable essentially kept in memory, then sorted and written to disk on flush? After that point, an SStable is not modified, but can be written to another SStable through compaction? -Mike On Jul 13, 2012, at 8:22 PM, Rob Coli wrote: On Fri, Jul 13, 2012 at 5:18 PM, Dave Brosius <dbros...@baybroadband.net> wrote: It depends on what partitioner you use. You should be using the RandomPartitioner, and if so, the rows are sorted by the hash of the row key. There are partitioners that sort based on the raw key value, but these shouldn't be used, as they have problems due to uneven partitioning of data. The formal way this works in the code is that SSTables are ordered by decorated row key, where decoration is only a transformation when you are not using OrderedPartitioner. FWIW, in case you see that DecoratedKey syntax while reading code.. =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
Re: Composite column/key creation via Hector
BTW, an issue with dynamic columns in hector was just fixed; you might want to try trunk. https://github.com/hector-client/hector/commit/2910b484629add683f61f392553e824c291fb6eb On 07/12/2012 06:25 PM, aaron morton wrote: You may have better luck on the Hector mailing list: https://groups.google.com/forum/?fromgroups#!forum/hector-users Here is something I found in the docs though: http://hector-client.github.com/hector/build/html/content/composite_with_templates.html Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 12/07/2012, at 9:04 AM, Michael Cherkasov wrote: Hi all, What is the right way to create a CF with dynamic composite columns and a composite key? Now I use code like this: private static final String DEFAULT_DYNAMIC_COMPOSITE_ALIAES = (a=AsciiType,b=BytesType,i=IntegerType,x=LexicalUUIDType,l=LongType,t=TimeUUIDType,s=UTF8Type,u=UUIDType,A=AsciiType(reversed=true),B=BytesType(reversed=true),I=IntegerType(reversed=true),X=LexicalUUIDType(reversed=true),L=LongType(reversed=true),T=TimeUUIDType(reversed=true),S=UTF8Type(reversed=true),U=UUIDType(reversed=true)); for composite columns: BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition(); columnFamilyDefinition.setComparatorType( ComparatorType.DYNAMICCOMPOSITETYPE ); columnFamilyDefinition.setComparatorTypeAlias( DEFAULT_DYNAMIC_COMPOSITE_ALIAES ); columnFamilyDefinition.setKeyspaceName( keyspaceName ); columnFamilyDefinition.setName( TestCase ); columnFamilyDefinition.setColumnType( ColumnType.STANDARD ); ColumnFamilyDefinition cfDefStandard = new ThriftCfDef( columnFamilyDefinition ); cfDefStandard.setKeyValidationClass( ComparatorType.UTF8TYPE.getClassName() ); cfDefStandard.setDefaultValidationClass( ComparatorType.UTF8TYPE.getClassName() ); for keys: columnFamilyDefinition = new BasicColumnFamilyDefinition(); columnFamilyDefinition.setComparatorType( ComparatorType.UTF8TYPE ); columnFamilyDefinition.setKeyspaceName( keyspaceName ); columnFamilyDefinition.setName( Parameter ); columnFamilyDefinition.setColumnType( ColumnType.STANDARD ); cfDefStandard = new ThriftCfDef( columnFamilyDefinition ); cfDefStandard.setKeyValidationClass( ComparatorType.DYNAMICCOMPOSITETYPE.getClassName() + DEFAULT_DYNAMIC_COMPOSITE_ALIAES ); cfDefStandard.setDefaultValidationClass( ComparatorType.UTF8TYPE.getClassName() ); Is this code correct? Do I really need such a terrible DEFAULT_DYNAMIC_COMPOSITE_ALIAES?
Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster
If I read what you are saying, you are _not_ using composite keys? That's one thing that could do it, if the first part of the composite key had a very, very low cardinality. On 06/24/2012 11:00 AM, Safdar Kureishy wrote: Hi, I've searched online but was unable to find any leads for the problem below. This mailing list seemed the most appropriate place. Apologies in advance if that isn't the case. I'm running a 5-node Solandra cluster (Solr + Cassandra). I've set up the nodes with tokens evenly distributed across the token space for a 5-node cluster (as evidenced below under the effective-ownership column of the nodetool ring output). My data is a set of a few million crawled web pages, crawled using Nutch, and also indexed using the solrindex command available through Nutch. AFAIK, the key for each document generated from the crawled data is the URL. Based on the load values for the nodes below, despite adding about 3 million web pages to this index via the HTTP REST API (e.g.: http://9.9.9.x:8983/solandra/index/update), some nodes are still empty. Specifically, nodes 9.9.9.1 and 9.9.9.3 have just a few kilobytes (shown in *bold* below) of the index, while the remaining 3 nodes are consistently getting hammered by all the data. If the RandomPartitioner (which is what I'm using for this cluster) is supposed to achieve an even distribution of keys across the token space, why is it that the data below is skewed in this fashion? Literally, no key has yet been hashed to the nodes 9.9.9.1 and 9.9.9.3 below. Could someone possibly shed some light on this absurdity? [me@hm1 solandra-app]$ bin/nodetool -h hm1 ring Address DC Rack Status State Load Effective-Ownership Token 136112946768375385385349842972707284580 9.9.9.0 datacenter1 rack1 Up Normal 7.57 GB 20.00% 0 9.9.9.1 datacenter1 rack1 Up Normal *21.44 KB* 20.00% 34028236692093846346337460743176821145 9.9.9.2 datacenter1 rack1 Up Normal 14.99 GB 20.00% 68056473384187692692674921486353642290 9.9.9.3 datacenter1 rack1 Up Normal *50.79 KB* 20.00% 102084710076281539039012382229530463435 9.9.9.4 datacenter1 rack1 Up Normal 15.22 GB 20.00% 136112946768375385385349842972707284580 Thanks in advance. Regards, Safdar
Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster
Well, it sounds like this doesn't apply to you. If you had set up your column family in CQL as PRIMARY KEY (domain_name, path) or something like that, and were looking at lots and lots of URL pages (domain_name + path) but from a very small number of domain_names, then the partition key being just the domain_name could account for an uneven distribution. But it sounds like your key is just a URL, so that should (in theory) be fine. On 06/24/2012 01:53 PM, Safdar Kureishy wrote: Hi Dave, Would you mind elaborating a bit more on that, preferably with an example? AFAIK, Solandra uses the unique id of the Solr document as the input for calculating the md5 hash for shard/node assignment. In this case the ids are just millions of varied web URLs that do not adhere to any regular expression. I'm not sure if that answers your question below? Thanks, Safdar On Sun, Jun 24, 2012 at 8:38 PM, Dave Brosius <dbros...@mebigfatguy.com> wrote: If I read what you are saying, you are _not_ using composite keys? That's one thing that could do it, if the first part of the composite key had a very, very low cardinality.
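To see why a low-cardinality partition component skews placement, here is a small Java sketch (hypothetical keys; the token function is the same simplified MD5 illustration used earlier in this digest): under a composite primary key, only the partition-key part is hashed, so every URL under one domain lands on the same token.

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class SkewDemo {
        static BigInteger token(String partitionKey) throws Exception {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, d);
        }

        public static void main(String[] args) throws Exception {
            // PRIMARY KEY (domain_name, path): only domain_name is hashed,
            // so every page of one domain maps to one token / one replica set.
            System.out.println(token("example.com"));
            // PRIMARY KEY (url): each full URL hashes independently:
            System.out.println(token("example.com/a"));
            System.out.println(token("example.com/b"));
        }
    }

With whole URLs as keys, as in the thread, the tokens scatter uniformly, which is why the skew reported above points at something other than the partitioner.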
Re: Find rows without a column
On 06/22/2012 03:57 AM, Jeff Williams wrote: Hi, It doesn't look like this is possible, but can I select all rows missing a certain column? The equivalent of select * where col is null in SQL. Regards, Jeff Remember that there really is no such thing as a row, just arbitrary columns associated with a key. So no, you can't find 'rows' where a column is missing.
Re: store large String as col value
Column values are limited to 2GB. Why store them as Base64? That just adds overhead. Storing the raw bytes will save you a bunch. ----- Original Message ----- From: "Cyril Auburtin" <cyril.aubur...@gmail.com>
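The overhead is easy to quantify: Base64 encodes every 3 bytes as 4 characters, a roughly 33% inflation. A quick Java check:

    import java.util.Base64;

    public class Base64Overhead {
        public static void main(String[] args) {
            byte[] raw = new byte[3_000_000];                     // 3 MB of raw bytes
            int encoded = Base64.getEncoder().encode(raw).length;
            System.out.println("raw bytes:    " + raw.length);
            System.out.println("base64 bytes: " + encoded);       // 4,000,000
            System.out.printf("inflation:    %.0f%%%n",
                    100.0 * (encoded - raw.length) / raw.length); // ~33%
        }
    }

That extra third is paid on the wire, in memtables, on disk, and again during compaction, so raw bytes are the cheaper representation at every stage.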
Re: Urgent - IllegalArgumentException during compaction and memtable flush
One of the column names on the row with key 353339332d3134363533393931 failed to validate with the validator for the column. If you really are after which column is problematic, and are able to build and run cassandra, you can add debugging info to Column.java:

    protected void validateName(CFMetaData metadata) throws MarshalException
    {
        try
        {
            AbstractType<?> nameValidator = metadata.cfType == ColumnFamilyType.Super ? metadata.subcolumnComparator : metadata.comparator;
            nameValidator.validate(name());
        }
        catch (MarshalException me)
        {
            throw new MarshalException("Failed validating name: " + ByteBufferUtil.bytesToHex(name()), me);
        }
    }

By the way, 92668395684826132216160944211592988451 is just the key's token. On 06/14/2012 01:56 PM, Piavlo wrote: I was able to figure out that 353339332d3134363533393931 is the row key, but I have no idea what the 92668395684826132216160944211592988451 part is. sstable2json also fails with a validation error on this row key. Now, since I have lost data for this row, how do I find out what the root cause was? Thanks. protected void validateName(CFMetaData metadata) throws MarshalException { AbstractType<?> nameValidator = metadata.cfType == ColumnFamilyType.Super ? metadata.subcolumnComparator : metadata.comparator; nameValidator.validate(name()); } On 06/14/2012 06:17 PM, Piavlo wrote: Ok, I've run scrub on the 3 nodes and the problematic row: Error validating row DecoratedKey(92668395684826132216160944211592988451, 353339332d3134363533393931) The full message is WARN [CompactionExecutor:2700] 2012-06-14 14:26:42,041 CompactionManager.java (line 582) Non-fatal error reading row (stacktrace follows) java.io.IOError: java.io.IOException: Error validating row DecoratedKey(92668395684826132216160944211592988451, 353339332d3134363533393931) at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:114) at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:97) at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:137) at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:143) at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:566) at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:473) at org.apache.cassandra.db.compaction.CompactionManager.access$200(CompactionManager.java:64) at org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:213) at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:183) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Error validating row DecoratedKey(92668395684826132216160944211592988451, 353339332d3134363533393931) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:241) at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:110) ...
13 more Caused by: org.apache.cassandra.db.marshal.MarshalException: Not enough bytes to read value of component 1 at org.apache.cassandra.db.marshal.AbstractCompositeType.validate(AbstractCompositeType.java:240) at org.apache.cassandra.db.Column.validateName(Column.java:273) at org.apache.cassandra.db.Column.validateFields(Column.java:278) at org.apache.cassandra.db.ColumnFamily.validateColumnFields(ColumnFamily.java:372) at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:237) ... 14 more WARN [CompactionExecutor:2700] 2012-06-14 14:26:42,085 CompactionManager.java (line 624) Row at 4047368880 is unreadable; skipping to next This happened in several sstables on each of the nodes, meaning it was mutated several times: dsc2b: /var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-450-Data.db at 4244390041 /var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-452-Data.db at 9366462649 dsc2c: /var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-413-Data.db at 4047368880 /var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-481-Data.db at 3598063925 dsc1a: /var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-883-Data.db at 271195463 /var/lib/cassandra/data/PRODUCTION/UserCompletions-hc-733-Data.db at
Re: Supercolumn behavior on writes
You can create composite columns on the fly. On 06/13/2012 09:58 PM, Greg Fausak wrote: That's a good question. I just went to a class where Ben was saying that any action on a super column requires de- and re-serialization. But it would be nice if a write had this sort of efficiency. I have been playing with the 1.1.1 version; in that one there are 'composite' columns, which I think are like super columns, but they don't require serialization and deserialization. However, there seems to be a catch: you can't 'invent' columns on the fly, everything has to be declared when you declare the column family. ---greg On Wed, Jun 13, 2012 at 6:52 PM, Oleg Dulin <oleg.du...@gmail.com> wrote: Does a write to a sub column involve deserialization of the entire super column? Thanks, Oleg
Re: Supercolumn behavior on writes
Via thrift, or a high-level client on thrift; see, as an example, http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1 On 06/13/2012 11:08 PM, Greg Fausak wrote: Interesting. How do you do it? I have a version 2 CF that works fine. A version 3 table won't let me invent columns that don't exist yet (for composite tables). What's the trick? cqlsh -3 cas1 use onplus; cqlsh:onplus select * from at_event where ac_event_id = 7690254; ac_event_id | ac_creation | ac_event_type | ac_id | ev_sev -+--+---+---+ 7690254 | 2011-07-23 00:11:47+ | SERV.CPE.CONN | \N | 5 cqlsh:onplus update at_event set wingy = 'toto' where ac_event_id = 7690254; Bad Request: Unknown identifier wingy This is what I used to create it: // // create the event column family; this contains the static // part of the definition. many additional columns can be specified // in the port from relational; these would be mainly the at_event table // use onplus; create columnfamily at_event ( ac_event_id int PRIMARY KEY, ac_event_type text, ev_sev int, ac_id text, ac_creation timestamp ) with compression_parameters:sstable_compression = '' ; -g On Wed, Jun 13, 2012 at 9:36 PM, samal <samalgo...@gmail.com> wrote: You can't 'invent' columns on the fly, everything has to be declared when you declare the column family. That's incorrect. You can define names on the fly. Validation must be defined when declaring the CF.
Re: Out of memory error
What version of Cassandra? Might be related to https://issues.apache.org/jira/browse/CASSANDRA-4098 On 06/11/2012 12:07 AM, Prakrati Agrawal wrote: Sorry, I ran list columnFamilyName; and it threw this error. Thanks and Regards Prakrati *From:* aaron morton [mailto:aa...@thelastpickle.com] *Sent:* Saturday, June 09, 2012 12:18 AM *To:* user@cassandra.apache.org *Subject:* Re: Out of memory error When you ask a question please include the query or function call you have made, and any other information that would help someone understand what you are trying to do. Also, please list things you have already tried to work around the problem. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 8/06/2012, at 9:04 PM, Prakrati Agrawal wrote: Dear all, When I try to list the entire data in my column family I get the following error: Using default limit of 100 Exception in thread main java.lang.OutOfMemoryError: Java heap space at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:140) at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:683) at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:667) at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1373) at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:264) at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:219) at org.apache.cassandra.cli.CliMain.main(CliMain.java:346) Please help me. Thanks and Regards Prakrati
Re: Schema changes not getting picked up from different process
What version are you using? It might be related to https://issues.apache.org/jira/browse/CASSANDRA-4052 On 05/25/2012 07:32 AM, Victor Blaga wrote: Hi all, This is my first message on this posting list so I'm sorry if I am breaking any rules. I just wanted to report some sort of a problem that I'm having with Cassandra. Short version of my problem: if I make changes to the schema from within a process, they do not get picked up by the other processes that are connected to the Cassandra cluster unless I trigger a reconnect. Long version: Process 1: cassandra-cli connected to cluster and keyspace Process 2: cassandra-cli connected to cluster and keyspace From within process 1 - create column family test; From within process 2 - describe test; - fails with an error (other query/insert methods fail as well). I'm not sure if this is indeed a bug or just a misunderstanding from my part. Regards, Victor
Re: unsubscribe
On 05/21/2012 02:44 AM, Qingyan(Evan) Liu wrote: send to user-unsubscr...@cassandra.apache.org
Re: unsubscribe
On 05/17/2012 09:49 PM, casablinca126.com wrote: unsubscribe send that message to user-unsubscr...@cassandra.apache.org
Re: Startup fails after upgrading from 1.0.8 to 1.1.0
Might be related to https://issues.apache.org/jira/browse/CASSANDRA-3794 On 05/16/2012 08:12 AM, Christoph Eberhardt wrote: Hi there, I upgraded cassandra from 1.0.8 to 1.1.0. It seemed to work at first; all seemed fine. So I started upgrading the rest of the cluster (at the time only one other node, which is a replica). After seeing several errors, I restarted the cluster and now cassandra won't even start up. Startup fails with the following error message: INFO 13:59:05,175 Logging initialized INFO 13:59:05,178 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_26 INFO 13:59:05,178 Heap size: 8420720640/8420720640 INFO 13:59:05,178 Classpath: bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.1.0.jar:bin/../lib/apache-cassandra-clientutil-1.1.0.jar:bin/../lib/apache-cassandra-thrift-1.1.0.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.9.2.jar:bin/../lib/jackson-mapper-asl-1.9.2.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.7.0.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/metrics-core-2.0.3.jar:bin/../lib/mx4j-tools.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/snaptree-0.1.jar:bin/../lib/jamm-0.2.5.jar INFO 13:59:07,312 JNA mlockall successful INFO 13:59:07,325 Loading settings from file:/opt/cassandra/apache-cassandra-1.1.0/conf/cassandra.yaml INFO 13:59:07,419 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap INFO 13:59:07,567 Global memtable threshold is enabled at 2676MB INFO 13:59:07,654 Initializing key cache with capacity of 100 MBs. INFO 13:59:07,661 Scheduling key cache save to each 14400 seconds (going to save all keys). INFO 13:59:07,662 Initializing row cache with capacity of 0 MBs and provider org.apache.cassandra.cache.SerializingCacheProvider INFO 13:59:07,664 Scheduling row cache save to each 0 seconds (going to save all keys).
INFO 13:59:07,717 Opening /opt/cassandra/database/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-9 (28520 bytes) INFO 13:59:07,717 Opening /opt/cassandra/database/data/system/schema_columnfamilies/system-schema_columnfamilies-hc-10 (28520 bytes) INFO 13:59:07,746 Opening /opt/cassandra/database/data/system/NodeIdInfo/system-NodeIdInfo-hc-1 (187 bytes) INFO 13:59:07,777 Opening /opt/cassandra/database/data/system/schema_columns/system-schema_columns-hc-3 (1892 bytes) INFO 13:59:07,777 Opening /opt/cassandra/database/data/system/schema_columns/system-schema_columns-hc-1 (1892 bytes) INFO 13:59:07,777 Opening /opt/cassandra/database/data/system/schema_columns/system-schema_columns-hc-2 (1892 bytes) INFO 13:59:07,790 Opening /opt/cassandra/database/data/system/Versions/system-Versions-hc-34 (247 bytes) INFO 13:59:07,790 Opening /opt/cassandra/database/data/system/Versions/system-Versions-hc-35 (247 bytes) INFO 13:59:07,815 Opening /opt/cassandra/database/data/system/IndexInfo/system-IndexInfo-hc-32 (490 bytes) INFO 13:59:07,816 Opening /opt/cassandra/database/data/system/IndexInfo/system-IndexInfo-hc-33 (115 bytes) INFO 13:59:07,850 Opening /opt/cassandra/database/data/system/schema_keyspaces/system-schema_keyspaces-hc-9 (506 bytes) INFO 13:59:07,850 Opening /opt/cassandra/database/data/system/schema_keyspaces/system-schema_keyspaces-hc-10 (506 bytes) INFO 13:59:07,867 Opening /opt/cassandra/database/data/system/LocationInfo/system-LocationInfo-hc-74 (148 bytes) INFO 13:59:07,867 Opening /opt/cassandra/database/data/system/LocationInfo/system-LocationInfo-hc-72 (406 bytes) INFO 13:59:07,867 Opening /opt/cassandra/database/data/system/LocationInfo/system-LocationInfo-hc-73 (80 bytes) ERROR 13:59:08,244 Exception encountered during startup java.lang.IllegalArgumentException: value already present: 1034 at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115) at com.google.common.collect.AbstractBiMap.putInBothMaps(AbstractBiMap.java:111) at com.google.common.collect.AbstractBiMap.put(AbstractBiMap.java:96) at com.google.common.collect.HashBiMap.put(HashBiMap.java:84) at org.apache.cassandra.config.Schema.load(Schema.java:385) at org.apache.cassandra.config.Schema.load(Schema.java:106) at org.apache.cassandra.config.Schema.load(Schema.java:91) at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:533) at
Re: Retrieving old data version for a given row
You're in for a world of hurt going down that rabbit hole. If you truly want versioned data then you should think about changing your keying to perhaps be a composite key, where the key is of the form NaturalKey/VersionId. Or if you want the versioning at the column level, use composite columns with a ColumnName/VersionId format. On 05/16/2012 10:16 AM, Felipe Schmidt wrote: That was very helpful, thank you very much! I still have some questions: - is it possible to make Cassandra keep old data values after flushing? The same question for the memtable, before flushing. It seems to me that when I update some tuple, the old data is overwritten in the memtable, even before flushing. - is it possible to scan values from the memtable, maybe using the so-called Thrift API? Using the client API I can only see the newest data version; I can't see what's really happening with the memtable. I ask because what I'll try to do is Change Data Capture for Cassandra, and the answers will define what kind of approaches I'm able to use. Thanks in advance. Regards, Felipe Mathias Schmidt (Computer Science UFRGS, RS, Brazil) 2012/5/14 aaron morton <aa...@thelastpickle.com>: Cassandra does not provide access to multiple versions of the same column. It is essentially an implementation detail. All mutations are written to the commit log in a binary format, see o.a.c.db.RowMutation.getSerializedBuffer() (if you want to tail it for analysis you may want to change commitlog_sync in cassandra.yaml). Here is a post about looking at multiple versions of columns in an sstable: http://thelastpickle.com/2011/05/15/Deletes-and-Tombstones/ Remember that not all versions of a column are written to disk (see http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/). Also compaction will compress multiple versions of the same column from multiple files into a single version in a single file. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/05/2012, at 9:50 PM, Felipe Schmidt wrote: Yes, I need this information just for academic purposes. So, to read old data values, I tried to open the commit log using tail -f and also the log file viewer of Ubuntu, but I cannot see much information inside the log! Is there any other way to open this log? I didn't find any Cassandra API for this purpose. Thanks everybody in advance. Regards, Felipe Mathias Schmidt (Computer Science UFRGS, RS, Brazil) 2012/5/14 zhangcheng2 <zhangche...@software.ict.ac.cn>: After compaction, the old version data will be gone! zhangcheng2 From: Felipe Schmidt Date: 2012-05-14 05:33 To: user Subject: Retrieving old data version for a given row I'm trying to retrieve old data versions for some row but it seems not to be possible. I'm a beginner with Cassandra and the only approach I know is looking at the SSTable in the storage folder, but if I insert some column and right after insert another value into the same row, after flushing I only get the last value. Is there any way to get the old data version? Obviously, before compaction. Regards, Felipe Mathias Schmidt (Computer Science UFRGS, RS, Brazil)
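A minimal Java sketch of the NaturalKey/VersionId idea suggested above (hypothetical helper; a millisecond timestamp stands in for the version id, though a TimeUUID would be the more robust choice in practice):

    public class VersionedKeys {
        // Compose a key that preserves history: one row (or column) per version.
        static String versionedKey(String naturalKey, long versionId) {
            return naturalKey + "/" + versionId;
        }

        public static void main(String[] args) {
            String key = "user:42";
            // Each update writes under a fresh version instead of overwriting:
            System.out.println(versionedKey(key, System.currentTimeMillis()));
            // Reading "all versions" then becomes a range/slice query over
            // everything that starts with the natural key.
        }
    }

The point of the design is that Cassandra never has to retain overwritten values: every version is a distinct column or row, so history survives memtable updates and compaction alike.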
Re: Startup fails after upgrading from 1.0.8 to 1.1.0
Tracking issue here: https://issues.apache.org/jira/browse/CASSANDRA-4251 Might be related to: https://issues.apache.org/jira/browse/CASSANDRA-3794 On 05/16/2012 08:12 AM, Christoph Eberhardt wrote: [same message and startup log quoted in the previous reply]
Re: understanding of native indexes: limitations, potential side effects,...
Each index you define on the source CF is created using an internal CF that has as its key the value of the column it's indexing, and as its columns all the keys of all the rows in the source CF that have that value. So if all the rows in your source CF have the same value, then your index CF will have one row with N columns for N rows in the original CF. On 05/16/2012 02:58 PM, David Vanderfeesten wrote: Thanks Jeremiah, but I am not sure I am following "number of columns could be equal to number of rows". Is a native index implemented as one CF shared over all the indexes (one row in the index CF corresponding to one index), or is there an internal index CF per index? My (potentially wrong) mindset was the latter. In that case, if you index a column with very high cardinality, like for example serialNbr, the corresponding internal index CF will have almost the same number of rows as the original CF containing the serialNbr. I can't match that with what you are explaining... - David On Wed, May 16, 2012 at 6:23 PM, Jeremiah Jordan <jeremiah.jor...@morningstar.com> wrote: The limitation is because the number of columns could be equal to the number of rows. If the number of rows is large this can become an issue. -Jeremiah *From:* David Vanderfeesten [mailto:feest...@gmail.com] *Sent:* Wednesday, May 16, 2012 6:58 AM *To:* user@cassandra.apache.org *Subject:* understanding of native indexes: limitations, potential side effects,... Hi, I'd like to better understand the limitations of native indexes, potential side effects, and scenarios where they are required. My understanding so far: - Indexes on each node store indexes for data residing locally on the node itself. - Indexes do not return values in a sorted way (hashes of the indexed row keys define the order). - Given the design in the first bullet, a coordinator node receiving a read of a native index needs to spawn a read to multiple nodes (a set of nodes together covering at least the complete key space, plus potentially more to assure the read consistency level). - Each write to an indexed column leads to an additional local read of the index to update the index (kind of obvious but easily forgotten when tuning your system for a write-only workload). - When using a where clause in CQL you need at least one equality condition on a natively indexed column. Additional conditions in the where clause are filtered out by the coordinator node receiving the CQL query. - Native indexes do not support very well columns with a high number of discrete values throughout the entire CF. Is the above understanding correct and complete? Some doubts: - About the limitation of indexing columns with a high number of discrete values: I assume native indexes are implemented with an internally managed CF per index. With high-cardinality values, in the worst case, the number of rows in the index is identical to the number of rows of the indexed CF. Or are there other reasons for the limitation, and if that's the case, is there a guideline on the maximum cardinality that is still reasonable? - Are column updates and the updates of the indexes (read + write action) atomic and isolated from concurrent updates? Txs! David
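A toy Java model of that hidden index CF makes the cardinality trade-off visible (a simplified illustration, not Cassandra's actual structures): each distinct indexed value becomes one index row, and its columns are the source row keys carrying that value.

    import java.util.Map;
    import java.util.Set;
    import java.util.TreeMap;
    import java.util.TreeSet;

    public class IndexLayout {
        public static void main(String[] args) {
            // Source CF: row key -> value of the indexed column.
            Map<String, String> source = Map.of(
                    "row1", "false", "row2", "false",
                    "row3", "false", "row4", "true");

            // Hidden index CF: indexed value -> source row keys.
            Map<String, Set<String>> index = new TreeMap<>();
            source.forEach((rowKey, value) ->
                    index.computeIfAbsent(value, v -> new TreeSet<>()).add(rowKey));

            // Low cardinality => few, very wide index rows:
            // {false=[row1, row2, row3], true=[row4]}
            System.out.println(index);
            // High cardinality (e.g. a serial number) => one index row per
            // source row, roughly doubling the row count, as discussed above.
        }
    }

Both extremes are visible here: a boolean-like column produces a handful of huge rows, while a unique column mirrors the source CF row for row.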
Re: cassandra upgrade to 1.1 - migration problem
The replication factor for a keyspace is stored in the system.schema_keyspaces column family. Since you can't view this with the cli, as the server won't start, the only way to look at it that I know of is to use the sstable2json tool on the *.db file for that column family... So for instance on my machine I do ./sstable2json /var/lib/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-ia-1-Data.db and get { 7374726573735f6b73: [[durable_writes,true,1968197311980145], [name,stress_ks,1968197311980145], [strategy_class,org.apache.cassandra.locator.SimpleStrategy,1968197311980145], [strategy_options,{\replication_factor\:\3\},1968197311980145]] It's likely you don't have an entry for replication_factor. Theoretically I suppose you could embellish the output and use json2sstable to fix it, but I have no experience here, and would get the blessing of the datastax fellas before proceeding. On 05/15/2012 07:02 PM, Casey Deccio wrote: Sorry to reply to my own message (again). I took a closer look at the logs and realized that the partitioner errors aren't what caused the daemon to stop; those errors are in the logs even before I upgraded. This one seems to be the culprit. java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160) Caused by: java.lang.RuntimeException: org.apache.cassandra.config.ConfigurationException: SimpleStrategy requires a replication_factor strategy option. at org.apache.cassandra.db.Table.<init>(Table.java:275) at org.apache.cassandra.db.Table.open(Table.java:114) at org.apache.cassandra.db.Table.open(Table.java:97) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:204) at org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:254) ... 5 more Caused by: org.apache.cassandra.config.ConfigurationException: SimpleStrategy requires a replication_factor strategy option. at org.apache.cassandra.locator.SimpleStrategy.validateOptions(SimpleStrategy.java:71) at org.apache.cassandra.locator.AbstractReplicationStrategy.createReplicationStrategy(AbstractReplicationStrategy.java:218) at org.apache.cassandra.db.Table.createReplicationStrategy(Table.java:295) at org.apache.cassandra.db.Table.<init>(Table.java:271) ... 9 more Cannot load daemon I'm not sure how to check the replication_factor and/or update it without using cassandra-cli, which requires the daemon to be running. Casey
Re: How to make the search by columns in range case insensitive ?
This could be accomplished with a custom 'CaseInsensitiveUTF8Type' comparator used as the comparator for that column family. This would require adding a class of your own writing to the server. On 05/14/2012 07:26 AM, Ertio Lew wrote: I need to make a search-by-names index using entity names as column names in a row. This data is split into several rows, using the first 3 characters of the entity name as the row key and the remaining part as the column name; the column value contains the entity id. But there is a problem: I'm storing this data in a CF using a bytes-type comparator. I need to make case-insensitive queries to retrieve n column names starting from a point. Any ideas about how I should do that?
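If modifying the server is unappealing, a common client-side alternative (a sketch of the general pattern, not from the thread; names are hypothetical) is to normalize the column name to lower case for indexing and keep the original form in the value:

    import java.util.Locale;

    public class CaseInsensitiveIndexEntry {
        // Index under a case-folded name so range scans are case-insensitive;
        // keep the display form alongside the entity id in the value.
        static String[] indexEntry(String entityName, String entityId) {
            String columnName = entityName.toLowerCase(Locale.ROOT);
            String columnValue = entityId + "|" + entityName;
            return new String[] { columnName, columnValue };
        }

        public static void main(String[] args) {
            String[] e = indexEntry("CassandraDB", "ent-17");
            System.out.println(e[0] + " -> " + e[1]);
            // cassandradb -> ent-17|CassandraDB
        }
    }

The trade-off versus a server-side comparator is that the casing rule is baked into the data at write time, so changing it later means rewriting the index.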
Re: How do I add a custom comparator class to a cassandra cluster ?
It can be in a separate jar with just that one class. On 05/15/2012 12:29 AM, Ertio Lew wrote: Can I put this comparator class in a separate new jar (with just this single file), or is it to be appended to the original jar along with the other comparator classes? On Tue, May 15, 2012 at 12:22 AM, Tom Duffield (Mailing Lists) <tom.duffield.li...@gmail.com> wrote: Kirk is correct. -- Tom Duffield (Mailing Lists) Sent with Sparrow On Monday, May 14, 2012 at 1:41 PM, Kirk True wrote: Disclaimer: I've never tried, but I'd imagine you can drop a JAR containing the class(es) into the lib directory and perform a rolling restart of the nodes. On 5/14/12 11:11 AM, Ertio Lew wrote: I need to add a custom comparator to a cluster, to sort columns in a certain customized fashion. How do I add the class to the cluster?
Re: Retrieving old data version for a given row
The only way you could get the old value for a column would be to insert the column value, then flush, then insert the new column, then before compaction look at the old sstable. If you insert the value twice in a row without a flush, the old value is gone, as it only exists in memtables (and in the commit log - of course). Hopefully you want this information for learning purposes only, and aren't actually using this for real purposes. On 05/13/2012 05:33 PM, Felipe Schmidt wrote: I'm trying to retrieve old data version for some row but it seems not be possible. I'm a beginner with Cassandra and the unique aproach I know is looking to the SSTable in the storage folder, but if I insert some column and right after insert another value to the same row, after flushing, I only get the last value. Is there any way to get the old data version? Obviously, before compaction. Regards, Felipe Mathias Schmidt (Computer Science UFRGS, RS, Brazil)
Re: primary keys query
Inequalities on secondary indices are always done in memory, so without at least one EQ on another secondary index you would be loading every row in the database, which with a massive database isn't a good idea. So by requiring at least one EQ on an index, you hopefully limit the set of rows that need to be read into memory to a manageable size. Although obviously you can still get into trouble with that as well. On 05/11/2012 09:39 AM, cyril auburtin wrote: Sorry for asking that, but why is it necessary to always have at least one EQ comparison? [default@Keyspace1] get test where birth_year > 1985; No indexed columns present in index clause with operator EQ It obliges you to have one dummy indexed column, to do this query: [default@Keyspace1] get test where tag=sea and birth_year > 1985; --- RowKey: sam = (column=birth_year, value=1988, timestamp=1336742346059000)
Re: Behavior on inconsistent reads
If you read at a consistency level of at least QUORUM, you are guaranteed that at least one of the nodes you read from has the latest data, and so you get the right data. If you read with less than quorum it would be possible for all the nodes that respond to have stale data. On 05/10/2012 09:46 PM, Carpenter, Curt wrote: Hi all, newbie here. Be gentle. From http://www.datastax.com/docs/1.0/cluster_architecture/about_client_requests: Thus, the coordinator first contacts the replicas specified by the consistency level. The coordinator will send these requests to the replicas that are currently responding most promptly. The nodes contacted will respond with the requested data; if multiple nodes are contacted, the rows from each replica are compared in memory to see if they are consistent. If they are not, then the replica that has the most recent data (based on the timestamp) is used by the coordinator to forward the result back to the client. To ensure that all replicas have the most recent version of frequently-read data, the coordinator also contacts and compares the data from all the remaining replicas that own the row in the background, and if they are inconsistent, issues writes to the out-of-date replicas to update the row to reflect the most recently written values. This process is known as read repair. Read repair can be configured per column family (using read_repair_chance, http://www.datastax.com/docs/1.0/configuration/storage_configuration#read-repair-chance), and is enabled by default. For example, in a cluster with a replication factor of 3, and a read consistency level of QUORUM, 2 of the 3 replicas for the given row are contacted to fulfill the read request. Supposing the contacted replicas had different versions of the row, the replica with the most recent version would return the requested data. In the background, the third replica is checked for consistency with the first two, and if needed, the most recent replica issues a write to the out-of-date replicas. Always returns the most recent? What if the most recent write is corrupt? I thought the whole point of a quorum was that consistency is verified before the data is returned to the client. No? Thanks, Curt
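The guarantee in the answer above is the standard overlap argument: with N replicas, a read touching R of them and a write touching W of them must share at least one replica whenever R + W > N. A tiny Java sketch:

    public class ReadWriteOverlap {
        // True if any read set must intersect any prior write set.
        static boolean strongConsistency(int n, int r, int w) {
            return r + w > n;
        }

        public static void main(String[] args) {
            int n = 3;              // replication factor
            int quorum = n / 2 + 1; // 2 when n = 3
            System.out.println(strongConsistency(n, quorum, quorum)); // true
            System.out.println(strongConsistency(n, 1, quorum));      // false: CL.ONE reads may be stale
        }
    }

Timestamps then break the tie within that overlapping set, which is why the quorum read returns the latest acknowledged write rather than just any replica's copy.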
Re: EC2 Best Practices
0 is a perfectly valid id. Node -1 is taken modulo the maximum token value; the token range is 0 - 2**127, so node -1 in this case is 2**127. - Original Message - From: "Deno Vichas" <d...@syncopated.net>
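For reference, a minimal sketch of the usual evenly-spaced initial-token calculation over the RandomPartitioner's 0..2**127 range; the node count is illustrative:

  import java.math.BigInteger;

  public class InitialTokens {
      public static void main(String[] args) {
          int nodeCount = 4; // illustrative cluster size
          BigInteger range = BigInteger.valueOf(2).pow(127);
          for (int i = 0; i < nodeCount; i++) {
              // evenly spaced tokens: i * 2**127 / nodeCount
              System.out.println("node " + i + ": "
                      + range.multiply(BigInteger.valueOf(i))
                             .divide(BigInteger.valueOf(nodeCount)));
          }
      }
  }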
Re: Bad Request: No indexed columns present in by-columns clause with equals operator
Works for me on trunk... what version are you using? On 04/23/2012 08:39 AM, mdione@orange.com wrote: I understand the error message, but I don't understand why I get it. Here's the CF: cqlsh:avatars> describe columnfamily HBX_FILE; CREATE COLUMNFAMILY HBX_FILE ( KEY blob PRIMARY KEY, HBX_FIL_DATE text, HBX_FIL_LARGE ascii, HBX_FIL_MEDIUM ascii, HBX_FIL_SMALL ascii, HBX_FIL_STATUS text, HBX_FIL_TINY ascii ) WITH comment='' AND comparator=text AND read_repair_chance=1.00 AND gc_grace_seconds=864000 AND default_validation=blob AND min_compaction_threshold=4 AND max_compaction_threshold=32 AND replicate_on_write=True; CREATE INDEX HBX_FILE_HBX_FIL_STATUS_idx ON HBX_FILE (HBX_FIL_STATUS); The query and the error: cqlsh:avatars> SELECT HBX_FIL_SMALL FROM HBX_FILE WHERE KEY=1 AND HBX_FIL_STATUS='actif'; Bad Request: No indexed columns present in by-columns clause with equals operator A query that works: cqlsh:avatars> SELECT HBX_FIL_STATUS FROM HBX_FILE WHERE KEY=1; HBX_FIL_STATUS Actif Just in case, here's the cli's output for the same CF: [default@avatars] describe HBX_FILE; ColumnFamily: HBX_FILE Key Validation Class: org.apache.cassandra.db.marshal.BytesType Default column value validator: org.apache.cassandra.db.marshal.BytesType Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type Row cache size / save period in seconds / keys to save : 0.0/0/all Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider Key cache size / save period in seconds: 20.0/14400 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on write: true Bloom Filter FP chance: default Built indexes: [] Column Metadata: Column Name: HBX_FIL_DATE Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: HBX_FIL_LARGE Validation Class: org.apache.cassandra.db.marshal.AsciiType Column Name: HBX_FIL_MEDIUM Validation Class: org.apache.cassandra.db.marshal.AsciiType Column Name: HBX_FIL_SMALL Validation Class: org.apache.cassandra.db.marshal.AsciiType Column Name: HBX_FIL_STATUS Validation Class: org.apache.cassandra.db.marshal.UTF8Type Index Name: HBX_FILE_HBX_FIL_STATUS_idx Index Type: KEYS Column Name: HBX_FIL_TINY Validation Class: org.apache.cassandra.db.marshal.AsciiType Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy And the same error, in other words, in the CLI: [default@avatars] get HBX_FILE where HBX_FIL_STATUS = 'actif'; No indexed columns present in index clause with operator EQ Am I missing something? Might as well be that I'm too tired... -- Marcos Dione SysAdmin Astek Sud-Est pour FT/TGPF/OPF/PORTAIL/DOP/HEBEX @ Marco Polo 04 97 12 62 45 - mdione@orange.com
Re: 200TB in Cassandra ?
I think your math is 'relatively' correct. It would seem to me you should focus on how you can reduce the amount of storage you are using per item, if at all possible, if that node count is prohibitive. On 04/19/2012 07:12 AM, Franc Carter wrote: Hi, One of the projects I am working on is going to need to store about 200TB of data - generally in manageable binary chunks. However, after doing some rough calculations based on rules of thumb I have seen for how much storage should be on each node, I'm worried. 200TB with RF=3 is 600TB = 600,000GB, which is 1000 nodes at 600GB per node. I'm hoping I've missed something, as 1000 nodes is not viable for us. cheers -- Franc Carter | Systems architect | Sirca Ltd | franc.car...@sirca.org.au | www.sirca.org.au | Tel: +61 2 9236 9118 | Level 9, 80 Clarence St, Sydney NSW 2000 | PO Box H58, Australia Square, Sydney NSW 1215
Re: Column Family per User
Your design should be driven by how you want to query. If you are only querying by user, then having the user as part of the row key makes sense. To manage row size, you should think of a row as being a bucket of time. Cassandra supports a large (but not unbounded) row size. To manage row size you might say that this row is for user fred for the month of April, or if that's too much, perhaps the row is for user fred for the day 4/18/12. To do this you can use composite keys to hold both pieces of information in the key: (user, bucketpos). The nice thing is that once the time period has come and gone, that row is complete, and you can perform background jobs against that row and store summary information for that time period. - Original Message - From: "Trevor Francis" <trevor.fran...@tgrahamcapital.com>
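A minimal sketch of building such a (user, bucket) row key by hand; the colon separator, names, and day-granularity bucket are illustrative, and a real composite key type from your client library would be the more robust choice:

  import java.text.SimpleDateFormat;
  import java.util.Date;

  public class BucketKey {
      // e.g. buildKey("fred", new Date()) -> "fred:2012-04-18"
      public static String buildKey(String user, Date when) {
          SimpleDateFormat day = new SimpleDateFormat("yyyy-MM-dd");
          return user + ":" + day.format(when);
      }
  }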
Re: Column Family per User
Yes, in this cassandra model time wouldn't be a column value, it would be part of the column name. Depending on how you want to access your data (give me all data points for time X) and how many separate datapoints you have for time X, you might consider packing all the data for a time into one column through composite columns. Column name: 2012-04-12T12:22:23.293/55/45/10 (where / is a human-readable representation of the composite separator). In this case there wouldn't actually be a value; the data is just encoded in the column name. Obviously if you are storing dozens of separate datapoints for a timestamp then this gets out of hand quickly, and perhaps you need to go back to column names with time/fieldname format and a real value. The advantage, though, of the composite key is that you eliminate all that constant blather about 'Wind' 'Rain' 'Sunshine' in your data and only hold real data (granted, compression will probably help here, but not having it at all is even better). As for row size, obviously that takes some experimentation on your part. You can bucket a row to be any time frame you want. If you feel that 15 minutes is the correct length of time given the amount of data you will write, then use 15 minutes. If it's 1 hour, use 1 hour. The only thing you have to figure out is a 'bucket time' definition that you understand; likely it's the timestamp of when that time period starts. As for 'rotating the row', perhaps it's just semantics, but there really is no such concept. You are at some point in time, and you want to write some data to the database. The steps are: 1) get the user, 2) get the timestamp of the current bucket based on 'now', 3) build a composite key, 4) insert the data with that key. Whether that row existed before or is a new row has no bearing on your client code. A sketch of step 2 follows. - Original Message - From: "Trevor Francis" <trevor.fran...@tgrahamcapital.com>
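A minimal sketch of step 2, computing the start-of-bucket timestamp from 'now'; the 15-minute bucket width is illustrative:

  public class BucketTime {
      // round an epoch-millis timestamp down to the start of its bucket
      public static long bucketStartMillis(long nowMillis, long bucketWidthMillis) {
          return (nowMillis / bucketWidthMillis) * bucketWidthMillis;
      }

      public static void main(String[] args) {
          long fifteenMinutes = 15L * 60L * 1000L;
          System.out.println(bucketStartMillis(System.currentTimeMillis(), fifteenMinutes));
      }
  }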
Re: Column Family per User
It seems to me you are on the right track. Finding the right balance of # of rows vs row width is the part that will take the most experimentation. - Original Message - From: "Trevor Francis" <trevor.fran...@tgrahamcapital.com>
Re: Trying to avoid super columns
If you want to reduce the number of columns, you could pack all the data for a product into one column, as in a composite column name: product_id_1:12.44:1.00:3.00 On 04/12/2012 03:03 PM, Philip Shon wrote: I am currently working on a data model where the purpose is to look up multiple products for given days of the year. Right now, that model involves the usage of a super column family. e.g. 2012-04-12: { product_id_1: { price: 12.44, tax: 1.00, fees: 3.00 }, product_id_2: { price: 50.00, tax: 4.00, fees: 10.00 } } I should note that for a given day/key, we are expecting in the range of 2 million to 4 million products (subcolumns). With this model, I am able to retrieve any of the products for a given day using hector's MultigetSuperSliceQuery. I am looking into changing this model to use composite column names. How would I go about modeling this? My initial thought is to migrate the above model into something more like the following. 2012-04-12: { product_id_1:price: 12.44, product_id_1:tax: 1.00, product_id_1:fees: 3.00, product_id_2:price: 50.00, product_id_2:tax: 4.00, product_id_2:fees: 10.00 } The one thing that stands out to me with this approach is the number of additional columns that will be created for a single key. Will the increase in columns create new issues I will need to deal with? Are there any other thoughts about whether I should actually move forward (or not) with migrating this super column family to the model with the composite column names? Thanks, Phil
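A minimal sketch of that packing idea, using a plain ':'-separated string as a human-readable stand-in for a real composite column name (a client library's Composite type would handle the encoding properly; the names are illustrative):

  public class PackedProductColumn {
      // "product_id_1:12.44:1.00:3.00" -> one column carrying price, tax, fees
      public static String pack(String productId, double price, double tax, double fees) {
          return productId + ":" + price + ":" + tax + ":" + fees;
      }

      // read a field back out of the packed name
      public static double priceOf(String packedName) {
          return Double.parseDouble(packedName.split(":")[1]);
      }
  }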
Re: Why so many SSTables?
It's easy to spend other people's money, but handling 1TB of data with a 1.5 GB heap? Memory is cheap, and just a little more will solve many problems. On 04/11/2012 08:43 AM, Romain HARDOUIN wrote: Thank you for your answers. I originally posted this question because we encountered an OOM exception on 2 nodes during a repair session. Memory analysis shows a hotspot: an ArrayList of SSTableBoundedScanner which contains as many objects as there are SSTables on disk (7747 objects at the time). This ArrayList consumes 47% of the heap space (786 MB). We want each node to handle 1 TB, so we must dramatically reduce the number of SSTables. Thus, is there any drawback if we set sstable_size_in_mb to 200MB? Otherwise, should we go back to Tiered Compaction? Regards, Romain Maki Watanabe watanabe.m...@gmail.com wrote on 11/04/2012 04:21:47: You can configure the sstable size via the sstable_size_in_mb parameter for LCS. The default value is 5MB. You should also check that you don't have many pending compaction tasks, with nodetool tpstats and compactionstats. If you have enough IO throughput, you can increase compaction_throughput_mb_per_sec in cassandra.yaml to reduce pending compactions. maki 2012/4/10 Romain HARDOUIN romain.hardo...@urssaf.fr: Hi, We are surprised by the number of files generated by Cassandra. Our cluster consists of 9 nodes and each node handles about 35 GB. We're using Cassandra 1.0.6 with LeveledCompactionStrategy. We have 30 CF. We've got roughly 45,000 files under the keyspace directory on each node: ls -l /var/lib/cassandra/data/OurKeyspace/ | wc -l 44372 The biggest CF is spread over 38,000 files: ls -l Documents* | wc -l 37870 ls -l Documents*-Data.db | wc -l 7586 Many SSTables are about 4 MB:
19 MB - 1 SSTable
12 MB - 2 SSTables
11 MB - 2 SSTables
9.2 MB - 1 SSTable
7.0 MB to 7.9 MB - 6 SSTables
6.0 MB to 6.4 MB - 6 SSTables
5.0 MB to 5.4 MB - 4 SSTables
4.0 MB to 4.7 MB - 7139 SSTables
3.0 MB to 3.9 MB - 258 SSTables
2.0 MB to 2.9 MB - 35 SSTables
1.0 MB to 1.9 MB - 13 SSTables
87 KB to 994 KB - 87 SSTables
0 KB - 32 SSTables
FYI here is the CF information: ColumnFamily: Documents Key Validation Class: org.apache.cassandra.db.marshal.BytesType Default column value validator: org.apache.cassandra.db.marshal.BytesType Columns sorted by: org.apache.cassandra.db.marshal.BytesType Row cache size / save period in seconds / keys to save : 0.0/0/all Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider Key cache size / save period in seconds: 20.0/14400 GC grace seconds: 1728000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on write: true Column Metadata: Column Name: refUUID (7265664944) Validation Class: org.apache.cassandra.db.marshal.BytesType Index Name: refUUID_idx Index Type: KEYS Compaction Strategy: org.apache.cassandra.db.compaction.LeveledCompactionStrategy Compression Options: sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor Is it a bug? If not, how can we tune Cassandra to avoid this? Regards, Romain
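The sstable-count arithmetic makes the tradeoff concrete: at the 5MB LCS default, 1TB per node works out to roughly 1,000,000MB / 5MB = ~200,000 sstables, while sstable_size_in_mb: 200 brings that down to about 1,000,000 / 200 = ~5,000; that is why bumping the sstable size (or adding heap) is the usual fix here.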
Re: Using Thrift
For a thrift client, you need the following jars at a minimum: apache-cassandra-clientutil-*.jar apache-cassandra-thrift-*.jar libthrift-*.jar slf4j-api-*.jar slf4j-log4j12-*.jar All of these jars can be found in the cassandra distribution. On 04/02/2012 07:40 AM, Rishabh Agrawal wrote: Any suggestions? From: Rishabh Agrawal Sent: Monday, April 02, 2012 4:42 PM To: user@cassandra.apache.org Subject: Using Thrift Hello, I have just started exploring Cassandra from the Java side and wish to use Thrift as my API. The problem is that whenever I try to compile my java code I get the following error: package org.slf4j does not exist Can anyone help me with this? Thanks and Regards, Rishabh Agrawal
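A minimal sketch of compiling against those jars; the version numbers, paths, and source file name are illustrative:

  javac -cp "lib/apache-cassandra-thrift-1.0.7.jar:lib/apache-cassandra-clientutil-1.0.7.jar:lib/libthrift-0.6.jar:lib/slf4j-api-1.6.1.jar:lib/slf4j-log4j12-1.6.1.jar" MyThriftClient.java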
Re: Using Thrift
slf4j is just a logging facade; if you actually want log files, you need a logger, say log4j-*.jar, in your classpath. Then just configure that with a log4j.properties file. That properties file also needs to be on the classpath. On 04/02/2012 09:05 AM, Rishabh Agrawal wrote: I didn't find the slf4j files in the distribution, so I downloaded them. Can you help me with how to configure it? From: Dave Brosius [mailto:dbros...@mebigfatguy.com] Sent: Monday, April 02, 2012 6:28 PM To: user@cassandra.apache.org Subject: Re: Using Thrift For a thrift client, you need the following jars at a minimum: apache-cassandra-clientutil-*.jar apache-cassandra-thrift-*.jar libthrift-*.jar slf4j-api-*.jar slf4j-log4j12-*.jar All of these jars can be found in the cassandra distribution. On 04/02/2012 07:40 AM, Rishabh Agrawal wrote: Any suggestions? From: Rishabh Agrawal Sent: Monday, April 02, 2012 4:42 PM To: user@cassandra.apache.org Subject: Using Thrift Hello, I have just started exploring Cassandra from the Java side and wish to use Thrift as my API. The problem is that whenever I try to compile my java code I get the following error: package org.slf4j does not exist Can anyone help me with this? Thanks and Regards, Rishabh Agrawal
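A minimal log4j.properties along those lines; the appender choice and pattern are illustrative:

  log4j.rootLogger=INFO, stdout
  log4j.appender.stdout=org.apache.log4j.ConsoleAppender
  log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
  log4j.appender.stdout.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p [%t] %c - %m%n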
Re: counter column family
Counter columns are special; they must be in a column family to themselves. On 03/27/2012 09:32 AM, puneet loya wrote: When I'm using a counter column, I'm not able to add columns of other types to the column family. Is it so, or is it just a syntax error? [default@CMDCv99] create column family status ... with comparator = AsciiType ... and column_metadata = ... [{ ... column_name : Test, ... validation_class : IntegerType, ... index_type : 0, ... index_name : IdxName}, ... { ... column_name : 'other name', ... validation_class : CounterColumnType ... }]; Cannot add a counter column (other name) in a non counter column family On Tue, Mar 27, 2012 at 6:55 PM, R. Verlangen ro...@us2.nl wrote: You should use a connection pool without retries to prevent a single increment of +1 from having a result of e.g. +3. 2012/3/27 Rishabh Agrawal rishabh.agra...@impetus.co.in: You can even define how much increment you want. But let me just warn you: as far as my knowledge goes, it has consistency issues. From: puneet loya [mailto:puneetl...@gmail.com] Sent: Tuesday, March 27, 2012 5:59 PM To: user@cassandra.apache.org Subject: Re: counter column family Thanks a ton :) :) The counter column family works synonymously to 'auto increment' in other databases, right? I mean, we have a column of type integer which increments with every insert. Am I going the right way? Please reply :) On Tue, Mar 27, 2012 at 5:50 PM, R. Verlangen ro...@us2.nl wrote: create column family MyCounterColumnFamily with default_validation_class=CounterColumnType and key_validation_class=UTF8Type and comparator=UTF8Type; There you go! Keys must be utf8, as well as the column names. Of course you can change those validators. Cheers! 2012/3/27 puneet loya puneetl...@gmail.com: Can you give an example of create column family with a counter column in it? Please reply. Regards, Puneet Loya -- With kind regards, Robin Verlangen www.robinverlangen.nl
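For completeness, a minimal sketch of bumping and reading a counter from cassandra-cli; the keyspace, row key, and column names are illustrative:

  [default@MyKeyspace] incr MyCounterColumnFamily['page1']['views'];
  [default@MyKeyspace] incr MyCounterColumnFamily['page1']['views'] by 5;
  [default@MyKeyspace] get MyCounterColumnFamily['page1'];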
Re: Issue with cassandra-cli assume
I think you want: assume UserDetails validator as bytes; On 03/23/2012 08:09 PM, Drew Kutcharian wrote: Hi Everyone, I'm having an issue with cassandra-cli's assume command with a custom type. I tried it with the built-in BytesType and got the same error: [default@test] assume UserDetails validator as org.apache.cassandra.db.marshal.BytesType; Syntax error at position 35: missing EOF at '.' I also tried it with single and double quotes with no success: [default@test] assume UserDetails validator as 'org.apache.cassandra.db.marshal.BytesType'; Syntax error at position 32: mismatched input ''org.apache.cassandra.db.marshal.BytesType'' expecting Identifier Is this a bug? I'm using Cassandra 1.0.7 on Mac OS X Lion. Thanks, Drew
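That is, assume expects the cli's shorthand type names rather than the full marshal class names. A few illustrative examples (the column family name is from the thread above):

  assume UserDetails validator as bytes;
  assume UserDetails comparator as utf8;
  assume UserDetails keys as ascii;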
Re: Order rows numerically
If your keys are 1-n and you are using BOP, then almost certainly your ring will be massively unbalanced, with the first node getting clobbered. You'll have bigger issues than getting lexical ordering. I'd try to rethink your design so that you don't need BOP. On 03/16/2012 06:49 PM, Watanabe Maki wrote: How about filling with zeros before smaller digits? Ex. 0001, 0002, etc. maki On 2012/03/17, at 6:29, A J s5a...@gmail.com wrote: If I define my rowkeys to be Integer (key_validation_class=IntegerType), how can I order the rows numerically? ByteOrderedPartitioner orders lexically, and retrieval using get_range does not seem to make sense in order. If I were to change the rowkey to be UTF8 (key_validation_class=UTF8Type), BOP still does not give numerical ordering. For a range of rowkeys from 1 to 2, I get 1, 10, 11, ..., 2 (lexical ordering). Any workaround for this? Thanks.
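A minimal sketch of the zero-padding approach, so lexical order matches numeric order; the width of 10 digits is illustrative, pick one wide enough for your largest key:

  public class PaddedKeys {
      public static String pad(long key) {
          return String.format("%010d", key); // 2 -> "0000000002"
      }

      public static void main(String[] args) {
          System.out.println(pad(1));  // 0000000001
          System.out.println(pad(2));  // 0000000002 - now sorts before 10 lexically
          System.out.println(pad(10)); // 0000000010
      }
  }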
Re: Why is row lookup much faster than column lookup
Given the hashtable nature of cassandra, finding a row is probably 'relatively' constant no matter how many columns you have. The smaller the number of columns, I suppose, the more likely that all the columns will be in one sstable. If you've got a ton of columns per row, it is much more likely that these columns will be spread out across multiple sstables. Plus, columns are read in chunks, depending on yaml settings. - Original Message - From: "A J" <s5a...@gmail.com>
Re: Why is row lookup much faster than column lookup
Sorry, should have been: Given the hashtable nature of cassandra, finding a row is probably 'relatively' constant no matter how many *rows* you have. - Original Message - From: "Dave Brosius" <dbros...@mebigfatguy.com>
Re: key sorting question
With the random partitioner, the rows are sorted by the hashes of the keys, so for all intents and purposes, not sorted. The comment below really is talking about how columns are sorted, and yes, when time uuids are used they are sorted by the time component, as time uuids start with the time component and then add various randomness bits. On 03/07/2012 01:51 AM, Tamar Fraenkel wrote: Hi! I am currently experimenting with Cassandra 1.0.7, but while reading http://www.datastax.com/dev/blog/schema-in-cassandra-1-1 something caught my eye: "Cassandra orders version 1 UUIDs by their time component" Is this true? If I have for example USER_CF where the key is a randomly generated java.util.UUID (UUID.randomUUID()), will the rows be sorted by the generation time? I use the random partitioner if that makes any difference. Thanks, Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956
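A small illustration of why UUID.randomUUID() can't be time-ordered: it produces version 4 (random) UUIDs, and only version 1 UUIDs carry a timestamp:

  import java.util.UUID;

  public class UuidVersions {
      public static void main(String[] args) {
          UUID random = UUID.randomUUID();
          System.out.println(random.version()); // prints 4 - no time component
          // random.timestamp() would throw UnsupportedOperationException,
          // since only version 1 UUIDs embed a timestamp
      }
  }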
Re: TimeUUID
Given that these rows are meant to be time buckets, you would want collisions; in fact, that would be the standard way of working. So IMO the uuid just removes the ability to bucket data and would not be wanted. On 02/28/2012 10:30 AM, Paul Loy wrote: In a multi-server env, to avoid key collisions timeuuid may be the better choice. On Monday, February 27, 2012, Tamar Fraenkel wrote: Hi! I have a column family where I use rows as "time buckets". What I do is take epoch time in seconds and round it to 1 hour (taking the result of time_since_epoch_seconds divided by 3600). My key validation type is LongType. I wonder whether it is better to use TimeUUID or even a readable string representation for time? Thanks, -- Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 -- Sent from my iPhone, sorry for my brevity.
Re: Using cassandra at minimal expenditures
I guess the issue with 2 machines and RF=2 is that a consistency level of QUORUM is the same as ALL (with RF=2, quorum is floor(2/2)+1 = 2 replicas), so you have pretty much no flexibility with this setup; of course this might be fine depending on what you want to do. In addition, RF=2 also means that you get no data-storage improvements from being distributed. Having said that, I know there are folks who run 2-machine clusters. dave - Original Message - From: "Ertio Lew" <ertio...@gmail.com>
Re: Issue regarding 'describe' keyword in 1.0.7 version.
What it's saying is that if you define a KeySpace Foo and under it a ColumnFamily called Foo, you won't be able to use describe to describe the ColumnFamily named Foo. On 02/21/2012 07:26 AM, Rishabh Agrawal wrote: Hello, I am a newbie to Cassandra, please bear with my lame doubts. I am running Cassandra version 1.0.7 on Ubuntu. I found the following case with describe: If there is a Keyspace with name 'x' then the 'describe x' command will give the desired results. But if there is also a Column Family named 'x' then describe will not be able to catch it. But if there is only a column family 'x' and no keyspace with the same name, then 'describe x' will give the desired results, i.e. it will be able to capture and display info regarding the 'x' column family. Kindly help me with that. Thanks and Regards, Rishabh Agrawal
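One way to reproduce the shadowing in cassandra-cli (the name 'x' is from the thread above; this assumes 1.0.x-era cli behavior): after the last command below, the output reports on the keyspace x rather than the column family x.

  [default@unknown] create keyspace x;
  [default@unknown] use x;
  [default@x] create column family x;
  [default@x] describe x;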
Re: problem with sliceQuery with composite column
If the composite column was rearranged as ticks:111, wouldn't the result be as desired? - Original Message - From: "aaron morton" <aa...@thelastpickle.com>
Re: How to find a commit for specific release on git?
Based on the tags listed here: http://git-wip-us.apache.org/repos/asf?p=cassandra.git I would look here: http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=9d4c0d9a37c7d77a05607b85611c3abdaf75be94 On 02/12/2012 10:39 PM, Maki Watanabe wrote: Hello, How do I find the right commit SHA for a specific cassandra release? For example, how do I check out the 0.8.9 release in the git repository? With git log --grep=0.8.9, I found the latest commit mentioning 0.8.9 was --- commit 1f92277c4bf9f5f71303ecc5592e27603bc9dec1 Author: Sylvain Lebresne slebre...@apache.org Date: Sun Dec 11 00:02:14 2011 + prepare for release 0.8.9 git-svn-id: https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.8@1212938 13f79535-47bb-0310-9956-ffa450edef68 --- However, I don't think that's a reliable way. I've also checked CHANGES.txt and NEWS.txt but those say nothing about commit SHAs. regards,
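A more reliable route than grepping the log is to use the release tags themselves; a sketch, with the exact tag name assumed from the tag list linked above:

  git clone https://git-wip-us.apache.org/repos/asf/cassandra.git
  cd cassandra
  git tag -l | grep 0.8.9     # find the exact tag name
  git checkout cassandra-0.8.9
  git rev-parse HEAD          # the commit SHA for the release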
Re: Unsubscribe
On 02/12/2012 10:53 PM, Shubham Srivastava wrote: -- Sent using BlackBerry To unsubscribe, send an email to user-unsubscr...@cassandra.apache.org
Re: Unsubscribe
On 02/04/2012 12:05 PM, Andrea Loggia wrote: Unsubscribe If you wish to unsubscribe from the cassandra user list, send a blank email to user-unsubscr...@cassandra.apache.org
Re: unsubscribe
Folks who wish to unsubscribe should send a blank email to the following address: user-unsubscr...@cassandra.apache.org
Re: Problems Starting Cassandra Server -
Change your yaml entry for data_file_directories from

data_file_directories: F:\cassandra\data

to

data_file_directories:
    - F:\cassandra\data

On 01/17/2012 11:54 PM, Asha Subramanian wrote: Here is the yaml file.. Thanks From: Dave Brosius [mailto:dbros...@mebigfatguy.com] Sent: Wednesday, January 18, 2012 9:07 AM To: user@cassandra.apache.org; Asha Subramanian Subject: Re: Problems Starting Cassandra Server - It probably would be useful to know what your yaml file looks like. On 01/17/2012 08:58 PM, Asha Subramanian wrote: I am a new user of Cassandra and want to understand the basics of Cassandra before moving to cluster installations etc. I picked up the latest version of Cassandra from the home page, 1.0.7, released on 2012/01/16. I am installing on Windows 7. I have followed all the instructions for changing the Cassandra.yaml and also the environment variables. However, when I start the server, I get the following error -- what could be the problem?

F:\cassandra\bin>cassandra.bat
Starting Cassandra Server
INFO 07:22:37,766 Logging initialized
INFO 07:22:37,828 JVM vendor/version: Java HotSpot(TM) Client VM/1.6.0_30
INFO 07:22:37,844 Heap size: 1070399488/1070399488
INFO 07:22:37,844 Classpath: F:\cassandra\conf;F:\cassandra\lib\antlr-3.2.jar;F:\cassandra\lib\apache-cassandra-1.0.6.jar;F:\cassandra\lib\apache-cassandra-clientutil-1.0.6.jar;F:\cassandra\lib\apache-cassandra-thrift-1.0.6.jar;F:\cassandra\lib\avro-1.4.0-fixes.jar;F:\cassandra\lib\avro-1.4.0-sources-fixes.jar;F:\cassandra\lib\commons-cli-1.1.jar;F:\cassandra\lib\commons-codec-1.2.jar;F:\cassandra\lib\commons-lang-2.4.jar;F:\cassandra\lib\compress-lzf-0.8.4.jar;F:\cassandra\lib\concurrentlinkedhashmap-lru-1.2.jar;F:\cassandra\lib\guava-r08.jar;F:\cassandra\lib\high-scale-lib-1.1.2.jar;F:\cassandra\lib\jackson-core-asl-1.4.0.jar;F:\cassandra\lib\jackson-mapper-asl-1.4.0.jar;F:\cassandra\lib\jamm-0.2.5.jar;F:\cassandra\lib\jline-0.9.94.jar;F:\cassandra\lib\json-simple-1.1.jar;F:\cassandra\lib\libthrift-0.6.jar;F:\cassandra\lib\log4j-1.2.16.jar;F:\cassandra\lib\servlet-api-2.5-20081211.jar;F:\cassandra\lib\slf4j-api-1.6.1.jar;F:\cassandra\lib\slf4j-log4j12-1.6.1.jar;F:\cassandra\lib\snakeyaml-1.6.jar;F:\cassandra\lib\snappy-java-1.0.4.1.jar;F:\cassandra\build\classes\main;F:\cassandra\build\classes\thrift;F:\cassandra\lib\jamm-0.2.5.jar
INFO 07:22:37,859 JNA not found. Native methods will be disabled.
INFO 07:22:37,891 Loading settings from file:/F:/cassandra/conf/cassandra.yaml
ERROR 07:22:38,234 Fatal configuration error
error Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=data_file_directories for JavaBean=org.apache.cassandra.config.Config@1329642; No single argument constructor found for class [Ljava.lang.String; in reader, line 10, column 1: cluster_name: 'Test Cluster' ^
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:372)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:177)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:136)
at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:122)
at org.yaml.snakeyaml.Loader.load(Loader.java:52)
at org.yaml.snakeyaml.Yaml.load(Yaml.java:166)
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:133)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:125)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:337)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
Caused by: org.yaml.snakeyaml.error.YAMLException: Cannot create property=data_file_directories for JavaBean=org.apache.cassandra.config.Config@1329642; No single argument constructor found for class [Ljava.lang.String;
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep(Constructor.java:305)
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:184)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:370)
... 9 more
Caused by: org.yaml.snakeyaml.error.YAMLException: No single argument constructor found for class [Ljava.lang.String;
at org.yaml.snakeyaml.constructor.Constructor$ConstructScalar.construct(Constructor.java:419)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:177)
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.constructJavaBean2ndStep
Re: Integration Error between Cassandra and Eclipse
This works for me: http://wiki.apache.org/cassandra/HowToDebug On 01/06/2012 01:18 AM, Kuldeep Sengar wrote: Hi, Can you post the error (you say only 1 error is there)? That'll make things more clear. Thanks Kuldeep Singh Sengar Opera Solutions Tech Boulevard, 8th floor, Tower C, Sector 127, Plot No 6, Noida 201 301 +91 (120) 4642424 facsimile, Ext: 2418 +91 8800595878 (M) -Original Message- From: Maki Watanabe [mailto:watanabe.m...@gmail.com] Sent: Friday, January 06, 2012 7:30 AM To: user@cassandra.apache.org Subject: Re: Integration Error between Cassandra and Eclipse Sorry, ignore my reply. I had the same result with import (1 error in unit test code, many warnings). 2012/1/6 Maki Watanabe watanabe.m...@gmail.com: How about using File > Import... rather than File > New > Java Project? After extracting the source, ant build, and ant generate-eclipse-files: 1. File > Import... 2. Choose Existing Projects into Workspace 3. Choose your source directory as the root directory and then push Finish 2012/1/6 bobby saputra zaibat...@gmail.com: Hi There, I am a beginner user of Cassandra. I hear from many people that Cassandra is powerful database software, used by Facebook, Twitter, Digg, etc., so I am interested in studying Cassandra further. When I performed the integration process between Cassandra and the Eclipse IDE (in this case I use Java as the programming language), I ran into trouble and have many problems. I have already followed all the instructions from http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but the tutorial was not working properly. I got a lot of errors and warnings while creating the Java project in eclipse. These are the errors and warnings: Error (1 item): Description Resource Location The method rangeSet(Range<T>...) in the type Range is not applicable for the arguments (Range[]) RangeTest.java line 178 Warnings (100 of 2916 items): Description Resource Location AbstractType is a raw type. References to generic type AbstractType<T> should be parameterized AbstractColumnContainer.java line 72 (and many similar warnings) This is what I've done: 1. I checked out cassandra-trunk from the given link using SlikSvn as the svn client. 2. I moved to the cassandra-trunk folder, and built with ant using the ant build command. 3. I generated eclipse files with ant using the ant generate-eclipse-files command. 4. I created a new java project in eclipse, inserted the project name cassandra-trunk, and browsed to the cassandra-trunk folder. Did I make any mistakes? Or is there something wrong with the tutorial at http://wiki.apache.org/cassandra/RunningCassandraInEclipse? I have already googled to find a solution to this problem, but unfortunately found no results. Would you help me by giving me a guide on how to solve this problem? Thank you very much for your help. Best Regards, Wira Saputra -- w3m
Re: java thrift error
A ByteBuffer is not a byte[]. To convert a String to a ByteBuffer, do something like:

  public static ByteBuffer toByteBuffer(String value) throws UnsupportedEncodingException {
      return ByteBuffer.wrap(value.getBytes("UTF-8"));
  }

See http://wiki.apache.org/cassandra/ThriftExamples - Original Message - From: "A J" <s5a...@gmail.com>
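And for the round trip back, a small companion sketch with the same UTF-8 assumption; the class name is illustrative:

  import java.io.UnsupportedEncodingException;
  import java.nio.ByteBuffer;

  public class BufferStrings {
      // copy the buffer's remaining bytes and decode them as UTF-8
      public static String toString(ByteBuffer buffer) throws UnsupportedEncodingException {
          byte[] bytes = new byte[buffer.remaining()];
          buffer.duplicate().get(bytes); // duplicate() so we don't disturb the caller's position
          return new String(bytes, "UTF-8");
      }
  }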
Re: setStrategy_options syntax in thrift
KsDef ksDef = new KsDef();
Map<String, String> strategyOptions = new HashMap<String, String>();
ksDef.setStrategy_options(strategyOptions);
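Filling that outline in, a hedged sketch of creating a keyspace with strategy options over raw Thrift; the keyspace name, the replication factor, and the assumption of a 1.0-era generated Thrift API (where the KsDef constructor takes name, strategy class, and cf_defs) are all illustrative:

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.Map;
  import org.apache.cassandra.thrift.Cassandra;
  import org.apache.cassandra.thrift.CfDef;
  import org.apache.cassandra.thrift.KsDef;

  public class CreateKeyspace {
      public static void create(Cassandra.Client client) throws Exception {
          KsDef ksDef = new KsDef("MyKeyspace",
                  "org.apache.cassandra.locator.SimpleStrategy",
                  new ArrayList<CfDef>());
          Map<String, String> strategyOptions = new HashMap<String, String>();
          strategyOptions.put("replication_factor", "3"); // RF lives in strategy_options in 1.0+
          ksDef.setStrategy_options(strategyOptions);
          client.system_add_keyspace(ksDef);
      }
  }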
Re: memory estimate for each key in the key cache
On 12/16/2011 10:13 PM, Brandon Williams wrote: On Fri, Dec 16, 2011 at 8:52 PM, Kent Tong freemant2...@yahoo.com wrote: Hi, From the source code I can see that for each key, the hash (token), the key itself (ByteBuffer) and the position (a long: the offset in the sstable) are stored in the key cache. The hash is an MD5 hash, so it is 16 bytes. So, the total size required is at least 16 + sizeof(key) + 4, i.e. a fixed overhead of 20 bytes. If we consider the overhead of the object references, then it will be even larger. Then why does the wiki recommend multiplying the number of keys cached by 10-12 to get the memory requirement? In a word: java. -Brandon Wow, Java is a lot better than I thought if it can perform that kind of magic. I'm guessing the wiki information is just old and out of date. It's probably more like 60 + sizeof(key)
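For a rough sense of where the extra bytes go on a 64-bit JVM (these per-object figures are typical ballpark assumptions, not measurements of the Cassandra source): each Java object carries a header of roughly 12-16 bytes, each array another ~16, and each reference 4-8 bytes. A cache entry that holds a token object wrapping the 16-byte hash, a ByteBuffer wrapping the key's byte[], plus the map entry itself can therefore easily triple the "raw" 20 + sizeof(key) figure, which is how an estimate like 60 + sizeof(key) becomes plausible.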