Re: Effect of rangequeries with RandomPartitioner
inline resp.

On Mon, Jul 9, 2012 at 10:18 AM, prasenjit mukherjee prasen@gmail.com wrote:

> Thanks Aaron for your response. Some follow-up questions/assumptions/clarifications:
>
> 1. With RandomPartitioner, on a given node, are the keys sorted by their hash_values or original/unhashed keys?

hash value.

> 2. With RandomPartitioner, on a given node, are the columns (for a given key) always sorted by their column_names?

yes, depends on comparator.

> 3. From what I understand, token = hash(key) for RandomPartitioner, and hence any key-range queries will return bogus results.

correct.

> Although I believe column-range queries should succeed even with RP, if the columns are always sorted by their column_names.

correct, depends on comparator.

> -Thanks,
> Prasenjit
>
> On Mon, Jul 9, 2012 at 12:17 AM, aaron morton aa...@thelastpickle.com wrote:
>
>> For background: http://wiki.apache.org/cassandra/FAQ#range_rp
>>
>> It maps the start key to a token, and then scans X rows from there on CL number of nodes. Rows are stored in token order.
>>
>> Cheers
>> - Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 7/07/2012, at 11:52 PM, prasenjit mukherjee wrote:
>>
>>> Wondering how a range-query request is handled if RP is used. Will the receiving node do a fan-out to all the nodes in the ring, or will it just execute the range query on its own local partition?
>>> --
>>> Sent from my mobile device
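[Editorial note] The token-order point above can be made concrete with a small self-contained sketch: under RandomPartitioner, rows are placed and stored by an MD5-derived token rather than by the raw key, so key-range scans come back in an order that looks arbitrary. This is illustrative only; the key list is made up and the token calculation is simplified from Cassandra's actual RandomPartitioner implementation.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TokenOrderDemo {
    // Simplified token: MD5 digest of the key interpreted as a non-negative
    // 128-bit integer. Conceptually similar to RandomPartitioner, not its
    // exact code.
    static BigInteger token(String key) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return new BigInteger(md5.digest(key.getBytes(StandardCharsets.UTF_8))).abs();
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = new ArrayList<>(List.of("apple", "banana", "cherry", "date"));
        List<String> byToken = new ArrayList<>(keys);
        byToken.sort(Comparator.comparing((String k) -> {
            try { return token(k); } catch (Exception e) { throw new RuntimeException(e); }
        }));
        System.out.println("lexical order: " + keys);
        System.out.println("token order:   " + byToken);
        // Rows on disk follow the token order, which generally differs from
        // lexical key order -- hence key-range queries under RP look bogus.
    }
}
```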
Re: Effect of rangequeries with RandomPartitioner
Thanks for the response. Further questions inline.

On Mon, Jul 9, 2012 at 11:50 AM, samal samalgo...@gmail.com wrote:

>> 1. With RandomPartitioner, on a given node, are the keys sorted by their hash_values or original/unhashed keys?
> hash value.

1. Based on the second answer in http://stackoverflow.com/questions/2359175/cassandra-file-structure-how-are-the-files-used it seems that the index file (for a given SSTable) contains the row key (and not the hashed key). Or maybe I am missing something.

2. Do the keys in the index file (ref http://hi.csdn.net/attachment/20/28/0_1322461982l3D8.gif) actually contain hash(row_key)+row_key or something like that? Otherwise you would need a separate mapping from hash bucket to rows for reading.

-Thanks,
Prasenjit
Re: bulk load problem
Hi all,

I am facing the same problem when trying to load Cassandra using sstableloader. I am running a Cassandra instance on my own machine, and sstableloader is also called from the same machine. These are the steps I followed:

- get a copy of the running Cassandra instance
- set another loopback address with sudo ifconfig lo:0 127.0.0.2 netmask 255.0.0.0 up
- set the listen address and rpc address in the copied Cassandra's cassandra.yaml to 127.0.0.2
- ran ./sstableloader -d 127.0.0.2 <directory of created sstables>

But this gives me the error 'Could not retrieve endpoint ranges: ' and just that. I would be grateful for any hints to get over this.

What I actually want to do is run sstableloader from Java code, but I couldn't get past this there either, so I am trying to understand the required args this way. It would be great if someone could help me with either case. Thanks in advance!

On Tue, Jul 3, 2012 at 5:16 AM, aaron morton aa...@thelastpickle.com wrote:

Do you have the full stack ? It will include a cause.

Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/06/2012, at 12:07 PM, James Pirz wrote:

Dear all,

I am trying to use sstableloader in cassandra 1.1.1 to bulk load some data into a single node cluster. I am running the following command:

bin/sstableloader -d 192.168.100.1 /data/ssTable/tpch/tpch/

from another node (other than the node on which cassandra is running), while the data should be loaded into a keyspace named tpch. I made sure that the 2nd node, from which I run sstableloader, has the same copy of cassandra.yaml as the destination node. I have put tpch-cf0-hd-1-Data.db and tpch-cf0-hd-1-Index.db under the path I have passed to sstableloader. But I am getting the following error:

Could not retrieve endpoint ranges:

Any hint ?
Thanks in advance, James -- Pushpalanka Jayawardhana | Undergraduate | Computer Science and Engineering University of Moratuwa +94779716248 | http://pushpalankajaya.blogspot.com Twitter: http://twitter.com/Pushpalanka | Slideshare: http://www.slideshare.net/Pushpalanka
Auto backup script
Hi,

I have planned to take backups of my 3-node Cassandra ring as a safety measure. Please let me know what method I should use for backups, and if anyone has a script, please share it with me.

--
Thanks & Regards
Adeel Akbar
Re: Thrift version and OOM errors
Hello,

Thanks for the help. There was a problem in the code, actually: the connection object was not thread safe. That is why the messages were so big. After fixing that we do not get any errors and the cluster seems stable.

Thanks again for all the help.

Regards,
Vasilis

On Thu, Jul 5, 2012 at 11:32 PM, aaron morton aa...@thelastpickle.com wrote:

Agree. It's a good idea to remove as many variables as possible and get to a stable/known state. Use a clean install and a well-known client and see if the problems persist.

Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/07/2012, at 4:58 PM, Tristan Seligmann wrote:

On Jul 4, 2012 2:02 PM, Vasileios Vlachos vasileiosvlac...@gmail.com wrote:
> Any ideas what could be causing strange message lengths?

One cause of this that I've seen is a client using an unframed Thrift transport while the server expects framed, or vice versa. I suppose a similar cause could be something that is not a Thrift client at all mistakenly connecting to Cassandra's Thrift port.
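[Editorial note] Tristan's framed-vs-unframed explanation can be made concrete. A framed Thrift transport prefixes every message with a 4-byte big-endian frame length; a server expecting frames will interpret the first 4 bytes of an *unframed* message as that length and try to allocate a buffer for it. The stdlib-only sketch below is illustrative: the byte sequence mimics the start of an unframed binary-protocol call, not a complete Thrift message.

```java
import java.nio.ByteBuffer;

public class FrameLengthDemo {
    public static void main(String[] args) {
        // An unframed Thrift binary-protocol call starts with protocol
        // version bytes (0x80 0x01), a message type, and the method name.
        // (Illustrative bytes only, not a full Thrift message.)
        byte[] unframed = {(byte) 0x80, 0x01, 0x00, 0x01, 'g', 'e', 't'};

        // A server expecting TFramedTransport reads the first 4 bytes as a
        // big-endian frame length:
        int misreadLength = ByteBuffer.wrap(unframed, 0, 4).getInt();
        System.out.println("frame length as misread: " + misreadLength);
        // 0x80010001 read as a frame length is garbage -- exactly the kind of
        // "strange message length" that leads the server to attempt a huge
        // allocation (or reject the frame) before anything sensible happens.
    }
}
```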
Re: bulk load problem
I couldn't get the same-host sstableloader to work either. But it's easier to use the JMX bulk-load hook that's built into Cassandra anyway. The following is what I implemented to do this:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import javax.management.JMX;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

import org.apache.cassandra.service.StorageServiceMBean;

public class JmxBulkLoader {

    private JMXConnector connector;
    private StorageServiceMBean storageBean;

    public JmxBulkLoader(String host, int port) throws Exception {
        connect(host, port);
    }

    private void connect(String host, int port) throws IOException, MalformedObjectNameException {
        JMXServiceURL jmxUrl = new JMXServiceURL(
                String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
        Map<String, Object> env = new HashMap<String, Object>();
        connector = JMXConnectorFactory.connect(jmxUrl, env);
        MBeanServerConnection mbeanServerConn = connector.getMBeanServerConnection();
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageService");
        storageBean = JMX.newMBeanProxy(mbeanServerConn, name, StorageServiceMBean.class);
    }

    public void close() throws IOException {
        connector.close();
    }

    public void bulkLoad(String path) {
        storageBean.bulkLoad(path);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            throw new IllegalArgumentException("usage: paths to bulk files");
        }
        JmxBulkLoader np = new JmxBulkLoader("localhost", 7199);
        for (String arg : args) {
            np.bulkLoad(arg);
        }
        np.close();
    }
}

On Jul 9, 2012, at 5:16 AM, Pushpalanka Jayawardhana wrote:

Hi all, I am facing the same problem when trying to load Cassandra using sstableloader.
CompositeType support for keynames
Any reason we don't have CompositeType support for key_validation_class (ref: http://www.datastax.com/docs/0.8/configuration/storage_configuration#key-validation-class)? I would like to create row keys in the form username:yyyymmddhhmm (e.g. joe:201206092312). I can still do that with key_validation_class=UTF8Type, but was wondering if there is any built-in validation for this kind of row key.

Thanks,
-Prasenjit
Setting the Memtable allocator on a per CF basis
Hello Cassandra Devs,

We are currently trying to optimize our Cassandra system for different workloads. One of our workloads is (very) update heavy. Currently we are running with a patch that allows the Live Ratio to go below 1.0 (lower bound set to 0.1 now), which gives us somewhat better performance in terms of flushes on this particular CF.

We then experienced unexpected memory issues which on further inspection seem to be related to the SlabAllocator. What happens is that we allocate a Region of 1MB every couple of seconds (the columns we write in this CF contain serialized session data and can be 100K each), so overwrites are actually done into another Region, and these Regions are only freed (most of the time) when the Memtable is flushed. We added some debug logging: to write about 300MB to disk we created roughly 3000 Regions (3GB of data; some of it might be collected before the flush, but probably not much).

It would be really great if we could use the native (non-slab) allocator only for this CF, since the SlabAllocator gives us very good results on our other CFs. (We tried running a patched version with the HeapAllocator set globally but went OOM almost immediately.)

I have found this issue in which Jonathan mentions he is OK with adding a configuration option: https://issues.apache.org/jira/browse/CASSANDRA-3073 Unfortunately it seems the issue was closed and nothing was implemented. Would you consider adding this option to a future release? SlabAllocator should be the default, but the HeapAllocator could be set in the CF properties. If you want, I can try to create a patch myself and submit it to you.

Kind Regards,

Joost
--
Joost van de Wijgerd
Visseringstraat 21B
1051KH Amsterdam
+31624111401
joost.van.de.wijgerd@Skype
http://www.linkedin.com/in/jwijgerd
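[Editorial note] The Region churn Joost describes follows from simple arithmetic, sketched below. The row count and overwrite rate are hypothetical numbers chosen to match the ~300MB-live / ~3000-Region ratio reported above, and the slab behavior is simplified to "append-only until flush".

```java
public class SlabChurnDemo {
    public static void main(String[] args) {
        final int regionSize = 1 << 20;      // 1MB slab region
        final int valueSize = 100 * 1024;    // ~100K serialized session column
        final int liveRows = 3072;           // ~300MB of live data at flush (hypothetical)
        final int overwritesPerRow = 10;     // hypothetical update rate before flush

        // Slab regions are append-only: an overwrite allocates fresh space in
        // the current region; the old copy is not reclaimed until the
        // Memtable flushes.
        long bytesAllocated = (long) liveRows * overwritesPerRow * valueSize;
        long regionsCreated = (bytesAllocated + regionSize - 1) / regionSize;

        System.out.println("bytes allocated: " + bytesAllocated);
        System.out.println("regions created: " + regionsCreated);
        // ~3000 regions (~3GB) allocated to hold ~300MB of live data -- the
        // same order of magnitude as the debug logs quoted above.
    }
}
```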
Re: bulk load problem
Due to the change in directory structure from version 1.1, you have to create a directory like /path/to/sstables/<Keyspace name>/<ColumnFamily name> and put your sstables there. In your case, I think it would be /data/ssTable/tpch/tpch/cf0. And you have to specify that directory as the parameter for sstableloader:

bin/sstableloader -d 192.168.100.1 /data/ssTable/tpch/tpch/cf0

Yuki

On Tuesday, June 26, 2012 at 7:07 PM, James Pirz wrote:

Dear all,

I am trying to use sstableloader in cassandra 1.1.1 to bulk load some data into a single node cluster. I am running the following command:

bin/sstableloader -d 192.168.100.1 /data/ssTable/tpch/tpch/

from another node (other than the node on which cassandra is running), while the data should be loaded into a keyspace named tpch. I made sure that the 2nd node, from which I run sstableloader, has the same copy of cassandra.yaml as the destination node. I have put tpch-cf0-hd-1-Data.db and tpch-cf0-hd-1-Index.db under the path I have passed to sstableloader. But I am getting the following error:

Could not retrieve endpoint ranges:

Any hint ?
Thanks in advance,
James
Re: cannot build 1.1.2 from source
Thanks for your response. Yes, I do that every time before I build.

On Sun, Jul 8, 2012 at 11:51 AM, aaron morton aa...@thelastpickle.com wrote:

Did you try running ant clean first ?

Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 8/07/2012, at 1:57 PM, Arya Goudarzi wrote:

Hi Fellows,

I used to be able to build cassandra 1.1 up to 1.1.1 with the same set of procedures by running ant on the same machine, but now the stuff associated with gen-cli-grammar breaks the build. Any advice will be greatly appreciated.

-Arya

Source: source tarball for 1.1.2 downloaded from one of the mirrors on cassandra.apache.org
OS: Ubuntu 10.04 Precise 64bit
Ant: Apache Ant(TM) version 1.8.2 compiled on December 3 2011
Maven: Apache Maven 3.0.3 (r1075438; 2011-02-28 17:31:09+)
Java: java version 1.6.0_32, Java(TM) SE Runtime Environment (build 1.6.0_32-b05), Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode)

Buildfile: /home/arya/workspace/cassandra-1.1.2/build.xml

maven-ant-tasks-localrepo:
maven-ant-tasks-download:
maven-ant-tasks-init:
maven-declare-dependencies:
maven-ant-tasks-retrieve-build:
init-dependencies:
    [echo] Loading dependency paths from file: /home/arya/workspace/cassandra-1.1.2/build/build-dependencies.xml
init:
    [mkdir] Created dir: /home/arya/workspace/cassandra-1.1.2/build/classes/main
    [mkdir] Created dir: /home/arya/workspace/cassandra-1.1.2/build/classes/thrift
    [mkdir] Created dir: /home/arya/workspace/cassandra-1.1.2/build/test/lib
    [mkdir] Created dir: /home/arya/workspace/cassandra-1.1.2/build/test/classes
    [mkdir] Created dir: /home/arya/workspace/cassandra-1.1.2/src/gen-java
check-avro-generate:
avro-interface-generate-internode:
    [echo] Generating Avro internode code...
avro-generate:
build-subprojects:
check-gen-cli-grammar:
gen-cli-grammar:
    [echo] Building Grammar /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g
    [java] warning(209): /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:697:1: Multiple token rules can match input such as '-': IntegerNegativeLiteral, COMMENT
    [java] As a result, token(s) COMMENT were disabled for that input
    [java] warning(209): /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: Multiple token rules can match input such as 'I': INCR, INDEX, Identifier
    [java] As a result, token(s) INDEX,Identifier were disabled for that input
    [java] warning(209): /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: Multiple token rules can match input such as '0'..'9': IP_ADDRESS, IntegerPositiveLiteral, DoubleLiteral, Identifier
    [java] As a result, token(s) IntegerPositiveLiteral,DoubleLiteral,Identifier were disabled for that input
    [java] warning(209): /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: Multiple token rules can match input such as 'T': TRUNCATE, TTL, Identifier
    [java] As a result, token(s) TTL,Identifier were disabled for that input
    [java] warning(209): /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: Multiple token rules can match input such as 'A': T__109, API_VERSION, AND, ASSUME, Identifier
    [java] As a result, token(s) API_VERSION,AND,ASSUME,Identifier were disabled for that input
    [java] warning(209): /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: Multiple token rules can match input such as 'E': EXIT, Identifier
    [java] As a result, token(s) Identifier were disabled for that input
    [java] warning(209): /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: Multiple token rules can match input such as 'L': LIST, LIMIT, Identifier
    [java] As a result, token(s) LIMIT,Identifier were disabled for that input
    [java] warning(209): /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: Multiple token rules can match input such as 'B': BY, Identifier
    [java] As a result, token(s) Identifier were disabled for that input
    [java] warning(209): /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: Multiple token rules can match input such as 'O': ON, Identifier
    [java] As a result, token(s) Identifier were disabled for that input
    [java] warning(209): /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1: Multiple token rules can match input such as 'K': KEYSPACE, KEYSPACES, Identifier
    [java] As a result, token(s) KEYSPACES,Identifier were disabled for that input
    [java] warning(209):
Re: bulk load problem
Hi,

Thanks Brian for your code, and thanks Yuki. The directory structure was a problem, and I could correct it with Yuki's guidance. The error was still the same after that, and it turned out to be due to a wrong Thrift rpc address. After correcting it, bulk loading was successful.

On Mon, Jul 9, 2012 at 8:13 PM, Yuki Morishita mor.y...@gmail.com wrote:

> Due to the change in directory structure from version 1.1, you have to create a directory like /path/to/sstables/<Keyspace name>/<ColumnFamily name> and put your sstables there. In your case, I think it would be /data/ssTable/tpch/tpch/cf0. And you have to specify that directory as the parameter for sstableloader:
>
> bin/sstableloader -d 192.168.100.1 /data/ssTable/tpch/tpch/cf0
>
> Yuki

--
Pushpalanka Jayawardhana | Undergraduate | Computer Science and Engineering, University of Moratuwa
+94779716248 | http://pushpalankajaya.blogspot.com
Twitter: http://twitter.com/Pushpalanka | Slideshare: http://www.slideshare.net/Pushpalanka
Re: Java heap space on Cassandra start up version 1.0.10
That's a good point, Tyler. I watched top during this process, and even though the heap dump is small, I can see all of my memory resources consumed while Cassandra tries to start. I have the heap dump and can run the Memory Analyzer Tool in Eclipse on it, but I will confess I'm not sure what I should be looking for. Can you help me out with that?

Jason

On Sat, Jul 7, 2012 at 8:20 PM, Tyler Hobbs ty...@datastax.com wrote:

> The heap dump is only 47mb, so something strange is going on. Is there anything interesting in the heap dump?
Re: Composite Slice Query returning non-sliced data
Aaron,

Let me start from the beginning.

1- I have a ColumnFamily called Rollup15 with the definition below:

create column family Rollup15
  with comparator = 'CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)'
  and key_validation_class = UTF8Type
  and default_validation_class = UTF8Type;

2- Once created, it is empty. Below is the output of the CLI:

[default@Schema] list Rollup15;
Using default limit of 100
0 Row Returned.
Elapsed time: 16 msec(s).

3- I use the code below to insert the composite data into Cassandra:

public void insertData(String columnFamilyName, String key, String value,
        int rollupInterval, String... columnSlice) {
    Composite colKey = new Composite();
    colKey.addComponent(rollupInterval, IntegerSerializer.get());
    if (columnSlice != null) {
        for (String colName : columnSlice) {
            colKey.addComponent(colName, serializer);
        }
    }
    createMutator(keyspace, serializer).addInsertion(key, columnFamilyName,
            createColumn(colKey, value, new CompositeSerializer(), serializer)).execute();
}

4- After the insertion, below is the CLI output:

[default@Schema] list Rollup15;
Using default limit of 100
---
RowKey: query1_1337295600
=> (column=15:Composite1:Composite2, value=value123, timestamp=134187983347)
1 Row Returned.
Elapsed time: 9 msec(s).

So there is a record with 3 composite components (15, Composite1 and Composite2).

5- Now I do a fetch based on the code below. I am fetching the column 15:Composite3, which I know is not there:

Composite start = new Composite();
start.addComponent(0, 15, Composite.ComponentEquality.EQUAL);
start.addComponent(1, "Composite3", Composite.ComponentEquality.EQUAL);

Composite finish = new Composite();
finish.addComponent(0, 15, Composite.ComponentEquality.EQUAL);
finish.addComponent(1, "Composite3" + Character.MAX_VALUE, Composite.ComponentEquality.GREATER_THAN_EQUAL);

SliceQuery<String, Composite, String> sq = HFactory.createSliceQuery(keyspace,
        StringSerializer.get(), new CompositeSerializer(), StringSerializer.get());
sq.setColumnFamily("Rollup15");
sq.setKey("query1_1337295600");
sq.setRange(start, finish, false, 1);
QueryResult<ColumnSlice<Composite, String>> result = sq.execute();
ColumnSlice<Composite, String> orderedRows = result.get();

6- And I get output for RowKey query1_1337295600 as (column=15:Composite1:Composite2, value=value123, timestamp=134187983347), which should not be the case since it does not belong to the 'Composite3' slice.

Sunit.

On Sun, Jul 8, 2012 at 11:45 AM, aaron morton aa...@thelastpickle.com wrote:

Something like: "This is how I did the write in the CLI and this is what it printed", and then "This is how I did the read in the CLI and this is what it printed". It's hard to imagine what data is in cassandra based on code.

cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/07/2012, at 1:28 PM, Sunit Randhawa wrote:

Aaron,

For writing, I am using the CLI. Below is the piece of code that reads column names of different types.

Composite start = new Composite();
start.addComponent(0, beginTime, Composite.ComponentEquality.EQUAL);
if (columns != null) {
    int colCount = 1;
    for (String colName : columns) {
        start.addComponent(colCount, colName, Composite.ComponentEquality.EQUAL);
        colCount++;
    }
}
Composite finish = new Composite();
finish.addComponent(0, endTime, Composite.ComponentEquality.EQUAL);
if (columns != null) {
    int colCount = 1;
    for (String colName : columns) {
        if (colCount == columns.size())
            // GREATER_THAN_EQUAL is meant to match any sub-slices of A:B:C when searching on A:B
            finish.addComponent(colCount, colName + Character.MAX_VALUE, Composite.ComponentEquality.GREATER_THAN_EQUAL);
        else
            finish.addComponent(colCount, colName, Composite.ComponentEquality.EQUAL);
        colCount++;
    }
}
SliceQuery<String, Composite, String> sq = HFactory.createSliceQuery(keyspace,
        StringSerializer.get(), new CompositeSerializer(), StringSerializer.get());
sq.setColumnFamily(columnFamilyName);
sq.setKey(key);
logger.debug("Start:" + start + ",finish:" + finish);
sq.setRange(start, finish, false, 1);
QueryResult<ColumnSlice<Composite, String>> result = sq.execute();
ColumnSlice<Composite, String>
Re: CompositeType support for keynames
It's certainly possible to have a CompositeType key_validation_class. What made you think that you cannot?

On Mon, Jul 9, 2012 at 7:48 AM, prasenjit mukherjee prasen@gmail.com wrote:

> Any reason we don't have CompositeType support for key_validation_class (ref: http://www.datastax.com/docs/0.8/configuration/storage_configuration#key-validation-class)? I would like to create row_names in the form username:yyyymmddhhmm (e.g. joe:201206092312). I can still do that with key_validation_class=UTF8Type but was wondering if there is any in-built validation for these kind of row_names.
>
> Thanks,
> -Prasenjit

--
Tyler Hobbs
DataStax
http://datastax.com/
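[Editorial note] A composite key like username:yyyymmddhhmm is not stored as a plain delimited string: CompositeType serializes each component with a 2-byte big-endian length prefix followed by the component bytes and a 1-byte end-of-component marker. The stdlib-only sketch below illustrates that layout; it is a simplification (the real CompositeType also uses the end-of-component byte to express slice bounds), and `compose` is a hypothetical helper, not a Cassandra or Hector API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class CompositeKeyDemo {
    // Sketch of CompositeType's layout: each component is encoded as
    // <2-byte big-endian length><component bytes><1 end-of-component byte (0)>.
    static byte[] compose(String... components) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String c : components) {
            byte[] b = c.getBytes(StandardCharsets.UTF_8);
            out.write((b.length >> 8) & 0xFF); // length, high byte
            out.write(b.length & 0xFF);        // length, low byte
            out.write(b);                      // component value
            out.write(0);                      // end-of-component marker
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] key = compose("joe", "201206092312");
        System.out.println("encoded length: " + key.length);
        // 3 + 12 bytes of data, plus (2 + 1) bytes of framing per component = 21
    }
}
```

This framing is what lets Cassandra validate and compare each component by its own type, rather than treating the key as one opaque UTF8 blob.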