Re: Replication to second data center with different number of nodes
I typically use a number a lot lower than 256, usually less than 20, for num_tokens, as a larger number has historically had a dramatic impact on query performance.

— Colin Clark
co...@clark.ws
+1 612-859-6129
skype colin.p.clark

On Mar 28, 2015, at 3:46 PM, Eric Stevens <migh...@gmail.com> wrote:

If you're curious about how Cassandra knows how to replicate data in the remote DC: it works the same as in the local DC. Replication is independent in each, and you can even set a different replication strategy per keyspace per datacenter. Nodes in each DC take up num_tokens positions on a ring, each partition key is mapped to a position on that ring, and whoever owns that part of the ring is the primary for that data. Then (oversimplified) the RF-1 adjacent nodes become replicas for that same data.

On Fri, Mar 27, 2015 at 6:55 AM, Sibbald, Charles <charles.sibb...@bskyb.com> wrote:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__num_tokens

So go with the default of 256, and leave initial_token empty:

num_tokens: 256
# initial_token:

Cassandra will always give each node the same number of tokens. The only time you might want to distribute tokens unevenly is if your instances are of different sizes/capabilities, which is also a bad scenario.
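Eric's point above can be made concrete: per-datacenter replication is explicit configuration on the keyspace, not something derived at rebuild time. A sketch under assumptions (the keyspace name is made up, and 'DC1'/'DC2' must match the datacenter names your snitch reports, e.g. in `nodetool status`):

```cql
-- Hypothetical keyspace: the replica count is set explicitly, per keyspace,
-- per datacenter.
CREATE KEYSPACE my_keyspace WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,
    'DC2': 1
};
```

How much data each node in the second DC holds then follows from its RF and node count (for example, RF 1 across three nodes gives each node roughly 1/3 of the keys), combined with token ownership on that DC's ring.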
From: Björn Hachmann <bjoern.hachm...@metrigo.de>
Reply-To: user@cassandra.apache.org
Date: Friday, 27 March 2015 12:11
To: user <user@cassandra.apache.org>
Subject: Re: Replication to second data center with different number of nodes

2015-03-27 11:58 GMT+01:00 Sibbald, Charles <charles.sibb...@bskyb.com>:

> Cassandra's Vnodes config

Thank you. Yes, we are using vnodes! The num_tokens parameter controls the number of vnodes assigned to a specific node. It might be that I am seeing problems where there are none. Let me rephrase my question: how does Cassandra know it has to replicate 1/3 of all keys to each single node in the second DC? I can see two ways:

1. It has to be configured explicitly.
2. It is derived from the number of nodes available in the data center at the time `nodetool rebuild` is started.

Kind regards
Björn

Information in this email including any attachments may be privileged, confidential and is intended exclusively for the addressee. The views expressed may not be official policy, but the personal views of the originator. If you have received it in error, please notify the sender by return e-mail and delete it from your system. You should not reproduce, distribute, store, retransmit, use or disclose its contents to anyone. Please note we reserve the right to monitor all e-mail communication through our internal and external networks. SKY and the SKY marks are trademarks of Sky plc and Sky International AG and are used under licence. Sky UK Limited (Registration No. 2906991), Sky-In-Home Service Limited (Registration No. 2067075) and Sky Subscribers Services Limited (Registration No. 2340150) are direct or indirect subsidiaries of Sky plc (Registration No. 2247735).
All of the companies mentioned in this paragraph are incorporated in England and Wales and share the same registered office at Grant Way, Isleworth, Middlesex TW7 5QD.
Re: ('Unable to complete the operation against any hosts', {})
Rahul,

Can you try reducing the batch size to 1000? Also, what is the write consistency level?

Thanks,
Dhanasekaran

On 29-Mar-2015, at 12:30 am, Rahul Bhardwaj <rahul.bhard...@indiamart.com> wrote:

Hi All, awaiting any response. Please help.

regards: rahul

On Fri, Mar 27, 2015 at 5:54 PM, Rahul Bhardwaj <rahul.bhard...@indiamart.com> wrote:

Hi All,

We are using Cassandra version 2.1.2 with cqlsh 5.0.1 (a cluster of three nodes with RF 2). I need to load around 40 million records into a Cassandra table. I have created batches of 1 million records (a batch of 1 record also gives the same error) in CSV format. When I use the COPY command to import, I get this error, which is causing a problem:

cqlsh:mesh_glusr> copy glusr_usr1(glusr_usr_id,glusr_usr_usrname,glusr_usr_pass,glusr_usr_membersince,glusr_usr_designation,glusr_usr_url,glusr_usr_modid,fk_gl_city_id,fk_gl_state_id,glusr_usr_ph2_area) from 'gl_a' with delimiter = '\t' and QUOTE = '';
Processed 36000 rows; Write: 1769.07 rows/s
Record has the wrong number of fields (9 instead of 10). Aborting import at record #36769. Previously-inserted values still present.
36669 rows imported in 20.571 seconds.

cqlsh:mesh_glusr> copy glusr_usr1(glusr_usr_id,glusr_usr_usrname,glusr_usr_pass,glusr_usr_membersince,glusr_usr_designation,glusr_usr_url,glusr_usr_modid,fk_gl_city_id,fk_gl_state_id,glusr_usr_ph2_area) from 'gl_a' with delimiter = '\t' and QUOTE = '';
Processed 185000 rows; Write: 1800.91 rows/s
Record has the wrong number of fields (9 instead of 10). Aborting import at record #185607. Previously-inserted values still present.
185507 rows imported in 1 minute and 43.428 seconds.

[cqlsh 5.0.1 | Cassandra 2.1.2 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> use mesh_glusr;
cqlsh:mesh_glusr> copy glusr_usr1(glusr_usr_id,glusr_usr_usrname,glusr_usr_pass,glusr_usr_membersince,glusr_usr_designation,glusr_usr_url,glusr_usr_modid,fk_gl_city_id,fk_gl_state_id,glusr_usr_ph2_area) from 'gl_a1' with delimiter = '\t' and QUOTE = '';
Processed 373000 rows; Write: 1741.23 rows/s
('Unable to complete the operation against any hosts', {})
Aborting import at record #373269. Previously-inserted values still present.

When we remove the already-inserted records from the file and start the command again for the rest of the data, it inserts a few more records and then gives the same error without any specifics. Please help if anyone has any idea about this error.

Regards: Rahul Bhardwaj

Follow IndiaMART.com for latest updates on this and more. Mobile Channel: Watch how IndiaMART Maximiser helped Mr. Khanna expand his business. kyunki Kaam Yahin Banta Hai!!!
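The "wrong number of fields (9 instead of 10)" aborts usually mean some rows are missing a value or contain a literal tab or newline inside a field. A small sketch for pre-validating the file before running COPY; the helper name is made up, and the 10-column tab-delimited layout is taken from the thread:

```python
import csv
import os
import tempfile

def find_bad_rows(path, expected_fields=10, delimiter="\t"):
    """Return (line_number, field_count) for each row whose field count is wrong."""
    bad = []
    with open(path, newline="") as f:
        for lineno, row in enumerate(csv.reader(f, delimiter=delimiter), start=1):
            if len(row) != expected_fields:
                bad.append((lineno, len(row)))
    return bad

# Demo with a throwaway file: the second row has only 9 fields.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("\t".join(str(i) for i in range(10)) + "\n")
    tmp.write("\t".join(str(i) for i in range(9)) + "\n")
    path = tmp.name

print(find_bad_rows(path, expected_fields=10))  # → [(2, 9)]
os.remove(path)
```

Running this over 'gl_a' before the import would point at the exact records COPY will choke on, instead of discovering them mid-import.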
Re: ('Unable to complete the operation against any hosts', {})
Hi All, awaiting any response. Please help.

regards: rahul

On Fri, Mar 27, 2015 at 5:54 PM, Rahul Bhardwaj <rahul.bhard...@indiamart.com> wrote:

[quoted text trimmed; the full message, COPY commands and errors appear in the previous message of this digest]
When we remove the already-inserted records from the file and start the command again for the rest of the data, it inserts a few more records and then gives the same error without any specifics. Please help if anyone has any idea about this error.

Regards: Rahul Bhardwaj
Re: High latencies for simple queries
cqlsh runs on the internal Cassandra Python drivers, cassandra-pylib and cqlshlib. I would not recommend using them at all (nothing wrong with them; they are just not built with external users in mind). I have never used python-driver in anger, so I can't comment on whether it is genuinely slower than the internal C* Python driver, but this might be a question for the python-driver folk.

On 28 March 2015 at 00:34, Artur Siekielski <a...@vhex.net> wrote:

On 03/28/2015 12:13 AM, Ben Bromhead wrote:
> One other thing to keep in mind / check is that when doing these tests locally, the Cassandra driver will connect using the network stack, whereas Postgres supports local connections over a Unix domain socket (this is also enabled by default). Unix domain sockets are significantly faster than TCP, as you don't have a network stack to traverse. I think any driver using libpq will attempt to use the domain socket when connecting locally.

Good catch. I made sure that psycopg2 connects through a TCP socket and the numbers increased by about 20%, but it is still an order of magnitude faster than Cassandra.

> But I'm going to hazard a guess something else is going on with the Cassandra connection, as I'm able to get 0.5 ms queries locally, and that's even with trace turned on.

Using python-driver?

--
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
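When comparing sub-millisecond query times like the ones in this thread, it helps to know the floor the transport itself sets. A self-contained sketch (plain loopback TCP echo, no database involved) that times round trips the same way one would time a query:

```python
import socket
import threading
import time

def echo_server(listener):
    # Accept one connection and echo everything back until the peer closes.
    conn, _ = listener.accept()
    with conn:
        while data := conn.recv(64):
            conn.sendall(data)

server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

client = socket.create_connection(server.getsockname())
rtts = []
for _ in range(1000):
    start = time.perf_counter()
    client.sendall(b"ping")
    assert client.recv(64) == b"ping"
    rtts.append(time.perf_counter() - start)
client.close()

print(f"median loopback RTT: {sorted(rtts)[len(rtts) // 2] * 1000:.3f} ms")
```

A loopback TCP round trip is typically tens of microseconds, so it cannot by itself explain an order-of-magnitude gap between two local databases; running the same harness over a Unix domain socket (where available) shows how much the TCP stack adds.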
Re: ('Unable to complete the operation against any hosts', {})
Don't use batches for this. Use a lot of async queries.

https://lostechies.com/ryansvihla/2014/08/28/cassandra-batch-loading-without-the-batch-keyword/

Jon

On Mar 27, 2015, at 5:24 AM, Rahul Bhardwaj <rahul.bhard...@indiamart.com> wrote:

[quoted text trimmed; the full message, COPY commands and errors appear earlier in this thread]
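The shape of the "lots of async queries" pattern Jon points to is to issue individual writes concurrently and bound how many are in flight at once. This is a sketch of the pattern only: insert_row() here is a simulated stand-in for a real driver call (for example session.execute_async() with a prepared statement in the DataStax python-driver), so the example runs without a cluster:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def insert_row(row):
    # Stand-in for a real single-row insert via the driver; pretend it succeeded.
    return row[0]

def load(rows, max_in_flight=128):
    """Insert rows concurrently, at most max_in_flight at a time."""
    inserted = 0
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(insert_row, row) for row in rows]
        for fut in as_completed(futures):
            fut.result()  # re-raise on failure instead of silently losing writes
            inserted += 1
    return inserted

print(load([(i, f"user{i}") for i in range(1000)]))  # → 1000
```

Bounding concurrency matters: firing 40 million unthrottled async writes at a 3-node RF 2 cluster is a good way to produce exactly the "Unable to complete the operation against any hosts" overload symptoms from this thread.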
Re: Replication to second data center with different number of nodes
If you're curious about how Cassandra knows how to replicate data in the remote DC: it works the same as in the local DC. Replication is independent in each, and you can even set a different replication strategy per keyspace per datacenter. Nodes in each DC take up num_tokens positions on a ring, each partition key is mapped to a position on that ring, and whoever owns that part of the ring is the primary for that data. Then (oversimplified) the RF-1 adjacent nodes become replicas for that same data.

On Fri, Mar 27, 2015 at 6:55 AM, Sibbald, Charles <charles.sibb...@bskyb.com> wrote:

[quoted text trimmed; Charles's and Björn's messages appear in full earlier in this digest]
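Eric's ring description can be sketched in a few lines of code. Assumptions worth flagging: this is a toy ring, MD5 stands in for Murmur3, and replica placement is the plain "next RF-1 distinct nodes clockwise" simplification he mentions (real NetworkTopologyStrategy also walks racks and datacenters):

```python
import bisect
import hashlib

RING_SIZE = 2 ** 16  # toy ring; Cassandra's Murmur3 ring spans 2**64

def token(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

def build_ring(nodes, num_tokens=8):
    # Each node claims num_tokens positions (vnodes) on the ring.
    ring = []
    for node in nodes:
        for i in range(num_tokens):
            ring.append((token(f"{node}:{i}"), node))
    ring.sort()
    return ring

def replicas(ring, key, rf):
    """Primary owner plus the next rf-1 distinct nodes clockwise."""
    toks = [t for t, _ in ring]
    idx = bisect.bisect_right(toks, token(key)) % len(ring)
    owners = []
    while len(owners) < rf:
        node = ring[idx % len(ring)][1]
        if node not in owners:
            owners.append(node)
        idx += 1
    return owners

ring = build_ring(["node1", "node2", "node3"])
print(replicas(ring, "some-partition-key", rf=2))
```

The first element returned is the primary for the key; the rest are the (oversimplified) adjacent replicas, which is why each DC's ring can be sized independently of the other's.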
Re: Arbitrary nested tree hierarchy data model
Ben Bromhead sent an email to me directly and expressed an interest in seeing some of my queries. I may as well post them for everyone. Here are my queries for the part of my code that reads and cleans up browse trees.

@NamedCqlQueries({
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_CHECK_TREE_EXISTS,
        query = "SELECT tree FROM tree WHERE tree = :tree",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_QUORUM),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_ALL_TREES,
        query = "SELECT tree, atime, pub, rhpath FROM tree",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_QUORUM),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_TREE,
        query = "SELECT tree FROM tree",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_QUORUM),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_NODE,
        query = "SELECT hpath, tree FROM node",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_INDEX_PAGE,
        query = "SELECT page, tree FROM path_by_page",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_INDEX_PUB,
        query = "SELECT distinct tree, bucket FROM path_by_pub",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_ALL_DOC_BROWSE_INDEX_CHILD,
        query = "SELECT distinct phpath, bucket, tree FROM path_by_parent",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_TREE,
        query = "DELETE FROM tree WHERE tree IN :tree",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_NODE,
        query = "DELETE FROM node WHERE hpath IN :hpath AND tree = :tree",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_INDEX_PAGE,
        query = "DELETE FROM path_by_page WHERE page IN :page AND tree = :tree",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_INDEX_PUB,
        query = "DELETE FROM path_by_pub WHERE tree = :tree AND bucket IN :bucket",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_CLEAN_DOC_BROWSE_INDEX_CHILD,
        query = "DELETE FROM path_by_parent WHERE phpath = :phpath AND bucket = :bucket AND tree IN :tree",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_MAX_ORDINAL,
        query = "SELECT pord FROM path_by_pub WHERE tree = :tree AND bucket = :bucket ORDER BY pord DESC",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_QUORUM),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_PAGE,
        query = "SELECT page, tree, ord, hpath FROM path_by_page WHERE page = :page AND tree = :tree",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_PAGE_ALL_TREES,
        query = "SELECT page, tree, ord, hpath FROM path_by_page WHERE page = :page",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_NODE,
        query = "SELECT tree, hpath, node, ccount FROM node WHERE hpath = :hpath AND tree = :tree",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_NODE_ALL_TREES,
        query = "SELECT tree, hpath, node, ccount FROM node WHERE hpath = :hpath",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_TREE_FOR_HASHPATH,
        query = "SELECT tree, node FROM node WHERE hpath = :hpath",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_CHILDREN,
        query = "SELECT hpath FROM path_by_parent WHERE phpath = :phpath AND bucket = :bucket AND tree = :tree AND ord = :ord",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_ALL_CHILDREN,
        query = "SELECT hpath FROM path_by_parent WHERE phpath = :phpath AND bucket = :bucket AND tree = :tree",
        keyspace = KeyspaceFamilyImpl.BROWSE, consistencyLevel = ConsistencyLevel.LOCAL_ONE),
    @NamedCqlQuery(
        name = DocumentBrowseDaoImpl.Q_GET_NEIGHBORS_NEXT,
        query = "SELECT hpath FROM path_by_pub WHERE tree = :tree AND bucket = :bucket AND pord > :pord ORDER BY pord",
        keyspace = KeyspaceFamilyImpl.BROWSE,
Re: Arbitrary nested tree hierarchy data model
Fascinating. Both the MySQL front end and this delightful inverted search solution. Your creativity makes me wonder what other delights your query solutions might expose!

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872

On Mar 27, 2015 7:08 PM, Robert Wille <rwi...@fold3.com> wrote:

Okay, this is going to be a pretty long post, but I think it's an interesting data model, and hopefully someone will find it worth going through.

First, I think it will be easier to understand the modeling choices I made if you see the end product. Go to http://www.fold3.com/browse.php#249|hzUkLqDmI. What you see looks like one big tree, but it is actually a combination of trees spliced together. There is one tree in a relational database that forms what I call the top-level browse. The top-level browse is used to navigate through categories until you arrive at a publication. When you drill down into a publication, you are then viewing data stored in Cassandra. The link provided above points to the root of a publication (in this case, maps from the Civil War), so to the left is the top-level browse coming from MySQL, and to the right is the Cassandra browse.

Each publication has an independent tree in Cassandra, with all trees stored in the same set of tables (I do not dynamically create tables for each publication; I personally think that's a bad practice). We currently have 458 publications, and collectively they have about half a billion nodes and consume about 400 GB (RF=3).

My trees are immutable. When there are changes to a publication (e.g. adding new documents), it is very difficult to know what changes need to be made to the tree to edit it in place. Also, it would be impossible to maintain navigational consistency while a tree is in the process of being restructured. So, when a publication changes, I create a completely new tree. Once the new tree is built, I change a reference to point to the new tree.
I have a process that nightly pages through the tables and deletes records that belong to obsolete trees. This process takes about five hours. If it were much longer than that, I would probably run it weekly. My database has quite a bit of churn, which is fairly unusual for a Cassandra-based application. Most nights build two or three trees, generally resulting in a few tens of millions of new records and a slightly smaller number of deletions. Size-tiered compaction is a bad choice for churn, so I use leveled compaction. Most publications are at most a few million nodes, and generally build in less than 20 minutes.

Since any modeling exercise requires knowing the queries, I should describe those before getting into the model. Here are the features I need to support. For browsing the tree, I need to be able to get the children of a node (paginated), the siblings of a node (also paginated), and the ancestors of a node. The leaves of each tree are images and form a filmstrip. You can use the filmstrip to navigate through all the images in a publication in the tree's natural order. If you go to my browse page and keep drilling down, you'll eventually get to an image. The filmstrip appears at the bottom of the image viewer.

Before I discuss the schema, I should discuss a couple of other non-obvious things that are relevant to the data model. One very common operation is to retrieve a node and all of its ancestors in order to display a path. Denormalization would suggest that I store the data for each node along with that of all of its ancestors. That would mean that in my biggest tree, I would store the root node 180 million times. I didn't consider that kind of bloat to be acceptable, so I do not denormalize ancestors. I also wanted to retrieve a node and its ancestors in constant time, rather than the O(n) that would be typical for tree traversal. In order to accomplish this, I use a pretty unique idea for a node's primary key.
I create a hash from information in the node, and then append it to the hash of its parent. So the primary key is really a path. When I need to retrieve a node and its ancestors, I tokenize the path and issue queries in parallel to get all the nodes in the ancestry at the same time.

In keeping with this pattern of not denormalizing, my auxiliary tables do not have node data in them; instead they provide a means of getting hash paths, which I then tokenize and make parallel requests with. Most requests that use an auxiliary table can generally just make one query to the auxiliary table to get the hash path, and then retrieve the node and its ancestors from the node table. Three or fewer trips to Cassandra are sufficient for all my APIs.

Without further ado, here's my schema (with commentary):

CREATE TABLE tree (
    tree INT,
    pub INT,
    rhpath VARCHAR,
    atime TIMESTAMP,
    ccount INT,
    ncount INT,
    PRIMARY KEY (tree)
) WITH gc_grace_seconds = 864000;

This table maintains the references to the root nodes
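The hash-path key described above can be sketched in a few lines. The digest function and the 8-hex-character width per node are assumptions (the post doesn't specify either); the point is that every prefix of a node's key is an ancestor's key, so one tokenize yields all the point queries to run in parallel:

```python
import hashlib

WIDTH = 8  # hex chars per node hash (assumed width)

def node_hash(node_info: str) -> str:
    return hashlib.sha1(node_info.encode()).hexdigest()[:WIDTH]

def child_path(parent_path: str, node_info: str) -> str:
    # A node's key is its own hash appended to its parent's full path.
    return parent_path + node_hash(node_info)

def ancestor_paths(hpath: str):
    """Every prefix of the path, root first: one point query per ancestor."""
    return [hpath[: i + WIDTH] for i in range(0, len(hpath), WIDTH)]

root = child_path("", "publication-249")
child = child_path(root, "folder-A")
leaf = child_path(child, "image-1")
print(ancestor_paths(leaf))  # three paths: root, child, leaf
```

Issuing those point queries concurrently is what gives the constant-time (depth-parallel) ancestor retrieval instead of an O(n) walk up the tree.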
Re: Arbitrary nested tree hierarchy data model
that's neat, thanks for sharing. Sounds like the solution is partly inspired by Merkle trees, to make lookup fast and easy.

peter

On Fri, Mar 27, 2015 at 10:07 PM, Robert Wille <rwi...@fold3.com> wrote:

[quoted text trimmed; the full write-up appears earlier in this thread. This copy continues a little further:]

Without further ado, here's my schema (with commentary):

CREATE TABLE tree (
    tree INT,
    pub INT,
    rhpath VARCHAR,
    atime TIMESTAMP,
    ccount INT,
    ncount INT,
    PRIMARY KEY (tree)
) WITH gc_grace_seconds = 864000;

This table maintains the references to the root nodes for each tree. pub is the primary key for the publication table in my relational database. There is usually just one
Re: High latencies for simple queries
On 03/28/2015 12:13 AM, Ben Bromhead wrote:
> One other thing to keep in mind / check is that when doing these tests locally, the Cassandra driver will connect using the network stack, whereas Postgres supports local connections over a Unix domain socket (this is also enabled by default). Unix domain sockets are significantly faster than TCP, as you don't have a network stack to traverse. I think any driver using libpq will attempt to use the domain socket when connecting locally.

Good catch. I made sure that psycopg2 connects through a TCP socket and the numbers increased by about 20%, but it is still an order of magnitude faster than Cassandra.

> But I'm going to hazard a guess something else is going on with the Cassandra connection, as I'm able to get 0.5 ms queries locally, and that's even with trace turned on.

Using python-driver?