Frequent secondary index sstable corruption
I'm in the process of migrating data over to Cassandra for several of our apps, and a few of the schemas use secondary indexes. Four times in the last couple of months I've run into a corrupted sstable belonging to a secondary index, but I have never seen this on any other sstables. When it happens, any query against the secondary index just hangs until the node is fixed. It's making me a bit nervous about using secondary indexes in production. This has usually happened after a bulk data import, so I am wondering if the firehose method of dumping initial data into Cassandra (write consistency = ANY) is causing some sort of write concurrency issue when it comes to secondary indexes. Has anyone else experienced this? The cluster is running 1.2.16 on 4x EC2 m1.large instances.
Re: Migration 1.2.14 to 2.0.8 causes Tried to create duplicate hard link at startup
Were you able to solve or work around this problem?

On 06/05/2014 11:47 AM, Tom van den Berge wrote:
> Hi, I'm trying to migrate a development cluster from 1.2.14 to 2.0.8. When starting up 2.0.8, I'm seeing the following error in the logs:
>
> INFO 17:40:25,405 Snapshotting drillster, Account to pre-sstablemetamigration
> ERROR 17:40:25,407 Exception encountered during startup
> java.lang.RuntimeException: Tried to create duplicate hard link to /Users/tom/cassandra-data/data/drillster/Account/snapshots/pre-sstablemetamigration/drillster-Account-ic-65-Filter.db
>     at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75)
>     at org.apache.cassandra.db.compaction.LegacyLeveledManifest.snapshotWithoutCFS(LegacyLeveledManifest.java:129)
>     at org.apache.cassandra.db.compaction.LegacyLeveledManifest.migrateManifests(LegacyLeveledManifest.java:91)
>     at org.apache.cassandra.db.compaction.LeveledManifest.maybeMigrateManifests(LeveledManifest.java:617)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:274)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
>
> Does anyone have an idea how to solve this?
>
> Thanks, Tom
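One workaround sometimes suggested for this class of startup failure (not confirmed in this thread, so treat it as an assumption and back up your data directory first) is to remove the partially created migration snapshot, so that the retried startup can recreate its hard links from scratch. A sketch, using a stand-in demo path rather than a real install:

```shell
# Sketch only: DATA_DIR is a demo path; substitute your real cassandra-data
# directory. The keyspace/table path below is taken from the error above.
DATA_DIR="${DATA_DIR:-/tmp/cassandra-data-demo}"
SNAP="$DATA_DIR/data/drillster/Account/snapshots/pre-sstablemetamigration"

mkdir -p "$SNAP"     # (demo) simulate the leftover half-created snapshot
rm -rf "$SNAP"       # remove it before retrying the 2.0.8 startup
[ ! -d "$SNAP" ] && echo "snapshot cleared"
```

If the node fails again with the same error, stop and investigate rather than deleting anything further.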
Re: Frequent secondary index sstable corruption
On Tue, Jun 10, 2014 at 7:31 AM, Jeremy Jongsma jer...@barchart.com wrote:
> I'm in the process of migrating data over to Cassandra for several of our apps, and a few of the schemas use secondary indexes. Four times in the last couple of months I've run into a corrupted sstable belonging to a secondary index, but have never seen this on any other sstables. When it happens, any query against the secondary index just hangs until the node is fixed. It's making me a bit nervous about using secondary indexes in production.

http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201405.mbox/%3CCAEDUwd1i2BwJ-PAFE1qhjQFZ=qz2va_vxwo_jdycms8evkb...@mail.gmail.com%3E

I don't know if this particular issue is known and/or fixed upstream, but FWIW/FYI!

=Rob
Re: Frequent secondary index sstable corruption
If you've been dropping and recreating tables with the same name, you might be seeing this: https://issues.apache.org/jira/browse/CASSANDRA-6525

On Tue, Jun 10, 2014 at 12:19 PM, Robert Coli rc...@eventbrite.com wrote:
> On Tue, Jun 10, 2014 at 7:31 AM, Jeremy Jongsma jer...@barchart.com wrote:
>> I'm in the process of migrating data over to Cassandra for several of our apps, and a few of the schemas use secondary indexes. [...]
>
> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201405.mbox/%3CCAEDUwd1i2BwJ-PAFE1qhjQFZ=qz2va_vxwo_jdycms8evkb...@mail.gmail.com%3E
>
> I don't know if this particular issue is known and/or fixed upstream, but FWIW/FYI!
>
> =Rob

--
Tyler Hobbs
DataStax
http://datastax.com/
Re: Cannot query secondary index
Honestly, this has been by far my single biggest obstacle with Cassandra for time-based data--cleaning up the old data when the deletion criterion (i.e., date) isn't the primary key. I've asked about a few different approaches, but I haven't really seen any feasible options that can be implemented easily. I've seen the following:

1. Use date-based tables, then drop old tables, a la audit_table_20140610, audit_table_20140609, etc. But then I run into the issue of having to query every table--I would have to execute queries against every day to get the data, and then merge the data myself. Unless there's something in the binary driver I'm missing, it doesn't sound like this would be practical.

2. Use a TTL. But then I have to basically decide on a value that works for everything and, if it ever turns out I overestimated, I'm basically SOL, because my cluster will be out of space.

3. Maintain a separate index of days to keys, and use this index as the reference for which keys to delete. But then this requires maintaining another index and a relatively manual delete process.

I can't help but feel that I am just way over-engineering this, or that I'm missing something basic in my data model. In everything but the last approach, it seems like I must be overlooking something obvious.

Andrew

> Of course, Jonathan, I'll do my best! It's an auditing table that, right now, uses a primary key consisting of a composite partition key of the region and the object ID, plus the date and the process ID. Each event in our system will create anywhere from 1-20 rows, for example, and multiple parts of the system might be working on the same object ID. So the CF is constantly being appended to, but reads are rare.
>
> CREATE TABLE audit (
>     id bigint,
>     region ascii,
>     date timestamp,
>     pid int,
>     PRIMARY KEY ((id, region), date, pid)
> );
>
> Data is queried on a specific object ID and region. Optionally, users can restrict their query to a specific date range, which the above data model provides.
>
> However, we generate quite a bit of data, and we want a convenient way to get rid of the oldest data. Since our system scales with the time of year, we might get 50 GB a day during peak, and 5 GB of data off peak. We could pick the safest number--let's say, 30 days--and set the TTL using that. The problem there is that we'll be using a very small percentage of our available space 90% of the year.
>
> What I'd like to be able to do is drop old tables as needed--i.e., let's say when we hit 80% load across the cluster (or some such metric that takes the cluster-wide load into account), I want to drop the oldest day's records until we're under 80%. That way, we're always using the maximum amount of space we can, without having to worry about getting to the point where we run out of space cluster-wide.
>
> My thoughts are--we could always make the date part of the primary key, but then we'd either a) have to query the entire range of dates, or b) have to force a small date range when querying. What are the penalties? Do you have any other suggestions?
>
> On Mon, Jun 9, 2014 at 5:15 PM, Jonathan Lacefield jlacefi...@datastax.com wrote:
>
> Hello, Will you please describe the use case and what you are trying to model? What are some questions/queries that you would like to serve via Cassandra? This will help the community help you a little better.
>
> Jonathan Lacefield
> Solutions Architect, DataStax
> (404) 822 3487
> http://www.linkedin.com/in/jlacefield
> http://www.datastax.com/cassandrasummit14
>
> On Mon, Jun 9, 2014 at 7:51 PM, Redmumba redmu...@gmail.com wrote:
>
> I've been trying to work around using date-based tables because I'd like to avoid the overhead. It seems, however, that this is just not going to work. So here's a question--for these date-based tables (i.e., a table per day/week/month/whatever), how are they queried? If I keep 60 days' worth of auditing data, for example, I'd need to query all 60 tables--can I do that smoothly? Or do I have to have 60 different select statements? Is there a way for me to run the same query against all the tables?
>
> On Mon, Jun 9, 2014 at 3:42 PM, Redmumba redmu...@gmail.com wrote:
>
> Ah, so the secondary indices are really secondary against the primary key. That makes sense. I'm beginning to see why the whole date-based table approach is the only one I've been able to find... thanks for the quick responses, guys!
>
> On Mon, Jun 9, 2014 at 2:45 PM, Michal Michalski michal.michal...@boxever.com wrote:
>
> Secondary indexes internally are just CFs that map the indexed value to the row key which that value belongs to, so you can only query these indexes using =, not <, <=, >, >= etc. However, your query does not require an index *IF* you provide a row key - you can use < or > like you did for the date column, as long as you refer to a single row. However, if you don't provide it, it's not going to
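To the fan-out question above: with a table per day there is no built-in way to query all 60 tables in one statement, so the client has to issue one query per table and merge the results itself. A sketch of the client-side bookkeeping this implies (the table naming follows the audit_table_YYYYMMDD convention from the thread; `session.execute` is a stand-in for whatever driver call you actually use):

```python
from datetime import date, timedelta

def audit_tables(end, days):
    """Table names for the last `days` days, newest first (audit_table_YYYYMMDD)."""
    return ["audit_table_%s" % (end - timedelta(n)).strftime("%Y%m%d")
            for n in range(days)]

def fetch_all(session, tables, object_id, region):
    """Issue one SELECT per daily table and merge rows client-side.

    `session.execute` is a placeholder for your driver's query call; in
    practice you would also have to tolerate days whose table was dropped.
    """
    rows = []
    for t in tables:
        rows.extend(session.execute(
            "SELECT * FROM %s WHERE id = ? AND region = ?" % t,
            (object_id, region)))
    return rows

# e.g. the 60 daily tables covering a 60-day retention window
tables = audit_tables(date(2014, 6, 10), 60)
print(tables[0], tables[-1])  # audit_table_20140610 ... audit_table_20140412
```

So it is 60 separate statements either way; the only question is whether the loop lives in your code or in a helper like the one above.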
Re: How to restart bootstrap after a failed streaming due to Broken Pipe (1.2.16)
On Mon, Jun 9, 2014 at 10:43 PM, Colin Kuo colinkuo...@gmail.com wrote:
> You can use nodetool repair instead. Repair is able to re-transmit the data which belongs to the new node.

Repair is not very likely to work in cases where bootstrap doesn't.

@OP: you probably will have to tune your phi detector to be more tolerant of nodes pausing. https://issues.apache.org/jira/browse/CASSANDRA-7063 (etc.)

=Rob
Adding and removing node procedures
I just wanted to verify the procedures to add and remove nodes in my environment; please feel free to comment or advise. I have a 3-node cluster N1, N2, N3 with vnodes configured (256) on each node. All are in one data center.

1. Procedure to change node hardware or replace with new machines: (N1, N2, N3) to (N11, N21, N31)
   nodetool -h node2 decommission
   Bootstrap N21
   nodetool repair
   nodetool -h node1 decommission
   Bootstrap N11
   nodetool repair
   nodetool -h node3 decommission
   Bootstrap N31
   nodetool repair
---
2. Procedure for shrinking a 3-node cluster to 2 nodes: (N1, N2, N3) to (N1, N3)
   nodetool -h node2 decommission
   Physically retire node2
---
3. Procedure for adding a new node: (N1, N2, N3) to (N1, N2, N3, N4)
   Bootstrap N4
   nodetool repair
---
4. Procedure to remove a dead/crashed node (N2 unable to start): (N1, N2, N3) to (N1, N3)
   Shut down N2 if possible
   nodetool removenode xx_hostid_Of_N2_xx
   nodetool repair
---
5. Procedure to remove a dead/crashed node and replace it with N21 (N2 unable to start): (N1, N2, N3) to (N1, N3, N21)
   Shut down N2 if possible
   nodetool removenode xx_hostid_Of_N2_xx
   Bootstrap N21
   nodetool repair
---
Thanks in advance for pointing out any mistakes, or for any advice.
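For case 5, one alternative worth checking against the documentation for your exact version (this is a suggestion, not something confirmed in the thread): since 1.2, a replacement node can be started with the replace_address option, so it streams exactly the dead node's token ranges instead of doing removenode followed by a fresh bootstrap. A sketch, with the IP as a placeholder:

```shell
# In cassandra-env.sh on the replacement node (N21), before its first start:
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead_N2_ip>"

# Start the node; once it shows as UN in nodetool status, remove the
# option again so it is not applied on later restarts.
```

Either way, the ordering rules in your list (one topology change at a time, repair afterwards) still apply.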
StreamException while adding nodes
Hi,

I tried to double the size of an existing cluster from 4 to 8 nodes. First I added one node, which joined successfully after 120 min. During that time there was no additional load on the cluster. Afterwards I started the other 3 new nodes one after another, so that they would join the cluster simultaneously. Furthermore, I put some write load on the cluster. 45 min into the process, 2 nodes died with the following exception:

Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
    at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1160)

Since I restarted Cassandra on the failing nodes (8 hours ago), the 3 nodes remain in status JOINING, but there is no data exchange going on any more. Furthermore, nodetool info throws the exception:

Exception in thread main java.lang.AssertionError
    at org.apache.cassandra.locator.TokenMetadata.getTokens(TokenMetadata.java:502)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2132)

which corresponds to isMember returning false:

public Collection<Token> getTokens(InetAddress endpoint)
{
    assert endpoint != null;
    assert isMember(endpoint);

My questions right now are:
- What could have caused the streaming error?
- Shouldn't it be possible to add nodes while there is some load on the cluster? OS load was between 2 and 6 on a dual-core machine.
- Would it have been better to add the 3 new nodes one by one, rather than simultaneously?
- How should I proceed with the 3 half-joined nodes, as they are not willing to exchange the missing data?

We are using Cassandra 2.0.7 (vnodes and broadly the default config) and RF 2, with each node having roughly 17 GB of data on it.

Thanks for any hints,
Phil
Re: Cannot query secondary index
Our approach for this scenario is to run a Hadoop job that periodically cleans old entries, but I admit it's far from ideal. It would be nice to have a more native way to perform these kinds of tasks. There's a legend about a compaction strategy that keeps only the N first entries of a partition key; I don't think it has been implemented yet, but if I remember correctly there's a JIRA ticket about it.

On Tue, Jun 10, 2014 at 3:39 PM, Redmumba redmu...@gmail.com wrote:
> Honestly, this has been by far my single biggest obstacle with Cassandra for time-based data--cleaning up the old data when the deletion criterion (i.e., date) isn't the primary key. I've asked about a few different approaches, but I haven't really seen any feasible options that can be implemented easily. [...]
Large number of row keys in query kills cluster
I ran an application today that attempted to fetch 20,000+ unique row keys in one query against a set of completely empty column families. On a 4-node cluster (EC2 m1.large instances) with the recommended memory settings (2 GB heap), every single node immediately ran out of memory and became unresponsive, to the point where I had to kill -9 the Cassandra processes. Now, clearly this query is not the best idea in the world, but the effects of it are a bit disturbing. What could be going on here? Are there any other query pitfalls I should be aware of that have the potential to explode the entire cluster? -j
Re: Consolidating records and TTL
On Thu, Jun 5, 2014 at 2:38 PM, Charlie Mason charlie@gmail.com wrote:
> I can't do the initial account insert with a TTL, as I can't guarantee when a new value will come along and replace this account record. However, when I insert the new account record, instead of deleting the old one, could I reinsert it with a TTL of, say, 1 month? How would compaction handle this? Would the original record get compacted away after 1 month + the GC grace period, or would it hang around still?

Yes, after 1 month + gc_grace, it will be eligible for removal during compaction. Of course, a compaction on that sstable still has to take place before it can be removed. If you're using SizeTieredCompactionStrategy (the default) and have a lot of data, that may take a few more days.

--
Tyler Hobbs
DataStax
http://datastax.com/
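For concreteness, the reinsert Charlie describes would look something like the following; the accounts table and its columns are made up for illustration, and only the USING TTL clause is the point:

```sql
INSERT INTO accounts (account_id, balance)
VALUES (42, 100.00)
USING TTL 2592000;   -- 30 days, expressed in seconds
```

Once the TTL expires the cell becomes a tombstone, and as Tyler notes it then still has to outlive gc_grace and be included in a compaction before the space is actually reclaimed.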
Re: Large number of row keys in query kills cluster
Hello Jeremy,

Basically what you are doing is asking Cassandra to do a distributed full scan on all the partitions across the cluster, so it's normal that the nodes are somewhat stressed. How did you make the query? Are you using the Thrift or CQL3 API?

Please note that there is another way to get all partition keys: SELECT DISTINCT partition_key FROM ...; more details here: www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3

> I ran an application today that attempted to fetch 20,000+ unique row keys in one query against a set of completely empty column families. On a 4-node cluster (EC2 m1.large instances) with the recommended memory settings (2 GB heap), every single node immediately ran out of memory and became unresponsive, to the point where I had to kill -9 the Cassandra processes. [...]
Re: Large number of row keys in query kills cluster
I didn't explain clearly - I'm not requesting 20,000 unknown keys (resulting in a full scan), I'm requesting 20,000 specific rows by key.

On Jun 10, 2014 6:02 PM, DuyHai Doan doanduy...@gmail.com wrote:
> Hello Jeremy
>
> Basically what you are doing is asking Cassandra to do a distributed full scan on all the partitions across the cluster, so it's normal that the nodes are somewhat stressed. How did you make the query? Are you using the Thrift or CQL3 API?
>
> Please note that there is another way to get all partition keys: SELECT DISTINCT partition_key FROM ...; more details here: www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3 [...]
Re: Large number of row keys in query kills cluster
Perhaps if you described both the schema and the query in more detail, we could help... e.g. did the query have an IN clause with 20,000 keys? Or is the key compound? More detail will help.

On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma jer...@barchart.com wrote:
> I didn't explain clearly - I'm not requesting 20,000 unknown keys (resulting in a full scan), I'm requesting 20,000 specific rows by key. [...]
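Whatever the driver, a common way to avoid this failure mode is to split a large key list into small batches and issue the queries separately (optionally in parallel with bounded concurrency) rather than sending one giant IN clause for a coordinator to materialize. A sketch of the client-side batching; `session.execute` and the table name are stand-ins for your actual driver call and schema:

```python
def chunked(keys, size=100):
    """Yield successive batches of at most `size` keys."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]

def fetch_rows(session, keys, batch_size=100):
    """Fetch many rows by key without a single huge IN clause.

    Each batch becomes one query, bounding the work any one coordinator
    has to buffer. `session.execute` is a placeholder for the Thrift
    multiget or CQL driver call you actually use.
    """
    rows = []
    for batch in chunked(list(keys), batch_size):
        placeholders = ", ".join("?" for _ in batch)
        rows.extend(session.execute(
            "SELECT * FROM mytable WHERE key IN (%s)" % placeholders, batch))
    return rows

# 20,000 keys become 200 batches of 100
print(len(list(chunked(list(range(20000)), 100))))  # 200
```

The right batch size depends on row width and heap; the point is only that per-request work stays bounded as the key count grows.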
Re: StreamException while adding nodes
On Tue, Jun 10, 2014 at 2:21 PM, Philipp Potisk philipp.pot...@geroba.at wrote:
> First I added one node, which joined after 120 min successfully. During that time there was no additional load on the cluster. Afterwards I started the other 3 new nodes one after another, in order to join the cluster simultaneously.

Bootstrapping multiple nodes at once is now and has always been Not Supported, but is such a common thing for new operators to try that there is now a goal to prevent them from doing it [1]. Cancel those simultaneous bootstraps and do them one at a time, and they'll probably work.

[1] https://issues.apache.org/jira/browse/CASSANDRA-7069

=Rob
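The one-at-a-time discipline can be scripted; this is only a sketch (host names, the service command, and the status match are placeholders for your environment), but it shows the shape — start a node, block until it reports Up/Normal, then move on:

```shell
# Sketch: join new nodes strictly one at a time (node5..node7 are placeholders).
for host in node5 node6 node7; do
  ssh "$host" 'sudo service cassandra start'
  # Block until the node shows as UN in the ring; nodetool status prints
  # IP addresses, so match on whatever identifies the host for you.
  until nodetool status | grep -q "^UN.*$host"; do
    sleep 60
  done
done
```

A pause between joins also gives gossip time to settle before the next node announces itself.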
Re: VPC AWS
Have a look at http://www.tinc-vpn.org/ - it is mesh-based and handles multiple gateways for the same network in a graceful manner (so you can run two gateways per region for HA). It also supports NAT traversal if you need to do public-private clusters. We are currently evaluating it for our managed Cassandra in a VPC solution, but we haven't ever used it in a production environment or with a heavy load, so caveat emptor. As for the snitch, the GPFS is definitely the most flexible.

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 10 Jun 2014, at 1:42 am, Ackerman, Mitchell mitchell.acker...@pgi.com wrote:

> Peter, I too am working on setting up a multi-region VPC Cassandra cluster. Each region is connected to the others via an OpenVPN tunnel, so we can use internal IP addresses for both the seeds and the broadcast address. This allows us to use the EC2Snitch (my interpretation of the caveat that this snitch won't work in a multi-region environment is that it won't work if you can't use internal IP addresses, which we can via the VPN tunnels). All the C* nodes find each other, and nodetool (or OpsCenter) shows that we have established a multi-datacenter cluster. Thus far, I'm not happy with the performance of the cluster in such a configuration, but I don't think that it is related to this configuration, though it could be.
>
> Mitchell
>
> From: Peter Sanford [mailto:psanf...@retailnext.net]
> Sent: Monday, June 09, 2014 7:19 AM
> To: user@cassandra.apache.org
> Subject: Re: VPC AWS
>
> Your general assessment of the limitations of the EC2 snitches seems to match what we've found. We're currently using the GossipingPropertyFileSnitch in our VPCs. This is also the snitch to use if you ever want to have a DC in EC2 and a DC with another hosting provider.
>
> -Peter
>
> On Mon, Jun 9, 2014 at 5:48 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
>
> Hi guys, there are a lot of answers; it looks like this subject is interesting to a lot of people, so I will end up letting you know how it went for us. For now, we are still doing some tests. Yet I would like to know how we are supposed to configure Cassandra in this environment:
>
> - VPC
> - Multiple datacenters (should be VPCs, one per region, linked through VPN?)
> - Cassandra 1.2
>
> We are currently running under EC2MultiRegionSnitch, but with no VPC. Our VPC will have no public interface, so I am not sure how to configure the broadcast address or seeds, which are supposed to be the public IP of the node. I could use EC2Snitch, but will cross-region work properly? Should I use another snitch? Is someone using a similar configuration? Thanks for the information already given, guys; we will achieve this ;-).
>
> 2014-06-07 0:05 GMT+02:00 Jonathan Haddad j...@jonhaddad.com:
>
> This may not help you with the migration, but it may with maintenance management. I just put up a blog post on managing VPC security groups with a tool I open sourced at my previous company. If you're going to have different VPCs (staging / prod), it might help with managing security groups. http://rustyrazorblade.com/2014/06/an-introduction-to-roadhouse/ Semi-shameless plug... but relevant.
>
> On Thu, Jun 5, 2014 at 12:01 PM, Aiman Parvaiz ai...@shift.com wrote:
>
> Cool, thanks again for this.
>
> On Thu, Jun 5, 2014 at 11:51 AM, Michael Theroux mthero...@yahoo.com wrote:
>
> You can have a ring spread across EC2 and the public subnet of a VPC. That is how we did our migration. In our case, we simply replaced the existing EC2 node with a new instance in the public VPC, restored from a backup taken right before the switch.
>
> -Mike
>
> From: Aiman Parvaiz ai...@shift.com
> To: Michael Theroux mthero...@yahoo.com
> Cc: user@cassandra.apache.org
> Sent: Thursday, June 5, 2014 2:39 PM
> Subject: Re: VPC AWS
>
> Thanks for this info Michael. As far as restoring a node in the public VPC is concerned, I was thinking (and I might be wrong here) that if we can have a ring spread across EC2 and the public subnet of a VPC, I can simply decommission nodes in EC2 as I gradually introduce new nodes in the public subnet of the VPC. I would end up with a ring in the public subnet, and could then migrate the nodes from public to private in a similar way. If anyone has any experience or suggestions with this, please share; I would really appreciate it.
>
> Aiman
>
> On Thu, Jun 5, 2014 at 10:37 AM, Michael Theroux mthero...@yahoo.com wrote:
>
> The implementation of moving from EC2 to a VPC was a bit of a juggling act. Our motivation was twofold:
>
> 1) We were running out of static IP addresses, and it was becoming increasingly difficult in EC2 to design around limiting the number of static IP addresses to the number of public IP addresses EC2 allowed.
> 2) A VPC affords us an additional level of security that was desirable.
>
> However, we needed to consider the following
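For anyone settling on the GossipingPropertyFileSnitch as suggested above, the per-node DC/rack assignment lives in cassandra-rackdc.properties on each node (the dc/rack values here are examples, not something from this thread):

```
# cassandra-rackdc.properties (one per node)
dc=us-east
rack=rack1
```

with the snitch itself selected in cassandra.yaml:

```
endpoint_snitch: GossipingPropertyFileSnitch
```

Each node gossips its own DC and rack to the rest of the cluster, which is what makes this snitch portable across EC2, VPCs, and other providers.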