RE: [EXTERNAL] n00b q re UPDATE v. INSERT in CQL

2019-10-25 Thread Durity, Sean R
Everything in Cassandra is an insert. So, an update and an insert are 
functionally equivalent. An update doesn't go update the existing data on disk; 
it is a new write of the columns involved. So, the difference in your scenario 
is that with the "targeted" update, you are writing less of the columns again.

So, instead of inserting 10 values (for example), you are inserting 3 (pk1, 
pk2, and col1). This would mean less disk space used for your data cleanup. 
Once Cassandra runs its compaction across your data (with a simplifying 
assumption that all data gets compacted), there would be no disk space 
difference for the final result.

I would do the updates, if the size/scope of the data involved is significant.


Sean Durity – Staff Systems Engineer, Cassandra

-Original Message-
From: James A. Robinson 
Sent: Friday, October 25, 2019 10:49 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] n00b q re UPDATE v. INSERT in CQL

Hi folks,

I'm working on a clean-up task for some bad data in a cassandra db.
The bad data in this case are values with mixed case that will need to
be lowercased.  In some tables the value that needs to be changed is a
primary key, in other cases it is not.

From the reading I've done, the situations where I need to change a
primary key column to lowercase will mean I need to perform an INSERT
of the entire row using the new primary key values merged with the old
non-primary-key values, followed by a DELETE of the old primary key
row.

My question is, on a table where I need to update a column that isn't
primary key, should I perform a limited UPDATE in that situation like
I would in SQL:

UPDATE ks.table SET col1 = ? WHERE pk1 = ? AND pk2 = ?

or will there be any downsides to that over an INSERT where I specify
all columns?

INSERT INTO ks.table (pk1, pk2, col1, col2, ...) VALUES (?,?,?,?, ...)

In SQL I'd never question just using the update but my impression
reading the blogosphere is that Cassandra has subtleties that I might
not be grasping when it comes to UPDATE v. INSERT behavior...

Jim

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org




The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


n00b q re UPDATE v. INSERT in CQL

2019-10-25 Thread James A. Robinson
Hi folks,

I'm working on a clean-up task for some bad data in a cassandra db.
The bad data in this case are values with mixed case that will need to
be lowercased.  In some tables the value that needs to be changed is a
primary key, in other cases it is not.

>From the reading I've done, the situations where I need to change a
primary key column to lowercase will mean I need to perform an INSERT
of the entire row using the new primary key values merged with the old
non-primary-key values, followed by a DELETE of the old primary key
row.

My question is, on a table where I need to update a column that isn't
primary key, should I perform a limited UPDATE in that situation like
I would in SQL:

UPDATE ks.table SET col1 = ? WHERE pk1 = ? AND pk2 = ?

or will there be any downsides to that over an INSERT where I specify
all columns?

INSERT INTO ks.table (pk1, pk2, col1, col2, ...) VALUES (?,?,?,?, ...)

In SQL I'd never question just using the update but my impression
reading the blogosphere is that Cassandra has subtleties that I might
not be grasping when it comes to UPDATE v. INSERT behavior...

Jim

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org