Re: Restore a table with dropped columns to a new cluster fails

2020-07-24 Thread Mitch Gitman
Fabulous tip. Thanks, Sean. I will definitely check out dsbulk.

Great to see it's a Cassandra-general tool and not just limited to DataStax
Enterprise.

On Fri, Jul 24, 2020 at 12:58 PM Durity, Sean R 
wrote:

> I would use dsbulk to unload and load. Then the schemas don’t really
> matter. You define which fields in the resulting file are loaded into which
> columns. You also won’t have the limitations and slowness of COPY TO/FROM.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Mitch Gitman 
> *Sent:* Friday, July 24, 2020 2:22 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Restore a table with dropped columns to a new
> cluster fails
>
>
>
> I'm reviving this thread because I'm looking for a non-hacky way to
> migrate data from one cluster to another using nodetool snapshot and
> sstableloader without having to preserve dropped columns in the new schema.
> In my view, that's just cruft and confusion that keeps building.
>
> The best idea I can come up with is to do the following in the source
> cluster:
>
>    1. Use the cqlsh COPY TO command to export the data in the table.
>    2. Drop the table.
>    3. Re-create the table.
>    4. Use the cqlsh COPY FROM command to import the data into the new
>    incarnation of the table.
>
>
> This approach is predicated on two assumptions:
>
>- The re-created table has no knowledge of the history of the old
>table by the same name.
>- The amount of data in the table doesn't exceed what the COPY command
>can handle.
>
>
> If the dropped columns exist in the table in an environment where there's
> a lot of data, then we'd have to use some other mechanism to capture and
> reload the data.
>
> If you see something wrong about this approach or you have a better way to
> do it, I'd be glad to hear from you.
>
>
>
> On Tue, Feb 19, 2019 at 11:31 AM Jeff Jirsa  wrote:
>
> You can also manually add the dropped column to the appropriate table to
> eliminate the issue. This has to be done by a human; a new cluster would
> have no way of learning about a dropped column, and the missing metadata
> cannot be inferred.
>
>
>
>
>
> On Tue, Feb 19, 2019 at 10:58 AM Elliott Sims 
> wrote:
>
> When a snapshot is taken, it includes a "schema.cql" file.  That should be
> sufficient to restore whatever you need to restore.  I'd argue that neither
> automatically resurrecting a dropped table nor silently failing to restore
> it is a good behavior, so it's not unreasonable to have the user re-create
> the table then choose if they want to re-drop it.
>
>
>
> On Tue, Feb 19, 2019 at 7:28 AM Hannu Kröger  wrote:
>
> Hi,
>
>
>
> I would like to bring this issue to your attention.
>
>
>
> Link to the ticket:
>
> https://issues.apache.org/jira/browse/CASSANDRA-14336
>
>
>
> Basically, if a table contains dropped columns and you try to restore a
> snapshot to a new cluster, the restore will fail with an error like
> "java.lang.RuntimeException: Unknown column XXX during deserialization".
>
>
>
> I feel this is quite a serious problem for the backup and restore
> functionality of Cassandra. You cannot restore a backup to a new cluster if
> columns have been dropped.
>
>
>
> Other similar tickets have apparently been closed, but based on my test
> with 3.11.4 the issue still persists.
>
>
>
> Best Regards,
>
> Hannu Kröger
>
>


[RELEASE] Apache Cassandra 3.11.7 released

2020-07-24 Thread Mick Semb Wever
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.11.7.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.
 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:
 http://cassandra.apache.org/download/

This version is a bug fix release[1] in the 3.11 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: CHANGES.txt
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-3.11.7
[2]: NEWS.txt
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-3.11.7
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.17 released

2020-07-24 Thread Mick Semb Wever
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.17.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.
 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:
 http://cassandra.apache.org/download/

This version is a bug fix release[1] in the 2.2 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: CHANGES.txt
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-2.2.17
[2]: NEWS.txt
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-2.2.17
[3]: https://issues.apache.org/jira/browse/CASSANDRA


RE: Restore a table with dropped columns to a new cluster fails

2020-07-24 Thread Durity, Sean R
I would use dsbulk to unload and load. Then the schemas don’t really matter. 
You define which fields in the resulting file are loaded into which columns. 
You also won’t have the limitations and slowness of COPY TO/FROM.
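
In case it helps future readers, here is a minimal sketch of that
unload/load flow. The hosts, keyspace, table, and column names are all
illustrative:

  # Unload from the source cluster:
  dsbulk unload -h source-node -k my_ks -t my_table -url /tmp/my_table_export

  # Load into the target cluster, mapping file fields to table columns:
  dsbulk load -h target-node -k my_ks -t my_table -url /tmp/my_table_export \
    -m 'id=id,created_at=created_at,payload=payload'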


Sean Durity

From: Mitch Gitman 
Sent: Friday, July 24, 2020 2:22 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Restore a table with dropped columns to a new cluster 
fails

I'm reviving this thread because I'm looking for a non-hacky way to migrate 
data from one cluster to another using nodetool snapshot and sstableloader 
without having to preserve dropped columns in the new schema. In my view, 
that's just cruft and confusion that keeps building.

The best idea I can come up with is to do the following in the source cluster:

  1.  Use the cqlsh COPY TO command to export the data in the table.
  2.  Drop the table.
  3.  Re-create the table.
  4.  Use the cqlsh COPY FROM command to import the data into the new incarnation
of the table.

This approach is predicated on two assumptions:

  *   The re-created table has no knowledge of the history of the old table by 
the same name.
  *   The amount of data in the table doesn't exceed what the COPY command can 
handle.

If the dropped columns exist in the table in an environment where there's a lot 
of data, then we'd have to use some other mechanism to capture and reload the 
data.

If you see something wrong about this approach or you have a better way to do 
it, I'd be glad to hear from you.

On Tue, Feb 19, 2019 at 11:31 AM Jeff Jirsa  wrote:
You can also manually add the dropped column to the appropriate table to
eliminate the issue. This has to be done by a human; a new cluster would have
no way of learning about a dropped column, and the missing metadata cannot be
inferred.
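
For example, a sketch of that manual step, assuming the dropped column was an
int named old_col (the keyspace, table, column, and type are illustrative):

  # On the new cluster, before streaming the old sstables in:
  cqlsh -e "ALTER TABLE my_ks.my_table ADD old_col int;"

  # Optionally, once the restore has completed:
  cqlsh -e "ALTER TABLE my_ks.my_table DROP old_col;"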


On Tue, Feb 19, 2019 at 10:58 AM Elliott Sims  wrote:
When a snapshot is taken, it includes a "schema.cql" file.  That should be 
sufficient to restore whatever you need to restore.  I'd argue that neither 
automatically resurrecting a dropped table nor silently failing to restore it 
is a good behavior, so it's not unreasonable to have the user re-create the 
table then choose if they want to re-drop it.
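
For reference, a sketch of where that file lives and how it can be replayed
(the paths and names are illustrative, and data_file_directories may differ
in your installation):

  ls /var/lib/cassandra/data/my_ks/my_table-*/snapshots/my_snapshot/schema.cql

  # Re-create the table on the new cluster from the captured DDL:
  cqlsh target-node -f schema.cql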

On Tue, Feb 19, 2019 at 7:28 AM Hannu Kröger  wrote:
Hi,

I would like to bring this issue to your attention.

Link to the ticket:
https://issues.apache.org/jira/browse/CASSANDRA-14336

Basically, if a table contains dropped columns and you try to restore a
snapshot to a new cluster, the restore will fail with an error like
"java.lang.RuntimeException: Unknown column XXX during deserialization".

I feel this is quite a serious problem for the backup and restore
functionality of Cassandra. You cannot restore a backup to a new cluster if
columns have been dropped.

Other similar tickets have apparently been closed, but based on my test with
3.11.4 the issue still persists.

Best Regards,
Hannu Kröger





Re: Restore a table with dropped columns to a new cluster fails

2020-07-24 Thread Mitch Gitman
I'm reviving this thread because I'm looking for a non-hacky way to migrate
data from one cluster to another using nodetool snapshot and sstableloader
without having to preserve dropped columns in the new schema. In my view,
that's just cruft and confusion that keeps building.
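
For context, the snapshot-plus-sstableloader flow in question looks roughly
like this (the snapshot tag, host, and staging path are illustrative):

   # On each source node:
   nodetool snapshot -t migrate_snap my_ks

   # Copy the snapshot sstables into a keyspace/table directory layout on a
   # staging host, then stream them to the target cluster:
   sstableloader -d target-node /path/to/staging/my_ks/my_table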

The best idea I can come up with is to do the following in the source
cluster:

   1. Use the cqlsh COPY TO command to export the data in the table.
   2. Drop the table.
   3. Re-create the table.
   4. Use the cqlsh COPY FROM command to import the data into the new
   incarnation of the table (sketched below).
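
A minimal sketch of those four steps, assuming illustrative keyspace, table,
and file names:

   # 1. Export from the source cluster:
   cqlsh -e "COPY my_ks.my_table TO '/tmp/my_table.csv' WITH HEADER = true"

   # 2. and 3. Drop the table, then re-create it from your own DDL:
   cqlsh -e "DROP TABLE my_ks.my_table"
   cqlsh -f my_table_schema.cql

   # 4. Import the data into the fresh table:
   cqlsh -e "COPY my_ks.my_table FROM '/tmp/my_table.csv' WITH HEADER = true"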


This approach is predicated on two assumptions:

   - The re-created table has no knowledge of the history of the old table
   by the same name.
   - The amount of data in the table doesn't exceed what the COPY command
   can handle.


If the dropped columns exist in the table in an environment where there's a
lot of data, then we'd have to use some other mechanism to capture and
reload the data.

If you see something wrong about this approach or you have a better way to
do it, I'd be glad to hear from you.

On Tue, Feb 19, 2019 at 11:31 AM Jeff Jirsa  wrote:

> You can also manually add the dropped column to the appropriate table to
> eliminate the issue. This has to be done by a human; a new cluster would
> have no way of learning about a dropped column, and the missing metadata
> cannot be inferred.
>
>
> On Tue, Feb 19, 2019 at 10:58 AM Elliott Sims 
> wrote:
>
>> When a snapshot is taken, it includes a "schema.cql" file.  That should
>> be sufficient to restore whatever you need to restore.  I'd argue that
>> neither automatically resurrecting a dropped table nor silently failing to
>> restore it is a good behavior, so it's not unreasonable to have the user
>> re-create the table then choose if they want to re-drop it.
>>
>>
>> On Tue, Feb 19, 2019 at 7:28 AM Hannu Kröger  wrote:
>>
>>> Hi,
>>>
>>> I would like to bring this issue to your attention.
>>>
>>> Link to the ticket:
>>> https://issues.apache.org/jira/browse/CASSANDRA-14336
>>>
>>> Basically, if a table contains dropped columns and you try to restore a
>>> snapshot to a new cluster, the restore will fail with an error like
>>> "java.lang.RuntimeException: Unknown column XXX during deserialization".
>>>
>>> I feel this is quite a serious problem for the backup and restore
>>> functionality of Cassandra. You cannot restore a backup to a new cluster
>>> if columns have been dropped.
>>>
>>> Other similar tickets have apparently been closed, but based on my test
>>> with 3.11.4 the issue still persists.
>>>
>>> Best Regards,
>>> Hannu Kröger
>>>
>>


Re: [RELEASE] Apache Cassandra 4.0-beta1 released

2020-07-24 Thread Mick Semb Wever
> This version is a beta release[1] in the 4.0 series. As always, please
> pay attention to the release notes[2] and let us know[3] if you
> encounter any problems.



A quick followup note to both user and dev groups.

Our beta release guidelines[1] state that there will be no further API
changes leading up to 4.0 GA. But we currently have three planned exceptions
to this, found in the following tickets:

- CASSANDRA-15299 – "CASSANDRA-13304 follow-up: improve checksumming and
compression in protocol v5-beta"
- CASSANDRA-15234 – Standardise config and JVM parameters
- CASSANDRA-13701 – Lower default num_tokens


The API changes in these tickets are minimal, which is part of the reason
they were allowed to slip into the beta phase. For example:
* CASSANDRA-15299 only affects those using the not-yet-stabilised v5 native
protocol.
* CASSANDRA-15234 will introduce cleaner, easier-to-use cassandra.yaml
settings, but all existing yaml setting names will still work.
* CASSANDRA-13701 will change the default num_tokens setting to 16 (from
256), a change that will only impact provisioning of new clusters, because
existing clusters must configure any new nodes to use the existing
num_tokens value (see the sketch below).
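
To illustrate that last point, a new node joining a cluster built with the
pre-4.0 default would pin the old value in its cassandra.yaml (a sketch,
assuming the cluster uses 256 tokens per node):

   # cassandra.yaml on a new node joining a pre-existing cluster:
   num_tokens: 256   # match the value used by the rest of the cluster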

All three tickets have fixVersion still set to `4.0-alpha` because of this
situation.


References:
 [1] https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle