Re: Change primary key from int to bigint

2017-01-11 Thread DuyHai Doan
I don't understand why ALTER TYPE was even allowed initially. Apart from
very few corner cases, changing data type on existing data will lead to
disaster in many cases.

On Wed, Jan 11, 2017 at 12:20 PM, Tom van der Woerdt <
tom.vanderwoe...@booking.com> wrote:

> My understanding is that it's safe... but considering "alter type" is
> going to be removed completely (https://issues.apache.org/
> jira/browse/CASSANDRA-12443), maybe not.
>
> As for faster ways to do this: no idea :-(
>
> Tom
>
>
>
> On Wed, Jan 11, 2017 at 12:12 PM, Benjamin Roth 
> wrote:
>
>> But it is safe to change non-primary-key columns from int to varint,
>> right?
>>
>> 2017-01-11 10:09 GMT+01:00 Tom van der Woerdt <
>> tom.vanderwoe...@booking.com>:
>>
>>> Actually, come to think of it, there's a subtle serialization difference
>>> between varint and int that will break token generation (see bottom of
>>> mail). I think it's a bug that Cassandra will allow this, so don't do this
>>> in production.
>>>
>>> You can think of varint encoding as regular bigints with all the leading
>>> zero bytes stripped off. This means the varint decoder will happily decode
>>> the tinyint, smallint, int, and bigint types, but the encoder won't
>>> necessarily re-encode to the same thing. Specifically, any int below
>>> 8388608 will have a different encoding in a varint.
>>>
>>> There's a small performance impact with the varint encoding and decoding
>>> scheme, but likely insignificant for any reasonable use case.
>>>
>>> Tom
>>>
>>>
>>>
>>>
>>>
>>>
>>> cqlsh> select * from foo where id in (1, 128, 256, 65535, 65536,
>>> 16777215, 16777216, 2147483647 <%28214%29%20748-3647>);
>>>
>>>  id | value
>>> +---
>>>   1 |  test
>>> 128 |  test
>>> 256 |  test
>>>   65535 |  test
>>>   65536 |  test
>>>16777215 |  test
>>>16777216 |  test
>>>  2147483647 <%28214%29%20748-3647> |  test
>>>
>>> (8 rows)
>>> cqlsh> alter table foo alter id TYPE varint;
>>> cqlsh> select * from foo where id in (1, 128, 256, 65535, 65536,
>>> 16777215, 16777216, 2147483647 <%28214%29%20748-3647>);
>>>
>>>  id | value
>>> +---
>>>16777215 |  test
>>>16777216 |  test
>>>  2147483647 <%28214%29%20748-3647> |  test
>>>
>>> (3 rows)
>>> cqlsh> select * from foo;
>>>
>>>  id | value
>>> +---
>>> 128 |  test
>>>16777216 |  test
>>>   1 |  test
>>>  2147483647 <%28214%29%20748-3647> |  test
>>>16777215 |  test
>>> 256 |  test
>>>   65535 |  test
>>>   65536 |  test
>>>
>>>
>>>
>>>
>>> On Wed, Jan 11, 2017 at 9:54 AM, Benjamin Roth 
>>> wrote:
>>>
 Few! You saved my life, thanks!

 For my understanding:
 When creating a new table, is bigint or varint a better choice for
 storing (up to) 64bit ints? Is there a difference in performance?

 2017-01-11 9:39 GMT+01:00 Tom van der Woerdt <
 tom.vanderwoe...@booking.com>:

> Hi Benjamin,
>
> bigint and int have incompatible serialization types, so that won't
> work. However, changing to 'varint' will work fine.
>
> Hope that helps.
>
> Tom
>
>
>
> On Wed, Jan 11, 2017 at 9:21 AM, Benjamin Roth <
> benjamin.r...@jaumo.com> wrote:
>
>> Hi there,
>>
>> Does anyone know if there is a hack to change a "int" to a "bigint"
>> in a primary key?
>> I recognized very late, I took the wrong type and our production DB
>> already contains billions of records :(
>> Is there maybe a hack for it, because int and bigint are similar
>> types or does the SSTable serialization and maybe the token generation
>> require the tables to be completely reread+rewritten?
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161
>> 304880-1 <+49%207161%203048801>
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


 --
 Benjamin Roth
 Prokurist

 Jaumo GmbH · www.jaumo.com
 Wehrstraße 46 · 73035 Göppingen · Germany
 Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
 <+49%207161%203048801>
 AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

>>>
>>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
>> <+49%207161%203048801>
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


Re: Change primary key from int to bigint

2017-01-11 Thread Tom van der Woerdt
My understanding is that it's safe... but considering "alter type" is going
to be removed completely (
https://issues.apache.org/jira/browse/CASSANDRA-12443), maybe not.

As for faster ways to do this: no idea :-(

Tom


On Wed, Jan 11, 2017 at 12:12 PM, Benjamin Roth 
wrote:

> But it is safe to change non-primary-key columns from int to varint, right?
>
> 2017-01-11 10:09 GMT+01:00 Tom van der Woerdt <
> tom.vanderwoe...@booking.com>:
>
>> Actually, come to think of it, there's a subtle serialization difference
>> between varint and int that will break token generation (see bottom of
>> mail). I think it's a bug that Cassandra will allow this, so don't do this
>> in production.
>>
>> You can think of varint encoding as regular bigints with all the leading
>> zero bytes stripped off. This means the varint decoder will happily decode
>> the tinyint, smallint, int, and bigint types, but the encoder won't
>> necessarily re-encode to the same thing. Specifically, any int below
>> 8388608 will have a different encoding in a varint.
>>
>> There's a small performance impact with the varint encoding and decoding
>> scheme, but likely insignificant for any reasonable use case.
>>
>> Tom
>>
>>
>>
>>
>>
>>
>> cqlsh> select * from foo where id in (1, 128, 256, 65535, 65536,
>> 16777215, 16777216, 2147483647 <%28214%29%20748-3647>);
>>
>>  id | value
>> +---
>>   1 |  test
>> 128 |  test
>> 256 |  test
>>   65535 |  test
>>   65536 |  test
>>16777215 |  test
>>16777216 |  test
>>  2147483647 <%28214%29%20748-3647> |  test
>>
>> (8 rows)
>> cqlsh> alter table foo alter id TYPE varint;
>> cqlsh> select * from foo where id in (1, 128, 256, 65535, 65536,
>> 16777215, 16777216, 2147483647 <%28214%29%20748-3647>);
>>
>>  id | value
>> +---
>>16777215 |  test
>>16777216 |  test
>>  2147483647 <%28214%29%20748-3647> |  test
>>
>> (3 rows)
>> cqlsh> select * from foo;
>>
>>  id | value
>> +---
>> 128 |  test
>>16777216 |  test
>>   1 |  test
>>  2147483647 <%28214%29%20748-3647> |  test
>>16777215 |  test
>> 256 |  test
>>   65535 |  test
>>   65536 |  test
>>
>>
>>
>>
>> On Wed, Jan 11, 2017 at 9:54 AM, Benjamin Roth 
>> wrote:
>>
>>> Few! You saved my life, thanks!
>>>
>>> For my understanding:
>>> When creating a new table, is bigint or varint a better choice for
>>> storing (up to) 64bit ints? Is there a difference in performance?
>>>
>>> 2017-01-11 9:39 GMT+01:00 Tom van der Woerdt <
>>> tom.vanderwoe...@booking.com>:
>>>
 Hi Benjamin,

 bigint and int have incompatible serialization types, so that won't
 work. However, changing to 'varint' will work fine.

 Hope that helps.

 Tom



 On Wed, Jan 11, 2017 at 9:21 AM, Benjamin Roth  wrote:

> Hi there,
>
> Does anyone know if there is a hack to change a "int" to a "bigint" in
> a primary key?
> I recognized very late, I took the wrong type and our production DB
> already contains billions of records :(
> Is there maybe a hack for it, because int and bigint are similar types
> or does the SSTable serialization and maybe the token generation require
> the tables to be completely reread+rewritten?
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


>>>
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
>>> <+49%207161%203048801>
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: Change primary key from int to bigint

2017-01-11 Thread Benjamin Roth
But it is safe to change non-primary-key columns from int to varint, right?

2017-01-11 10:09 GMT+01:00 Tom van der Woerdt 
:

> Actually, come to think of it, there's a subtle serialization difference
> between varint and int that will break token generation (see bottom of
> mail). I think it's a bug that Cassandra will allow this, so don't do this
> in production.
>
> You can think of varint encoding as regular bigints with all the leading
> zero bytes stripped off. This means the varint decoder will happily decode
> the tinyint, smallint, int, and bigint types, but the encoder won't
> necessarily re-encode to the same thing. Specifically, any int below
> 8388608 will have a different encoding in a varint.
>
> There's a small performance impact with the varint encoding and decoding
> scheme, but likely insignificant for any reasonable use case.
>
> Tom
>
>
>
>
>
>
> cqlsh> select * from foo where id in (1, 128, 256, 65535, 65536, 16777215,
> 16777216, 2147483647);
>
>  id | value
> +---
>   1 |  test
> 128 |  test
> 256 |  test
>   65535 |  test
>   65536 |  test
>16777215 |  test
>16777216 |  test
>  2147483647 |  test
>
> (8 rows)
> cqlsh> alter table foo alter id TYPE varint;
> cqlsh> select * from foo where id in (1, 128, 256, 65535, 65536, 16777215,
> 16777216, 2147483647);
>
>  id | value
> +---
>16777215 |  test
>16777216 |  test
>  2147483647 |  test
>
> (3 rows)
> cqlsh> select * from foo;
>
>  id | value
> +---
> 128 |  test
>16777216 |  test
>   1 |  test
>  2147483647 |  test
>16777215 |  test
> 256 |  test
>   65535 |  test
>   65536 |  test
>
>
>
>
> On Wed, Jan 11, 2017 at 9:54 AM, Benjamin Roth 
> wrote:
>
>> Few! You saved my life, thanks!
>>
>> For my understanding:
>> When creating a new table, is bigint or varint a better choice for
>> storing (up to) 64bit ints? Is there a difference in performance?
>>
>> 2017-01-11 9:39 GMT+01:00 Tom van der Woerdt <
>> tom.vanderwoe...@booking.com>:
>>
>>> Hi Benjamin,
>>>
>>> bigint and int have incompatible serialization types, so that won't
>>> work. However, changing to 'varint' will work fine.
>>>
>>> Hope that helps.
>>>
>>> Tom
>>>
>>>
>>>
>>> On Wed, Jan 11, 2017 at 9:21 AM, Benjamin Roth 
>>> wrote:
>>>
 Hi there,

 Does anyone know if there is a hack to change a "int" to a "bigint" in
 a primary key?
 I recognized very late, I took the wrong type and our production DB
 already contains billions of records :(
 Is there maybe a hack for it, because int and bigint are similar types
 or does the SSTable serialization and maybe the token generation require
 the tables to be completely reread+rewritten?

 --
 Benjamin Roth
 Prokurist

 Jaumo GmbH · www.jaumo.com
 Wehrstraße 46 · 73035 Göppingen · Germany
 Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
 <+49%207161%203048801>
 AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

>>>
>>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
>> <+49%207161%203048801>
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Change primary key from int to bigint

2017-01-11 Thread Benjamin Roth
Wow okay! Fortunately I did not change the types, yet!

So there is no other way than reading the whole table and re-insert all
data?
Is there a faster way than doing all this with CQL? Like importing existing
SSTables directly into a new CF with the new column types?

2017-01-11 10:09 GMT+01:00 Tom van der Woerdt 
:

> Actually, come to think of it, there's a subtle serialization difference
> between varint and int that will break token generation (see bottom of
> mail). I think it's a bug that Cassandra will allow this, so don't do this
> in production.
>
> You can think of varint encoding as regular bigints with all the leading
> zero bytes stripped off. This means the varint decoder will happily decode
> the tinyint, smallint, int, and bigint types, but the encoder won't
> necessarily re-encode to the same thing. Specifically, any int below
> 8388608 will have a different encoding in a varint.
>
> There's a small performance impact with the varint encoding and decoding
> scheme, but likely insignificant for any reasonable use case.
>
> Tom
>
>
>
>
>
>
> cqlsh> select * from foo where id in (1, 128, 256, 65535, 65536, 16777215,
> 16777216, 2147483647);
>
>  id | value
> +---
>   1 |  test
> 128 |  test
> 256 |  test
>   65535 |  test
>   65536 |  test
>16777215 |  test
>16777216 |  test
>  2147483647 |  test
>
> (8 rows)
> cqlsh> alter table foo alter id TYPE varint;
> cqlsh> select * from foo where id in (1, 128, 256, 65535, 65536, 16777215,
> 16777216, 2147483647);
>
>  id | value
> +---
>16777215 |  test
>16777216 |  test
>  2147483647 |  test
>
> (3 rows)
> cqlsh> select * from foo;
>
>  id | value
> +---
> 128 |  test
>16777216 |  test
>   1 |  test
>  2147483647 |  test
>16777215 |  test
> 256 |  test
>   65535 |  test
>   65536 |  test
>
>
>
>
> On Wed, Jan 11, 2017 at 9:54 AM, Benjamin Roth 
> wrote:
>
>> Few! You saved my life, thanks!
>>
>> For my understanding:
>> When creating a new table, is bigint or varint a better choice for
>> storing (up to) 64bit ints? Is there a difference in performance?
>>
>> 2017-01-11 9:39 GMT+01:00 Tom van der Woerdt <
>> tom.vanderwoe...@booking.com>:
>>
>>> Hi Benjamin,
>>>
>>> bigint and int have incompatible serialization types, so that won't
>>> work. However, changing to 'varint' will work fine.
>>>
>>> Hope that helps.
>>>
>>> Tom
>>>
>>>
>>>
>>> On Wed, Jan 11, 2017 at 9:21 AM, Benjamin Roth 
>>> wrote:
>>>
 Hi there,

 Does anyone know if there is a hack to change a "int" to a "bigint" in
 a primary key?
 I recognized very late, I took the wrong type and our production DB
 already contains billions of records :(
 Is there maybe a hack for it, because int and bigint are similar types
 or does the SSTable serialization and maybe the token generation require
 the tables to be completely reread+rewritten?

 --
 Benjamin Roth
 Prokurist

 Jaumo GmbH · www.jaumo.com
 Wehrstraße 46 · 73035 Göppingen · Germany
 Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
 <+49%207161%203048801>
 AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

>>>
>>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
>> <+49%207161%203048801>
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Change primary key from int to bigint

2017-01-11 Thread Tom van der Woerdt
Actually, come to think of it, there's a subtle serialization difference
between varint and int that will break token generation (see bottom of
mail). I think it's a bug that Cassandra will allow this, so don't do this
in production.

You can think of varint encoding as regular bigints with all the leading
zero bytes stripped off. This means the varint decoder will happily decode
the tinyint, smallint, int, and bigint types, but the encoder won't
necessarily re-encode to the same thing. Specifically, any int below
8388608 will have a different encoding in a varint.

There's a small performance impact with the varint encoding and decoding
scheme, but likely insignificant for any reasonable use case.

Tom






cqlsh> select * from foo where id in (1, 128, 256, 65535, 65536, 16777215,
16777216, 2147483647);

 id | value
+---
  1 |  test
128 |  test
256 |  test
  65535 |  test
  65536 |  test
   16777215 |  test
   16777216 |  test
 2147483647 |  test

(8 rows)
cqlsh> alter table foo alter id TYPE varint;
cqlsh> select * from foo where id in (1, 128, 256, 65535, 65536, 16777215,
16777216, 2147483647);

 id | value
+---
   16777215 |  test
   16777216 |  test
 2147483647 |  test

(3 rows)
cqlsh> select * from foo;

 id | value
+---
128 |  test
   16777216 |  test
  1 |  test
 2147483647 |  test
   16777215 |  test
256 |  test
  65535 |  test
  65536 |  test



On Wed, Jan 11, 2017 at 9:54 AM, Benjamin Roth 
wrote:

> Few! You saved my life, thanks!
>
> For my understanding:
> When creating a new table, is bigint or varint a better choice for storing
> (up to) 64bit ints? Is there a difference in performance?
>
> 2017-01-11 9:39 GMT+01:00 Tom van der Woerdt  >:
>
>> Hi Benjamin,
>>
>> bigint and int have incompatible serialization types, so that won't work.
>> However, changing to 'varint' will work fine.
>>
>> Hope that helps.
>>
>> Tom
>>
>>
>>
>> On Wed, Jan 11, 2017 at 9:21 AM, Benjamin Roth 
>> wrote:
>>
>>> Hi there,
>>>
>>> Does anyone know if there is a hack to change a "int" to a "bigint" in a
>>> primary key?
>>> I recognized very late, I took the wrong type and our production DB
>>> already contains billions of records :(
>>> Is there maybe a hack for it, because int and bigint are similar types
>>> or does the SSTable serialization and maybe the token generation require
>>> the tables to be completely reread+rewritten?
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
>>> <+49%207161%203048801>
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: Change primary key from int to bigint

2017-01-11 Thread Benjamin Roth
Few! You saved my life, thanks!

For my understanding:
When creating a new table, is bigint or varint a better choice for storing
(up to) 64bit ints? Is there a difference in performance?

2017-01-11 9:39 GMT+01:00 Tom van der Woerdt :

> Hi Benjamin,
>
> bigint and int have incompatible serialization types, so that won't work.
> However, changing to 'varint' will work fine.
>
> Hope that helps.
>
> Tom
>
>
>
> On Wed, Jan 11, 2017 at 9:21 AM, Benjamin Roth 
> wrote:
>
>> Hi there,
>>
>> Does anyone know if there is a hack to change a "int" to a "bigint" in a
>> primary key?
>> I recognized very late, I took the wrong type and our production DB
>> already contains billions of records :(
>> Is there maybe a hack for it, because int and bigint are similar types or
>> does the SSTable serialization and maybe the token generation require the
>> tables to be completely reread+rewritten?
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
>> <+49%207161%203048801>
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Change primary key from int to bigint

2017-01-11 Thread Tom van der Woerdt
Hi Benjamin,

bigint and int have incompatible serialization types, so that won't work.
However, changing to 'varint' will work fine.

Hope that helps.

Tom


On Wed, Jan 11, 2017 at 9:21 AM, Benjamin Roth 
wrote:

> Hi there,
>
> Does anyone know if there is a hack to change a "int" to a "bigint" in a
> primary key?
> I recognized very late, I took the wrong type and our production DB
> already contains billions of records :(
> Is there maybe a hack for it, because int and bigint are similar types or
> does the SSTable serialization and maybe the token generation require the
> tables to be completely reread+rewritten?
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 <+49%207161%203048806> · Fax +49 7161 304880-1
> <+49%207161%203048801>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>