[ 
https://issues.apache.org/jira/browse/CASSANDRA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096845#comment-13096845
 ] 

Radim Kolar commented on CASSANDRA-3031:
----------------------------------------

A typical Hector program looks like this:

import me.prettyprint.cassandra.serializers.IntegerSerializer;

ColumnFamilyTemplate<Integer, String> template =
    new ThriftColumnFamilyTemplate<Integer, String>(
        HFactory.createKeyspace(keyspace, c, new WeakWritePolicy()),
        "sipdb", IntegerSerializer.get(), StringSerializer.get());

Using this always sends 4-byte integers to Cassandra, because
IntegerSerializer uses a fixed-size rather than a variable-size encoding. If
you serialize 0, it sends 00 00 00 00. The same applies to reading: it can't
read variable-sized integers back. The current cassandra-cli and some other
client libraries have the opposite problem: they can't work with fixed-size
int4 integers, so writing to a column with a different client library corrupts
the data.
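
The difference between the two encodings can be sketched with plain JDK
classes. This is only an illustration, not Hector's or Cassandra's actual
code: the class and method names are made up, fixed() assumes the 4-byte
network-order layout described above, and variable() assumes the
variable-size type uses the minimal two's-complement bytes that BigInteger
produces.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class IntEncodingDemo {
    // Fixed-size encoding: always 4 bytes, big-endian (network order).
    static byte[] fixed(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Variable-size encoding (assumed): minimal two's-complement bytes.
    static byte[] variable(long value) {
        return BigInteger.valueOf(value).toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fixed(0)));      // [0, 0, 0, 0]
        System.out.println(Arrays.toString(variable(0)));   // [0]
        System.out.println(Arrays.toString(fixed(300)));    // [0, 0, 1, 44]
        System.out.println(Arrays.toString(variable(300))); // [1, 44]
    }
}
```

A reader that expects exactly 4 bytes cannot decode the 1-byte or 2-byte
arrays, and a validator that expects the minimal form sees the padded 4-byte
value as a different byte string, which is the mismatch described above.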

Hector has a serializer that can do variable-size serialization (BigInteger),
but it is much slower. The conclusion in the Hector community is that saving
one or two bytes is not worth it, because you would need to declare your
variables as variable-sized integers in Java, which is too slow and
impractical.

I also prefer fixed-size integers, and they should be the default for the CQL
INT type, because you know what value size you will read back from the
database. An application can have subtle bugs if somebody inserts a large
value into a column and it silently overflows in the client when read from the
database. Defaulting to a variable-sized type would work only if the majority
of client applications could process variable-sized integers by default. Only
Python automatically converts a fixed-size int to a variable-sized long on
overflow.
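
The silent overflow can be reproduced with the JDK alone. The sketch below
uses hypothetical names and is not taken from any client library: it reads a
stored variable-size value into a Java int, once with the narrowing
conversion and once with the checked conversion (intValueExact, available
since Java 8).

```java
import java.math.BigInteger;

public class OverflowDemo {
    // Narrowing conversion: silently keeps only the low 32 bits.
    static int lossy(BigInteger stored) {
        return stored.intValue();
    }

    // Checked conversion: throws ArithmeticException if out of int range.
    static int checked(BigInteger stored) {
        return stored.intValueExact();
    }

    public static void main(String[] args) {
        // 2^32 + 5 does not fit into a 4-byte int.
        BigInteger large = BigInteger.valueOf(4294967301L);
        System.out.println(lossy(large)); // 5
        try {
            checked(large);
        } catch (ArithmeticException e) {
            System.out.println("value does not fit into int");
        }
    }
}
```

With a fixed 4-byte column type, the out-of-range value would be rejected at
write time instead of being truncated by whichever client reads it later.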

Existing CQL CF create scripts are not a big issue; most people create
schemas via cassandra-cli scripts. CQL is not widely used, and it will not be
used much until cassandra-cli can do CQL. The only tool for working with CQL
is cqlsh, which is not user friendly. Cassandra administrators need to read
the version upgrade document anyway, so it is enough to document this change
as part of the 0.8 to 1.0 upgrade procedure.

People coming from SQL land (MySQL, MSSQL, DB2) expect INT/INTEGER types to be
4 bytes long and BIGINT to be 8 bytes long. A variable-size integer type
should be named in CQL something like DECIMAL or NUMERIC (MySQL, DB2, MSSQL)
or NUMBER (Oracle).

> Add 4 byte integer type
> -----------------------
>
>                 Key: CASSANDRA-3031
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3031
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.4
>         Environment: any
>            Reporter: Radim Kolar
>            Priority: Minor
>              Labels: hector, lhf
>             Fix For: 1.0
>
>         Attachments: apache-cassandra-0.8.4-SNAPSHOT.jar, src.diff, test.diff
>
>
> Cassandra currently lacks support for a 4-byte fixed-size integer data type.
> Java API Hector and C libcassandra serialize integers as 4 bytes in
> network order. The problem is that you can't use cassandra-cli to manipulate
> the stored rows. Compatibility with other applications using an API that
> follows the Cassandra integer encoding standard is problematic too.
> Because adding a new datatype/validator is fairly simple, I recommend adding
> an int4 data type. Compatibility with Hector is important because it is the
> most used Java Cassandra API and a lot of applications use it.
> This problem has been discussed several times already:
> http://comments.gmane.org/gmane.comp.db.hector.user/2125
> https://issues.apache.org/jira/browse/CASSANDRA-2585
> It would be nice to have compatibility with cassandra-cli and other
> applications without rewriting Hector apps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
