[
https://issues.apache.org/jira/browse/CASSANDRA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096845#comment-13096845
]
Radim Kolar commented on CASSANDRA-3031:
----------------------------------------
A typical Hector program looks like this:

    import me.prettyprint.cassandra.serializers.IntegerSerializer;

    ColumnFamilyTemplate<Integer, String> template =
        new ThriftColumnFamilyTemplate<Integer, String>(
            HFactory.createKeyspace(keyspace, c, new WeakWritePolicy()),
            "sipdb", IntegerSerializer.get(), StringSerializer.get());

Using this always sends 4-byte integers to Cassandra, because
IntegerSerializer uses a fixed rather than a variable size. If you serialize 0,
it sends 00 00 00 00. The same applies to reading: it can't read variable-sized
integers back. The current cassandra-cli and some other client libraries have
the opposite problem: they can't work with fixed-size int4 integers, so writing
to a column using a different client library corrupts data.
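
To make the size mismatch concrete, here is a minimal sketch using only the Java standard library. It is not Hector's or Cassandra's actual code. The class and method names are made up for illustration, and it assumes the variable-size form is the minimal two's-complement encoding that BigInteger produces.

```java
import java.math.BigInteger;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical demo class, not part of Hector or Cassandra.
class IntEncodingDemo {
    // Fixed-size encoding: always 4 bytes, big-endian (network order).
    static byte[] fixed(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Variable-size encoding (assumed): minimal two's-complement, big-endian.
    static byte[] variable(long value) {
        return BigInteger.valueOf(value).toByteArray();
    }

    // A reader that needs 4 bytes fails on shorter input.
    static boolean fixedReaderAccepts(byte[] stored) {
        try {
            ByteBuffer.wrap(stored).getInt();
            return true;
        } catch (BufferUnderflowException e) {
            return false;
        }
    }

    // Format bytes as space-separated two-digit hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(fixed(0)));                    // 00 00 00 00
        System.out.println(hex(variable(0)));                 // 00
        System.out.println(fixedReaderAccepts(variable(0)));  // false
    }
}
```

The same value 0 is four bytes in one encoding and one byte in the other, so a client that expects one form cannot safely read what a client using the other form wrote.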
There is a Hector serializer that can do variable-size serialization
(BigInteger), but it is much slower. The conclusion in the Hector community is
that saving one or two bytes is not worth it, because you would need to declare
your variables as variable-sized integers in Java, which is too slow and
impractical.
I also prefer fixed-size integers, and they should be the default for the INT
CQL type, because you know what value size you will read back from the
database. An application can have subtle bugs if somebody inserts a large value
into a column and it silently overflows in the client when read from the
database. Defaulting to a variable-sized type would work only if the majority
of client applications could process variable-sized integers by default. Only
Python automatically converts a fixed-size int to a variable-sized long on
overflow.
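
The silent overflow can be sketched in plain Java as follows. This is only an illustration under assumptions: the stored bytes are taken to be a variable-size two's-complement value, the class and method names are hypothetical, and intValueExact() requires Java 8 or later.

```java
import java.math.BigInteger;

// Hypothetical demo class, not part of Hector or Cassandra.
class OverflowDemo {
    // Silent narrowing: keeps only the low 32 bits of the stored value.
    static int readUnchecked(byte[] stored) {
        return new BigInteger(stored).intValue();
    }

    // Checked narrowing: throws ArithmeticException if the value
    // does not fit in an int.
    static int readChecked(byte[] stored) {
        return new BigInteger(stored).intValueExact();
    }

    public static void main(String[] args) {
        // 2^32 + 5, written by some other client as a variable-size integer.
        byte[] stored = BigInteger.valueOf(4294967301L).toByteArray();
        System.out.println(readUnchecked(stored)); // 5
        try {
            readChecked(stored);
        } catch (ArithmeticException e) {
            System.out.println("value does not fit in an int");
        }
    }
}
```

A client that narrows without checking reads 5 and carries on, which is the kind of subtle bug a fixed-size column type would rule out.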
Existing CQL CF create scripts are not a big issue; most people create
schemas via cassandra-cli scripts. CQL is not widely used, and it will not be
used much until cassandra-cli can do CQL. The only tool for working with CQL is
cqlsh, which is not user friendly. Cassandra administrators need to read the
version upgrade document anyway, so it is enough to document this change as
part of the 0.8 to 1.0 upgrade procedure.
People coming from SQL land (mysql, mssql, db2) expect INT/INTEGER types to be
4 bytes long and BIGINT to be 8 bytes long. A variable integer type should be
named in CQL something like DECIMAL or NUMERIC (mysql, db2, mssql) or NUMBER
(oracle).
> Add 4 byte integer type
> -----------------------
>
> Key: CASSANDRA-3031
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3031
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.8.4
> Environment: any
> Reporter: Radim Kolar
> Priority: Minor
> Labels: hector, lhf
> Fix For: 1.0
>
> Attachments: apache-cassandra-0.8.4-SNAPSHOT.jar, src.diff, test.diff
>
>
> Cassandra currently lacks support for a 4-byte fixed-size integer data type.
> The Java API Hector and C libcassandra like to serialize integers as 4 bytes
> in network order. The problem is that you can't use cassandra-cli to
> manipulate the stored rows. Compatibility with other applications using an
> API that follows the Cassandra integer encoding standard is problematic too.
> Because adding a new datatype/validator is fairly simple, I recommend adding
> an int4 data type. Compatibility with Hector is important because it is the
> most used Java Cassandra API and a lot of applications use it.
> This problem has been discussed several times already:
> http://comments.gmane.org/gmane.comp.db.hector.user/2125
> https://issues.apache.org/jira/browse/CASSANDRA-2585
> It would be nice to have compatibility with cassandra-cli and other
> applications without rewriting hector apps.