On Mar 25, 2010, at 5:22 PM, Monty Taylor wrote:
> On 03/25/2010 02:47 PM, Eric Day wrote:
>> Of course with binary you waste at least one extra byte since it needs
>> to pack the size too, so doing signed int and managing the last bit
>> would be most efficient.
>
> Of course, in all fairness - asking people to do the bit swap magic for this
> from languages which are not C is like telling people to use a Ruby on Rails
> ActiveResource API from a non-ruby language. (can you tell my frustration of
> the day?)
>
> I have no idea how I'd do this sanely in Python, for instance.
Agreed. In fact, using BINARY() for storing data also seems somewhat
precarious. The database should be reasonably useful - having to build specific
ways to store data in your application, and have that potential change between
applications, seems somewhat ugly. That's why being able to add in types is
something I find exciting - having an IPv4 type for instance, completely solves
this problem.
That said, to give you an idea of the inefficiencies with BIGINT over INT
unsigned (at least on MySQL-Maria):
MariaDB [test]> show create table Logger4\G
*************************** 1. row ***************************
Table: Logger4
Create Table: CREATE TABLE `Logger4` (
`timestampOccurred` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE
CURRENT_TIMESTAMP,
`session` char(32) DEFAULT NULL,
`host` varchar(255) DEFAULT NULL,
`sslMode` enum('enabled','disabled') DEFAULT 'enabled',
`requestURI` varchar(255) DEFAULT NULL,
`referer` varchar(255) DEFAULT NULL,
`userAgent` varchar(255) DEFAULT NULL,
`remoteHost` int(10) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT KEY_BLOCK_SIZE=16
1 row in set (0.00 sec)
MariaDB [test]> show create table Logger8\G
*************************** 1. row ***************************
Table: Logger8
Create Table: CREATE TABLE `Logger8` (
`timestampOccurred` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE
CURRENT_TIMESTAMP,
`session` char(32) DEFAULT NULL,
`host` varchar(255) DEFAULT NULL,
`sslMode` enum('enabled','disabled') DEFAULT 'enabled',
`requestURI` varchar(255) DEFAULT NULL,
`referer` varchar(255) DEFAULT NULL,
`userAgent` varchar(255) DEFAULT NULL,
`remoteHost` bigint(20) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT KEY_BLOCK_SIZE=16
1 row in set (0.00 sec)
MariaDB [test]> SELECT COUNT(*) FROM Logger4;
+----------+
| COUNT(*) |
+----------+
| 1503687 |
+----------+
1 row in set (5.52 sec)
r...@mcp:/var/lib/mysql/test# ls -lh
total 713M
-rw-rw---- 1 mysql mysql 8.7K Mar 25 21:13 Logger4.frm
-rw-rw---- 1 mysql mysql 352M Mar 25 21:14 Logger4.ibd
-rw-rw---- 1 mysql mysql 8.7K Mar 25 21:13 Logger8.frm
-rw-rw---- 1 mysql mysql 360M Mar 25 21:15 Logger8.ibd
-rw-rw---- 1 mysql mysql 65 Mar 25 21:12 db.opt
Sure, it's only 8MB, but that is a bit surprising considering the only
difference is that the remoteHost column is an int unsigned or bigint. I
suspect InnoDB compression would help here, but that is not enabled by default
in Drizzle (at least at the moment) and other engines may have this problem as
well.
Granted, the removal of unsigned types was something I wasn't generally a huge
fan of, so I guess you could say I am a bit biased here but still I think
people moving over at least from MySQL and storing IPs as INTs are going to be
in for somewhat of a shock.
Just some thoughts for discussion :)
Tim
_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help : https://help.launchpad.net/ListHelp