On Mar 25, 2010, at 5:22 PM, Monty Taylor wrote:

> On 03/25/2010 02:47 PM, Eric Day wrote:
>> Of course with binary you waste at least one extra byte since it needs
>> to pack the size too, so doing signed int and managing the last bit
>> would be most efficient.
> 
> Of course, in all fairness - asking people to do the bit swap magic for this 
> from languages which are not C is like telling people to use a Ruby on Rails 
> ActiveResource API from a non-ruby language. (can you tell my frustration of 
> the day?)
> 
> I have no idea how I'd do this sanely in Python, for instance.

Agreed. In fact, using BINARY() for storing data also seems somewhat 
precarious. The database should be reasonably useful - having to build specific 
ways to store data in your application, and have that potential change between 
applications, seems somewhat ugly. That's why being able to add in types is 
something I find exciting - having an IPv4 type for instance, completely solves 
this problem.

That said, to give you an idea of the inefficiencies with BIGINT over INT 
unsigned (at least on MySQL-Maria):

MariaDB [test]> show create table Logger4\G
*************************** 1. row ***************************
       Table: Logger4
Create Table: CREATE TABLE `Logger4` (
  `timestampOccurred` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE 
CURRENT_TIMESTAMP,
  `session` char(32) DEFAULT NULL,
  `host` varchar(255) DEFAULT NULL,
  `sslMode` enum('enabled','disabled') DEFAULT 'enabled',
  `requestURI` varchar(255) DEFAULT NULL,
  `referer` varchar(255) DEFAULT NULL,
  `userAgent` varchar(255) DEFAULT NULL,
  `remoteHost` int(10) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT KEY_BLOCK_SIZE=16
1 row in set (0.00 sec)

MariaDB [test]> show create table Logger8\G
*************************** 1. row ***************************
       Table: Logger8
Create Table: CREATE TABLE `Logger8` (
  `timestampOccurred` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE 
CURRENT_TIMESTAMP,
  `session` char(32) DEFAULT NULL,
  `host` varchar(255) DEFAULT NULL,
  `sslMode` enum('enabled','disabled') DEFAULT 'enabled',
  `requestURI` varchar(255) DEFAULT NULL,
  `referer` varchar(255) DEFAULT NULL,
  `userAgent` varchar(255) DEFAULT NULL,
  `remoteHost` bigint(20) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT KEY_BLOCK_SIZE=16
1 row in set (0.00 sec)

MariaDB [test]> SELECT COUNT(*) FROM Logger4;
+----------+
| COUNT(*) |
+----------+
|  1503687 |
+----------+
1 row in set (5.52 sec)

r...@mcp:/var/lib/mysql/test# ls -lh
total 713M
-rw-rw---- 1 mysql mysql 8.7K Mar 25 21:13 Logger4.frm
-rw-rw---- 1 mysql mysql 352M Mar 25 21:14 Logger4.ibd
-rw-rw---- 1 mysql mysql 8.7K Mar 25 21:13 Logger8.frm
-rw-rw---- 1 mysql mysql 360M Mar 25 21:15 Logger8.ibd
-rw-rw---- 1 mysql mysql   65 Mar 25 21:12 db.opt

Sure, it's only 8MB, but that is a bit surprising considering the only 
difference is that the remoteHost column is an int unsigned or bigint. I 
suspect InnoDB compression would help here, but that is not enabled by default 
in Drizzle (at least at the moment) and other engines may have this problem as 
well.

Granted, the removal of unsigned types was something I wasn't generally a huge 
fan of, so I guess you could say I am a bit biased here but still I think 
people moving over at least from MySQL and storing IPs as INTs are going to be 
in for somewhat of a shock.

Just some thoughts for discussion :)

Tim





_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp

Reply via email to