Hello folks,

What are your opinions on supporting network address types in the
Apache Arrow data format?
Network addresses appear throughout the logs that network equipment
generates in large volumes,
and they are important information when people analyze their past logs.

I'm working on mapping the Apache Arrow format onto PostgreSQL.
http://heterodb.github.io/pg-strom/arrow_fdw/

This extension allows Arrow files to be read as if they were PostgreSQL
tables, using foreign tables.
Arrow data types are mapped to the corresponding PostgreSQL data types
as described in the above documentation.

https://www.postgresql.org/docs/current/datatype-net-types.html
PostgreSQL supports several network address types and operators.
For example, we can add a qualifier like:   WHERE addr <<= inet
'192.168.1.0/24' , to find all
the records in the subnet '192.168.1.0/24'.
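
As a rough illustration of what that qualifier tests, here is a minimal
Python sketch using only the standard ipaddress module. The function
name contained_in_or_equal is hypothetical, and this is only an
approximation of the operator's behavior, not PostgreSQL or arrow_fdw
code.

```python
import ipaddress


def contained_in_or_equal(addr: str, subnet: str) -> bool:
    """Approximate `addr <<= subnet` for inet-like string values.

    Hypothetical helper for illustration; host bits are ignored when
    parsing, so only the network parts are compared.
    """
    a = ipaddress.ip_network(addr, strict=False)
    s = ipaddress.ip_network(subnet, strict=False)
    # IPv4 and IPv6 values never contain each other.
    if a.version != s.version:
        return False
    return a.subnet_of(s)


if __name__ == "__main__":
    for candidate in ("192.168.1.77", "192.168.2.1", "::1"):
        print(candidate, contained_in_or_equal(candidate, "192.168.1.0/24"))
```

A filter like this only needs the raw address bits and the netmask
length, which is why a compact fixed-width representation is enough.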

Probably, these three data types would be sufficient for most network
logs: inet4, inet6 and macaddr.
* inet4 is a 32-bit + optional 8-bit (for netmask) fixed-length array
* inet6 is a 128-bit + optional 8-bit (for netmask) fixed-length array
* macaddr is a 48-bit fixed-length array.
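
To make the sizes above concrete, here is a minimal Python sketch that
packs each kind of value into a fixed number of bytes. The function
names, the network byte order, and the choice to store the netmask
length as a trailing byte are all assumptions made for illustration;
they are not a proposed Arrow layout.

```python
import ipaddress


def pack_inet4(value: str) -> bytes:
    """4 address bytes + 1 netmask-length byte = 5 bytes (assumed layout)."""
    iface = ipaddress.IPv4Interface(value)
    return iface.ip.packed + bytes([iface.network.prefixlen])


def pack_inet6(value: str) -> bytes:
    """16 address bytes + 1 netmask-length byte = 17 bytes (assumed layout)."""
    iface = ipaddress.IPv6Interface(value)
    return iface.ip.packed + bytes([iface.network.prefixlen])


def pack_macaddr(value: str) -> bytes:
    """6 bytes parsed from a colon- or hyphen-separated MAC address."""
    raw = bytes.fromhex(value.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {value!r}")
    return raw


if __name__ == "__main__":
    print(pack_inet4("192.168.1.5/24").hex())
    print(pack_inet6("2001:db8::1/64").hex())
    print(pack_macaddr("08:00:2b:01:02:03").hex())
```

When no netmask is given, the ipaddress module reports the full
prefix length (32 or 128), so every value has the same width within
its type.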

I'd rather not map the inetX types onto the variable-length Binary data
type, because it needs a 32-bit offset
for each 32- or 40-bit value, which is very inefficient, even though
PostgreSQL allows inet4 and inet6 values
to be mixed in the same column.
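
The overhead can be estimated with a little arithmetic. The sketch
below assumes the variable-length Binary layout stores one 32-bit
offset per row plus one extra, and it ignores validity bitmaps and
buffer padding; the function names are hypothetical.

```python
def binary_bytes(n_rows: int, value_width: int) -> int:
    """Offsets buffer (n_rows + 1 int32 entries) plus the value bytes."""
    return 4 * (n_rows + 1) + n_rows * value_width


def fixed_bytes(n_rows: int, value_width: int) -> int:
    """Fixed-width storage needs only the value bytes."""
    return n_rows * value_width


if __name__ == "__main__":
    rows = 1_000_000
    width = 5  # 32-bit address + 8-bit netmask
    print("variable-length:", binary_bytes(rows, width))
    print("fixed-length:   ", fixed_bytes(rows, width))
```

Under these assumptions, one million 40-bit values take about 9 MB
with offsets and 5 MB without them.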

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kai...@heterodb.com>
