Hello Wes,

@ktou also introduced me to your work.
As long as the custom_metadata format used to declare a custom data type
is well defined
in the specification or some other document, independent of any
particular library implementation,
it looks sufficient to me.
Does your UUID example use the FixedSizeBinary storage type to wrap the UUID and put
"arrow_extension_name=uuid" and "arrow_extension_data=uuid-type-unique-code"
in the custom_metadata of Field "f0"?
If that is documented somewhere, people can reproduce the custom data type in their
own applications, and other folks can read it as well.
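
To make the question concrete, here is a minimal sketch of what I have in mind,
using only the Python standard library rather than the Arrow library itself.
The Field class, the helper functions and the "inet4" extension name are
hypothetical and only for illustration; the two metadata keys are the ones
quoted above, and whether they match what the library actually writes is
exactly what I would like to see documented.

```python
import ipaddress
import uuid
from dataclasses import dataclass, field

# Metadata keys as quoted in this message (assumed, not confirmed by a spec).
EXT_NAME_KEY = "arrow_extension_name"
EXT_DATA_KEY = "arrow_extension_data"


@dataclass
class Field:
    """Hypothetical stand-in for an Arrow Field with FixedSizeBinary storage."""
    name: str
    byte_width: int  # FixedSizeBinary width in bytes
    custom_metadata: dict = field(default_factory=dict)


def encode_inet4(text: str) -> bytes:
    """Pack an IPv4 address as 4 address bytes + 1 prefix-length byte."""
    iface = ipaddress.IPv4Interface(text)  # no "/n" suffix means /32
    return iface.ip.packed + bytes([iface.network.prefixlen])


def decode_value(fld: Field, raw: bytes):
    """Interpret raw storage bytes according to the field's metadata tag."""
    if len(raw) != fld.byte_width:
        raise ValueError("value does not match the declared byte width")
    ext = fld.custom_metadata.get(EXT_NAME_KEY)
    if ext == "uuid" and fld.byte_width == 16:
        return uuid.UUID(bytes=raw)
    if ext == "inet4" and fld.byte_width == 5:  # "inet4" is a made-up name
        addr = ipaddress.IPv4Address(raw[:4])
        return ipaddress.IPv4Interface(f"{addr}/{raw[4]}")
    # No recognised tag: the reader can only treat the value as plain binary.
    return raw


f0 = Field("f0", 16, {EXT_NAME_KEY: "uuid",
                      EXT_DATA_KEY: "uuid-type-unique-code"})
addr_field = Field("addr", 5, {EXT_NAME_KEY: "inet4"})
blob_field = Field("blob", 5)  # same storage, no metadata

raw = encode_inet4("192.168.1.10/24")
value = decode_value(addr_field, raw)
in_subnet = value.ip in ipaddress.ip_network("192.168.1.0/24")
```

With the tag present, the same 5 bytes decode to a network address that can be
tested against a subnet; without it (blob_field), a reader has no way to tell
it apart from any other 5-byte binary.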

Thanks,

On Tue, Apr 30, 2019 at 23:47, Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi Kohei,
>
> Since the introduction of arrow::ExtensionType in ARROW-585 [1] we
> have a well-defined method of creating new data types without having
> to manually interact with the custom_metadata Schema information. Can
> you have a look at that and see if it meets your requirements? This
> can be a useful way of extending the Arrow format for your use cases
> while the community may discuss formally adding new logical types to
> the format (or not).
>
> In the unit tests you can see a UUID type I have defined and
> serialized through Arrow's binary protocol machinery
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/extension_type-test.cc
>
> Thanks
> Wes
>
> [1]: https://github.com/apache/arrow/commit/a79cc809883192417920b501e41a0e8b63cd0ad1
>
> On Tue, Apr 30, 2019 at 1:34 AM Kohei KaiGai <kai...@heterodb.com> wrote:
> >
> > Hello,
> >
> > This is a proposal to add new logical types to the Apache Arrow data
> > format.
> >
> > As Melik-Adamyan said, it is quite easy to convert a 5-byte
> > FixedSizeBinary to PostgreSQL's inet
> > data type in the Arrow_Fdw module (a PostgreSQL extension
> > responsible for the data conversion).
> > However, it is not obvious to readers whether the value is a network
> > address or just a small binary blob.
> >
> > https://www.postgresql.org/docs/11/sql-importforeignschema.html
> > PostgreSQL has the IMPORT FOREIGN SCHEMA command, which defines
> > foreign tables
> > according to the schema information of an external data source.
> > With Arrow_Fdw, we can define a foreign table without manually
> > listing the columns and their data
> > types, as follows:
> >
> >   IMPORT FOREIGN SCHEMA foo FROM arrow_fdw INTO public
> >   OPTIONS (file '/opt/nvme/foo.arrow');
> >
> > In this case, the Schema definition in 'foo.arrow' tells PostgreSQL
> > how many columns are
> > defined, along with their names, data types and so on. However,
> > PostgreSQL cannot tell how to convert
> > a FixedSizeBinary (width=5) column without metadata support. It may be
> > the 'inet4' data type, or
> > it may be 'char(5)'.
> >
> > One idea is to use the custom_metadata field of the Field node. We
> > may be able to mark the column as
> > a network address, not a blob. However, I could not find a
> > specification of the custom_metadata contents.
> >
> > I expect network addresses to be widely used in log-data processing,
> > and a fair number of
> > applications will support them. If so, this is not too niche a requirement
> > for a new logical data type definition
> > in the Apache Arrow data format.
> >
> > Best regards,
> >
> > On Tue, Apr 30, 2019 at 15:13, Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >
> > > Hi KaiGai Kohei,
> > > Can you clarify if you are looking for advice on modelling these types or
> > > proposing to add new logical types to the Arrow specification?
> > >
> > > Thanks,
> > > Micah
> > >
> > > On Monday, April 29, 2019, Kohei KaiGai <kai...@heterodb.com> wrote:
> > >
> > > > Hello folks,
> > > >
> > > > What are your opinions on supporting network address types in the Apache
> > > > Arrow data format?
> > > > Network addresses appear throughout the logs generated in bulk by
> > > > all kinds of network equipment,
> > > > and they are significant information when people analyze their past
> > > > logs.
> > > >
> > > > I'm working on mapping the Apache Arrow format onto PostgreSQL.
> > > > http://heterodb.github.io/pg-strom/arrow_fdw/
> > > >
> > > > This extension allows reading Arrow files as if they were PostgreSQL
> > > > tables, using a foreign table.
> > > > Arrow data types are mapped to the relevant PostgreSQL data types
> > > > according to the above
> > > > documentation.
> > > >
> > > > https://www.postgresql.org/docs/current/datatype-net-types.html
> > > > PostgreSQL supports several network address types and operators.
> > > > For example, we can use a qualifier like:   WHERE addr <<= inet
> > > > '192.168.1.0/24' , to find all
> > > > the records in the subnet '192.168.1.0/24'.
> > > >
> > > > Probably, these three data types are sufficient for most network
> > > > logs: inet4, inet6 and macaddr.
> > > > * inet4 is a 32-bit + optional 8-bit (for netmask) fixed-length array
> > > > * inet6 is a 128-bit + optional 8-bit (for netmask) fixed-length array
> > > > * macaddr is a 48-bit fixed-length array.
> > > >
> > > > I don't favor mapping the inetX types onto the variable-length Binary
> > > > data type, because it takes a 32-bit offset
> > > > to locate each 32- or 40-bit value, which is very inefficient, even though
> > > > PostgreSQL allows mixing inet4/inet6
> > > > values in the same column.
> > > >
> > > > Thanks,
> > > > --
> > > > HeteroDB, Inc / The PG-Strom Project
> > > > KaiGai Kohei <kai...@heterodb.com>
> > > >
> >
> >
> >
> > --
> > HeteroDB, Inc / The PG-Strom Project
> > KaiGai Kohei <kai...@heterodb.com>



-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kai...@heterodb.com>
