Re: [HACKERS] binary representation of datatypes

2008-10-22 Thread Jeroen Vermeulen

Matthieu Imbert wrote:


scenario 1 - parse the textual representation of all results of requests to the 
database and convert textual timestamps to a binary
format that i choose among those ones (number of microseconds since 2000-01-01, 
or a structure similar to pg_tm (but with
microsecond precision), or a time-format similar to one defined in rfc1305, or 
something else)

or

scenario 2 - directly use pgsql binary timestamp format. I think the latter is 
far more efficient. I'm new to postgresql, but from
what i understand, here are the conversions involved in both scenarios (hopping 
that my ascii art won't be garbled by your mail
clients ;-) :


scenario 1:
.-.  .--.  .-.  .--.  .--.  
.--.  .-.
|timestamp|  |pgsql |  |timestamp|  |pgsql |  |timestamp |  |my 
   |  |my   |
|storage  |-|internal  |-|storage  |-|network   |-|as|-|timestamp 
|-|timestamp|
|in   |  |to|  |in   |  |to|  |textual   |  
|conversion|  |format   |
|database |  |network   |  |network  |  |textual   |  |representation|  
|routines  |  | |
|backend  |  |conversion|  | |  |conversion|  |  |  |   
   |  | |
| |  |function  |  | |  |function  |  |  |  |   
   |  | |
'-'  '--'  '-'  '--'  '--'  
'--'  '-'


I think this scenario has two boxes too many.  Why would the backend 
convert to network representation before converting to text?



Jeroen

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] binary representation of datatypes

2008-10-22 Thread Matthieu Imbert
Jeroen Vermeulen wrote:
 Matthieu Imbert wrote:
 
 scenario 1 - parse the textual representation of all results of
 requests to the database and convert textual timestamps to a binary
 format that i choose among those ones (number of microseconds since
 2000-01-01, or a structure similar to pg_tm (but with
 microsecond precision), or a time-format similar to one defined in
 rfc1305, or something else)

 or

 scenario 2 - directly use pgsql binary timestamp format. I think the
 latter is far more efficient. I'm new to postgresql, but from
 what i understand, here are the conversions involved in both scenarios
 (hopping that my ascii art won't be garbled by your mail
 clients ;-) :


 scenario 1:
 .-.  .--.  .-.  .--.  .--.  
 .--.  .-.
 |timestamp|  |pgsql |  |timestamp|  |pgsql |  |timestamp |  |my  
   |  |my   |
 |storage  |-|internal  |-|storage  |-|network   |-|as
 |-|timestamp |-|timestamp|
 |in   |  |to|  |in   |  |to|  |textual   |  
 |conversion|  |format   |
 |database |  |network   |  |network  |  |textual   |  |representation|  
 |routines  |  | |
 |backend  |  |conversion|  | |  |conversion|  |  |  |
   |  | |
 | |  |function  |  | |  |function  |  |  |  |
   |  | |
 '-'  '--'  '-'  '--'  '--'  
 '--'  '-'
 
 I think this scenario has two boxes too many.  Why would the backend
 convert to network representation before converting to text?
 
 
 Jeroen
 

You mean that when results are asked in textual representation (the default), 
data is sent on network directly as text?

-- 
Matthieu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] binary representation of datatypes

2008-10-22 Thread Heikki Linnakangas

Matthieu Imbert wrote:

You mean that when results are asked in textual representation (the default), 
data is sent on network directly as text?


Yes.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] binary representation of datatypes

2008-10-22 Thread Merlin Moncure
On Wed, Oct 22, 2008 at 3:29 AM, Matthieu Imbert
[EMAIL PROTECTED] wrote:
 Jeroen Vermeulen wrote:
 Matthieu Imbert wrote:

 scenario 1 - parse the textual representation of all results of
 requests to the database and convert textual timestamps to a binary
 format that i choose among those ones (number of microseconds since
 2000-01-01, or a structure similar to pg_tm (but with
 microsecond precision), or a time-format similar to one defined in
 rfc1305, or something else)

 or

 scenario 2 - directly use pgsql binary timestamp format. I think the
 latter is far more efficient. I'm new to postgresql, but from
 what i understand, here are the conversions involved in both scenarios
 (hopping that my ascii art won't be garbled by your mail
 clients ;-) :


 scenario 1:
 .-.  .--.  .-.  .--.  .--.  
 .--.  .-.
 |timestamp|  |pgsql |  |timestamp|  |pgsql |  |timestamp |  |my 
|  |my   |
 |storage  |-|internal  |-|storage  |-|network   |-|as
 |-|timestamp |-|timestamp|
 |in   |  |to|  |in   |  |to|  |textual   |  
 |conversion|  |format   |
 |database |  |network   |  |network  |  |textual   |  |representation|  
 |routines  |  | |
 |backend  |  |conversion|  | |  |conversion|  |  |  |   
|  | |
 | |  |function  |  | |  |function  |  |  |  |   
|  | |
 '-'  '--'  '-'  '--'  '--'  
 '--'  '-'

 I think this scenario has two boxes too many.  Why would the backend
 convert to network representation before converting to text?


 Jeroen


 You mean that when results are asked in textual representation (the default), 
 data is sent on network directly as text?

yes.

You should know that text/binary conversions rarely play a significant
role in terms of performance.  There are exceptions...large bytea
columns, or enormous sets of integers.  This is coming from a guy that
co-wrote a library that allows you to pull data directly in binary.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] binary representation of datatypes

2008-10-22 Thread Andrew Chernow



You mean that when results are asked in textual representation (the default), 
data is sent on network directly as text?


yes.

You should know that text/binary conversions rarely play a significant
role in terms of performance.  There are exceptions...large bytea
columns, or enormous sets of integers.  This is coming from a guy that
co-wrote a library that allows you to pull data directly in binary.

merlin



If I remember correctly, composites and composite arrays also show worth while 
performance gains.  libpq array and composite handling is what initially spawned 
the libpqtypes project (which required providing type handling for every basic 
type like int and text).  So, different types were implemented for different 
reasons, it was not all performance.  The ultimate functionality we were looking 
for was multiple result sets, which composite arrays solve nicely.


--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] binary representation of datatypes

2008-10-22 Thread Merlin Moncure
On Wed, Oct 22, 2008 at 8:07 AM, Andrew Chernow [EMAIL PROTECTED] wrote:
 You mean that when results are asked in textual representation (the
 default), data is sent on network directly as text?

 You should know that text/binary conversions rarely play a significant
 role in terms of performance.  There are exceptions...large bytea
 columns, or enormous sets of integers.  This is coming from a guy that
 co-wrote a library that allows you to pull data directly in binary.

 merlin


 If I remember correctly, composites and composite arrays also show worth
 while performance gains.  libpq array and composite handling is what
 initially spawned the libpqtypes project (which required providing type
 handling for every basic type like int and text).  So, different types were
 implemented for different reasons, it was not all performance.  The ultimate
 functionality we were looking for was multiple result sets, which composite
 arrays solve nicely.

sure. That isn't, strictly speaking, a performance argument...it's
also a convenience thing.
You won't see a difference either way unless the arrays are large, or
a lot of them (big result sets).  For smaller result sets, the
overhead of executing the query is where all the time is spent.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] binary representation of datatypes

2008-10-21 Thread Michael Meskes
On Tue, Oct 21, 2008 at 10:21:38AM +0200, Matthieu Imbert wrote:
 I would like to be able to get results from SQL commands directly in a
 binary format, instead of a textual one. Actually, I want to be able to
 get timestamps with their full precision (microsecond).

Are you sure you cannot get those in textual mode? If so I wonder why I got
some numbers in a quick test:

...
[NO_PID]: ecpg_execute on line 37: query: select  *  from date_test where d =  
$1   ; with 1 parameter(s) on connection regress1
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: ecpg_execute on line 37: using PQexecParams
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: free_params on line 37: parameter 1 = 1966-01-17
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: ecpg_execute on line 37: correctly got 1 tuples with 2 fields
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: ecpg_get_data on line 37: RESULT: 1966-01-17 offset: -1; array: yes
[NO_PID]: sqlca: code: 0, state: 0
[NO_PID]: ecpg_get_data on line 37: RESULT: 2000-07-12 17:34:29.140787 offset: 
-1; array: yes
[NO_PID]: sqlca: code: 0, state: 0
...

What do I miss here?

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go VfL Borussia! Go SF 49ers! Use Debian GNU/Linux! Use PostgreSQL!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] binary representation of datatypes

2008-10-21 Thread Matthieu Imbert
Michael Meskes wrote:
 On Tue, Oct 21, 2008 at 10:21:38AM +0200, Matthieu Imbert wrote:
 I would like to be able to get results from SQL commands directly in a
 binary format, instead of a textual one. Actually, I want to be able to
 get timestamps with their full precision (microsecond).
 
 Are you sure you cannot get those in textual mode? If so I wonder why I got
 some numbers in a quick test:
 
 ...
 [NO_PID]: ecpg_execute on line 37: query: select  *  from date_test where d = 
  $1   ; with 1 parameter(s) on connection regress1
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: ecpg_execute on line 37: using PQexecParams
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: free_params on line 37: parameter 1 = 1966-01-17
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: ecpg_execute on line 37: correctly got 1 tuples with 2 fields
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: ecpg_get_data on line 37: RESULT: 1966-01-17 offset: -1; array: yes
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: ecpg_get_data on line 37: RESULT: 2000-07-12 17:34:29.140787 
 offset: -1; array: yes
 [NO_PID]: sqlca: code: 0, state: 0
 ...
 
 What do I miss here?
 
 Michael


Yes microseconds are available in textual mode but i do want to use binary 
mode. Let me explain why:

- my data will be time series. So typical requests will return lots of 
timestamped data (mainly floats or int).

- after extraction i need to have all timestamps stored in format convenient 
for calculations. I can accommodate different formats
(for example: number of microseconds since 2000-01-01, or a structure similar 
to pg_tm (but with microsecond precision), or a
time-format similar to one defined in rfc1305), but definitely storing 
timestamps as text is a no go for me.

so i have two choices:

scenario 1 - parse the textual representation of all results of requests to the 
database and convert textual timestamps to a binary
format that i choose among those ones (number of microseconds since 2000-01-01, 
or a structure similar to pg_tm (but with
microsecond precision), or a time-format similar to one defined in rfc1305, or 
something else)

or

scenario 2 - directly use pgsql binary timestamp format. I think the latter is 
far more efficient. I'm new to postgresql, but from
what i understand, here are the conversions involved in both scenarios (hopping 
that my ascii art won't be garbled by your mail
clients ;-) :


scenario 1:
.-.  .--.  .-.  .--.  .--.  
.--.  .-.
|timestamp|  |pgsql |  |timestamp|  |pgsql |  |timestamp |  |my 
   |  |my   |
|storage  |-|internal  |-|storage  |-|network   |-|as
|-|timestamp |-|timestamp|
|in   |  |to|  |in   |  |to|  |textual   |  
|conversion|  |format   |
|database |  |network   |  |network  |  |textual   |  |representation|  
|routines  |  | |
|backend  |  |conversion|  | |  |conversion|  |  |  |   
   |  | |
| |  |function  |  | |  |function  |  |  |  |   
   |  | |
'-'  '--'  '-'  '--'  '--'  
'--'  '-'

scenario 2:
.-.  .--.  .-.  .--.  .-.
|timestamp|  |pgsql |  |timestamp|  |pgsql |  |timestamp|
|storage  |-|internal  |-|storage  |-|network   |-|official |
|in   |  |to|  |in   |  |to|  |format   |
|database |  |network   |  |network  |  |offical   |  | |
|backend  |  |conversion|  | |  |conversion|  | |
| |  |function  |  | |  |function  |  | |
'-'  '--'  '-'  '--'  '-'

if i'm correct, it seems obvious that the second scenario is more efficient 
(and less ugly).

In scenario 2, when talking about timestamp 'official' format, i mean timestamp 
expressed as number of microseconds since
2000-01-01. But of course, it only deserves this name 'official' if it is 
guaranteed to stay the same across postgresql versions and
platforms

-- 
Matthieu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] binary representation of datatypes

2008-10-21 Thread Merlin Moncure
On Tue, Oct 21, 2008 at 4:21 AM, Matthieu Imbert
[EMAIL PROTECTED] wrote:
 Dear postgresql hackers,

 I would like to be able to get results from SQL commands directly in a
 binary format, instead of a textual one. Actually, I want to be able to
 get timestamps with their full precision (microsecond).

 googling around i found some threads on this mailing list about this:
 http://archives.postgresql.org/pgsql-interfaces/2007-05/msg00047.php
 http://archives.postgresql.org/pgsql-interfaces/2007-06/msg0.php
 http://archives.postgresql.org/pgsql-interfaces/2007-03/msg7.php

 From these threads and from postgresql source code, i figured out how to
 get timestamp data the way i need it:

 - make a PQexecParams asking for results in binary format.
 - convert the returned 64 bits integer from network representation to
 host representation (reverse the order of the bytes or do nothing,
 depending on the endianness of the platform)
 - the resulting 64 bits integer is the number of microseconds since
 2000-01-01
 - convert this number of microseconds as needed.
 (my test code currently only handles the case where timestamps are int64)

 This works great but i have a few questions:
 - Is the binary representation of data (especially timestamps) subject
 to change in the future?
 - wouldn't it be a good think if functions for dealing with this binary
 representation are made available to client code (for example:
 pq_getmsgint64 in src/backend/libpq/pqformat.c or timestamp2tm in
 src/interfaces/ecpg/pgtypeslib/timestamp.c). Doing so would ensure that
 client code does not have to reimplement things already correctly done
 in postgres (with all special cases, and correct error handling), and
 would not be broken if the internals change. Moreover it would remove
 from client code the burden to handle both cases of timestamp as int64
 or timestamp as double.

 In short, what i would like (as a libpq client code writer), is a
 function which given an opaque binary representation of a timestamp
 returns me the timestamp as a number of microseconds since 2000-01-01,
 and a function which given a timestamp as a number of microseconds since
 2000-01-01 returns me a structure similar to pg_tm, but without loss of
 information (with microseconds). Of course, this would be needed not
 only for timestamps but also for other types.

 If this is not possible, at least what i would like is to be sure that
 the code i write for converting timestamp binary representation will not
 be broken by future postgresql release, and is portable.


you really want to look at libpqtypes.  It does exactly what you want,
as well as provides easy to follow binary handlers for every basic
type.

http://pgfoundry.org/projects/libpqtypes/
http://libpqtypes.esilo.com/

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] binary representation of datatypes

2008-10-21 Thread Michael Meskes
On Tue, Oct 21, 2008 at 01:37:44PM +0200, Matthieu Imbert wrote:
 Yes microseconds are available in textual mode but i do want to use binary 
 mode. Let me explain why:
 ...
 if i'm correct, it seems obvious that the second scenario is more efficient 
 (and less ugly).

I wouldn't bet on scenario 2 being more efficient. For this you not only need
less conversions but also cheaper conversion. Now I haven't looked at this in
detail, but you might spend a lot of time doing stuff that has only a marginal
effect.

 In scenario 2, when talking about timestamp 'official' format, i mean 
 timestamp expressed as number of microseconds since
 2000-01-01. But of course, it only deserves this name 'official' if it is 
 guaranteed to stay the same across postgresql versions and
 platforms

You shouldn't rely on this. Again I'd recommend using text. 

Michael

-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go VfL Borussia! Go SF 49ers! Use Debian GNU/Linux! Use PostgreSQL!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers