Re: RFI: Database URIs

2013-11-26 Thread Tim Bunce
On Fri, Nov 22, 2013 at 05:13:53PM -0800, David E. Wheeler wrote:
 DBI Folks  Gisle,
 
 I want to add support for specifying database connections as URIs to Sqitch, 
 my DB change management system. I started working on it today, following the 
 examples of JDBC and PostgreSQL. Before I release, though, I’d like a bit of 
 feedback on a couple of things.
 
 First, I'm using the database name as the scheme in the URL. Some examples:
 
 postgresql://user@localhost:5433/dbname
 sqlite:///path/to/foo.db
 
 This is to make it easy to tell one DB from another. But I'm wondering if I 
 should follow the example of JDBC more closely, and prepend db: or 
 something to them. More like DBI DSNs, too. However, it would require a bit 
 more work, as URI.pm does not currently recognize multiple colon-delimited 
 strings as scheme names AFAICT. :-(

Why not define a direct translation from a URL to a DBI DSN?
A translation that doesn't require knowledge of any driver-specifics.

Tim.


Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 25, 2013, at 11:02 PM, Jens Rehsack rehs...@gmail.com wrote:

   
 db:csv?f_dir=dataf_dir_search=foof_dir_search=barf_ext=.csv/rf_lock=2f_encoding=utf8csv_eol=%0D%0Acsv_sep_char=,csv_quote_char=%22csv_escape_char=%22csv+class=Text::CSV_XScsv_null=1RaiseError=1PrintError=1FetchHashKeyName=NAME_lccsv_tables=%7B%22info%22%3A%20%7B%22f_file%22%3A%22info_csv%22%7D%7D
 
 Happy hacking, when you want type that on command line. DBD::CSV (and other
 Pure-Perl drivers) is designed for flexibility and quick setup, not for
 expensive configuration and long-term running.

Actually, I take that back. I am only focused on having a URI format for the 
DSN, not the attribute hash --- at least for now. So I would revise that URI to 
just dbi:csv:. Unless SQLEngine was modified to support additional parameters 
in the DSN, all the other stuff would have to be passed via the attribute hash.

 You can’t - this is a clear point in DBI::DBD::SqlEngine at the moment.

Is that something that might change in the future? I tend to think of the DSN 
parts as being specific to a connection and specifiable by an end user --- such 
as the database name, server, username, password, etc. I tend to think of the 
attribute hash as for the developer, such as transaction and error handling. 
They are also more specific to the language -- I don't need a PrintError option 
in my Java app; it has some other API.

 All I say to you: if you want DBI supporting that - keep in mind the
 whole requirements. If you just want to do another DBI extension, with
 limited coverage and usage - feel free.

I don't know if I want the DBI to support it or not. I would like to get 
feedback on how best to represent database connection information in URIs, and 
what I might be overlooking. I am not sure what to do about attribute hashes, 
and so am ignoring them for now.

 But this discussion orthogonal to my original questions, which were:
 
 * Should I use a hierarchical scheme like JDBC? I’m leaning heavily toward 
 this now, just need a decent prefix, I'm thinking dbms, e.g., 
 dbms:csv:foo“.
 
 You can’t - because there is no „foo“ database for DBD::CSV at the moment.
 Conceptual issue. In current state of development, those pure-perl DBMS
 simulating drivers have no persistent metadata as mysql and postgresql have.

Well, that's just a matter of how you define that. Maybe foo is the name of 
the directory holding CSV files, and the file names are table names. I realize 
this is not how DBD::CSV currently works, just pointing out that these things 
are a matter of definition and implementation.

 I think you miss my point - I don’t say that you’re wrong, I say you
 lack a lot of requirements. When you don’t intend to support those
 pure-perl drivers, you’re fine and please go ahead. When you intend
 full DBI driver support, DBDI (https://github.com/timbunce/DBDI) could
 enlighten you. Following that concept would allow you a migration to p6
 one fine day ;)

I don't intend to support any drivers at all. I want to define a common URI 
standard for connecting to database engines without regard to any programming 
language or API at all.

Best,

David



Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 25, 2013, at 11:21 PM, H.Merijn Brand h.m.br...@xs4all.nl wrote:

 As I always use 2. when writing scripts, mostly because I use quite a
 few useful attributes in the 4th argument already, it is the most
 logical place: easy to maintain, easy to read and easy to extend.

Also very much specific to the DBI and DBDs, and a useful place for you as the 
developer to define how the DBI should interface with the database.

 One of the huge disadvantages of putting everything in the DSN is the
 you have to rethink if and when character like :, ;, , and ' should be
 escaped and how and that values like undef are pretty darn impossible
 (unless the driver has taken the trouble to support that).

And as far as I can tell, the DBI does not currently support any escaping at 
all in the DSN. I created a database with a comma in its name, and was unable 
to figure out how to get the DBI to connect to it.

 The first method however has a big advantage when the script (or a
 one-liner) is completely driven by environment settings.
 
   my $dbh = DBI-connect ($ENV{DBI_DSN});
 
 which happens to be the (underdocumented) default for connect and thus
 equivalent to
 
   my $dbh = DBI-connect;

I dislike this, personally. As a developer, I *always* exercise a certain 
amount of control over what gets passed in the attribute hash. It would be a 
rare condition where I would want to let an end-user do it. A multi-database 
command-line tool like dbish would be one of those rare exceptions.

Best,

David



Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 26, 2013, at 12:42 AM, Tim Bunce tim.bu...@pobox.com wrote:

 Why not define a direct translation from a URL to a DBI DSN?
 A translation that doesn't require knowledge of any driver-specifics.

Because I want to the onus of connecting to the database to be on the 
developer, not the end-user. I personally don't want to have to look up the 
format of the URI for every database I connect to, and why the DBI DSN and JDBC 
URL formats annoy me. I would rather have one solid, standard way to handle 
most of the stuff needed for a connection string (host, port, username, 
password, database name) and then pass additional stuff in a standard way (a 
GET query string).

So I have implemented a `dbi_dsn` method on the database URI classes I’ve 
written. The base class uses the most commonly-used format, which DBD::Pg, for 
one, relies on:

$ perl-MURI -E 'say 
URI-new(db:pg://me:sec...@example.com/foo.db)-dbi_dsn'
dbi:Pg:dbname=me:secret.com/foo.db

And then in subclasses I overrode appropriate methods to change the format. 
Here’s Ingres, for example:

$ perl -MURI -E 'say 
URI-new(db:ingres://me:sec...@example.com/foo.db)-dbi_dsn'
dbi:Ingres:foo.db

And here’s SQL Server:

$ perl -Ilib -MURI -E 'say 
URI-new(db:sqlserver://me:sec...@example.com/foo.db)-dbi_dsn' 
dbi:ODBC:Driver={SQL Server};Server=me:secret.com;Database=foo.db

Note that the URI is the same for all of these, except for the engine part of 
the scheme. Additional Engine-specific parameters can be passed in the GET 
string, and will simply be passed, through, e.g.,:

$ perl -Ilib -MURI -E 'say 
URI-new(db:pg://me:sec...@example.com/foo.db??connect_timeout=10application_name=myapp)-dbi_dsn'

dbi:Pg:host=me:secret.com;dbname=foo.db;?connect_timeout=10;application_name=myapp

This puts the most of the onus on translating from the URL to the DBI DSN on 
the developer of the URI class (me in this case) rather than on the user.

Best,

David

Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 26, 2013, at 10:02 AM, David E. Wheeler da...@justatheory.com wrote:

$ perl-MURI -E 'say 
 URI-new(db:pg://me:sec...@example.com/foo.db)-dbi_dsn'
dbi:Pg:dbname=me:secret.com/foo.db

Well, I can see I have a bug or two to work out. That should be:

$ perl -MURI -Ilib -E 'say 
URI-new(db:pg://me:sec...@example.com/foo.db)-dbi_dsn'
dbi:Pg:host=example.com;dbname=foo.db

Thanks,

David



Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 26, 2013, at 10:53 AM, David E. Wheeler da...@justatheory.com wrote:

 Well, I can see I have a bug or two to work out. That should be:
 
$ perl -MURI -Ilib -E 'say 
 URI-new(db:pg://me:sec...@example.com/foo.db)-dbi_dsn'
dbi:Pg:host=example.com;dbname=foo.db

Oh silly me not escaping the @. Let’s try that again:

$ perl -Ilib -MURI -E 'say URI-new(shift)-dbi_dsn' 
db:pg://me:sec...@example.com/foo_db   
dbi:Pg:host=example.com;dbname=foo_db

$ perl -Ilib -MURI -E 'say URI-new(shift)-dbi_dsn' 
db:ingres://me:sec...@example.com/foo_db
dbi:Ingres:foo_db

$ perl -Ilib -MURI -E 'say URI-new(shift)-dbi_dsn' 
db:sqlserver://me:sec...@example.com/foo_db 
dbi:ODBC:Driver={SQL Server};Server=example.com;Database=foo_db

Much saner. :-)

Best,

David

Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 26, 2013, at 11:26 AM, Gisle Aas gi...@activestate.com wrote:

 I do find the db: prefix ugly.  If you want users to see these strings I 
 would think they find this prefix to be clutter too.

Yeah. But I would thin, that if it *was* a standard, there would be just one 
scheme defined. That’s a guess, though. Maybe no one would care, and would not 
mind an unlimited number of new schemes (one for every db engine)? I don’t know.

 You seem to be alone in calling it pg:.  For the other examples out there I 
 see postgresql: or postgres:.  Should all different ways be allowed and 
 lead to the same thing?  I've seen both mysql: and mysql2:. What about 
 mariadb:?

Yeah, I was going to add postgres and postgresql subclasses. Maybe pgsql, 
too, I dunno. Never seen mysql2; what’s that? Maria is the same as MySQL, 
really; I should add it, though, and have it inherit from db:mysql, just as 
vertica currently inherits from db:pg.

 For sqlite:-URLs people seem to disagree on how to specify relative vs 
 absolute path names.

Really? To me, an absolute path starts with /. Anything else is relative.

 I wish there actually was _a_ standard for this stuff.  :-)

I’m planning to write it up as a blog post and talk to some other communities 
about it.

Thanks,

David



Re: RFI: Database URIs

2013-11-26 Thread Gisle Aas
To me the value of database urls would be compatibility with other
implementations of this obvious idea.  Some examples I found by quick
googling:

 http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls
 http://docs.stackato.com/3.0/user/services/data-services.html#database-url
 https://github.com/kennethreitz/dj-database-url
 https://github.com/glenngillen/rails-database-url
 http://www.jguru.com/faq/view.jsp?EID=690

http://www.postgresql.org/docs/9.3/static/libpq-connect.html#LIBPQ-CONNSTRING

It would be useful for somebody to survey these and then provide a perl
module that's compatible with the consensus among these.  It would also be
a good place to document what the state of database urls currently is.

--Gisle



On Tue, Nov 26, 2013 at 9:42 AM, Tim Bunce tim.bu...@pobox.com wrote:

 On Fri, Nov 22, 2013 at 05:13:53PM -0800, David E. Wheeler wrote:
  DBI Folks  Gisle,
 
  I want to add support for specifying database connections as URIs to
 Sqitch, my DB change management system. I started working on it today,
 following the examples of JDBC and PostgreSQL. Before I release, though,
 I’d like a bit of feedback on a couple of things.
 
  First, I'm using the database name as the scheme in the URL. Some
 examples:
 
  postgresql://user@localhost:5433/dbname
  sqlite:///path/to/foo.db
 
  This is to make it easy to tell one DB from another. But I'm wondering
 if I should follow the example of JDBC more closely, and prepend db: or
 something to them. More like DBI DSNs, too. However, it would require a bit
 more work, as URI.pm does not currently recognize multiple colon-delimited
 strings as scheme names AFAICT. :-(

 Why not define a direct translation from a URL to a DBI DSN?
 A translation that doesn't require knowledge of any driver-specifics.

 Tim.



Re: RFI: Database URIs

2013-11-26 Thread Gisle Aas
I do find the db: prefix ugly.  If you want users to see these strings I
would think they find this prefix to be clutter too.

You seem to be alone in calling it pg:.  For the other examples out there
I see postgresql: or postgres:.  Should all different ways be allowed
and lead to the same thing?  I've seen both mysql: and mysql2:. What
about mariadb:?

For sqlite:-URLs people seem to disagree on how to specify relative vs
absolute path names.

I wish there actually was _a_ standard for this stuff.  :-)

--Gisle



On Tue, Nov 26, 2013 at 7:14 PM, David E. Wheeler da...@justatheory.comwrote:

 On Nov 26, 2013, at 9:32 AM, Gisle Aas gi...@activestate.com wrote:

  To me the value of database urls would be compatibility with other
 implementations of this obvious idea.  Some examples I found by quick
 googling:
 
   http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls
 
 http://docs.stackato.com/3.0/user/services/data-services.html#database-url
   https://github.com/kennethreitz/dj-database-url
   https://github.com/glenngillen/rails-database-url
   http://www.jguru.com/faq/view.jsp?EID=690
 
 http://www.postgresql.org/docs/9.3/static/libpq-connect.html#LIBPQ-CONNSTRING
 
  It would be useful for somebody to survey these and then provide a perl
 module that's compatible with the consensus among these.  It would also be
 a good place to document what the state of database urls currently is.

 Thanks for those links They are all very similar (except for the rails
 one, which lacks docs). And they are similar to what I've come up with,
 detailed here:

   https://github.com/theory/uri-db/blob/master/lib/URI/db.pm

 Basically, a database format is actually two URIs. The first is an opaque
 URI, db:(.+). The second URL is inside the (.+), and can take two forms:

 $engine://$user:$pass@$host:$port$dbname?$params
 $engine:$dbname?$params

 Some examples from the test suite:

 db:
 db:pg:
 db:pg://localhost
 db:pg://localhost:33
 db:pg://foo:123/try?foo=1foo=2lol=yes
 db:sqlite:
 db:sqlite:foo.db
 db:sqlite:/path/foo.db
 db:sqlite:///path/foo.db
 db:cubrid://localhost:33/foo
 db:db2://localhost:33/foo
 db:firebird://localhost:33/foo
 db:informix:foo.db
 db:informix:foo.db?foo=1
 db:ingres:foo.db
 db:ingres:foo.db?foo=1
 db:interbase://localhost:33/foo
 db:maxdb://localhost:33/foo
 db:maxdb://localhost/foo
 db:monetdb://localhost:1222?foo=1
 db:monetdb://localhost/lolz
 db:mysql://localhost:33/foo
 db:oracle://localhost:33/foo
 db:sqlserver://localhost:33/foo
 db:sybase://localhost:33/foo
 db:teradata://localhost
 db:teradata://localhost:33/foo?hi=1
 db:unify:foo.db
 db:unify:
 db:unify:?foo=1bar=2
 db:vertica:
 db:vertica://localhost
 db:vertica://localhost:33
 db:vertica://foo:123/try?foo=1foo=2lol=yes

 I think it’s pretty simple and familiar, and plan to put it (or something
 similar, based on this and other discussions) in Sqitch.

 Thoughts?

 David




Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 26, 2013, at 11:49 AM, Gisle Aas gi...@activestate.com wrote:

 There is also precedence for using + in scheme names.  Something like 
 db+postgresql: then.  It's still cluttered, and not really compatible with 
 what other have used.  Or x-postgres: while it's still experimental.  
 Naming stuff is hard.

Oh, I thought URI didn’t like +, but it turns out it is okay with it.

$ perl -MURI -E 'say URI-new(x-ssh+foo://example.com)-scheme'
x-ssh+foo

Cool. Downside: Each DB engine requires its own scheme, which would make formal 
registration rather a PITA. I do see some registered “hierarchical” schemes 
using a dot, though, e.g., iris.beep, iris.xpc, etc.:

  http://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

Surely someone has done this before, though. :-(

Yeah, naming is hard. Probably why it’s one of my favorite things to do. :-)

 I don't know.  Version 2 of the mysql protocol perhaps.  I saw it at 
 https://github.com/kennethreitz/dj-database-url/blob/master/dj_database_url.py#L17

Well, if it’s a real thing, I’m happy to add it.

 The scheme really should just be named after the protocol, not the kind of 
 product you happen to find at the other end.

But scheme != protocol. [Wikipedia](http://en.wikipedia.org/wiki/URI_scheme) 
says:

 URI schemes are frequently and incorrectly referred to as protocols, or 
 specifically as URI protocols or URL protocols, since most were originally 
 designed to be used with a particular protocol, and often have the same name. 
 The http scheme, for instance, is generally used for interacting with web 
 resources using HyperText Transfer Protocol. Today, URIs with that scheme are 
 also used for other purposes, such as RDF resource identifiers and XML 
 namespaces, that are not related to the protocol. Furthermore, some URI 
 schemes are not associated with any specific protocol (e.g. file) and many 
 others do not use the name of a protocol as their prefix (e.g. news).

To me it makes sense not to tie it to a particular protocol. I want to connect 
to a database, and don’t much care about the protocol. The PostgreSQL libpq URL 
starts with postgresql://, not libpq://.

 Is sqlite:///path relative or absolute then?  What about sqlite:/path?

Both are absolute. sqlite://path and sqlite:path are both relative. This is how 
my implementation handles them.

 dj-database-url claims 4 slashes is the way to go; sqlite:path

Wow, is that ugly.

I was relying on the standard for file: URLs as the precedent.

 The ftp: standard had the same problem.  It was once specified that 
 ftp://server/%2Fpath was to be required to make the path absolute, while 
 ftp://server/path was supposed to be relative to the home directory you ended 
 up in after logging in to the ftp account.  This was very confusing to users 
 so most browers just implemented the path to always be absolute, with no way 
 to access relative paths. That's at least how I remembered it.

%2f is ugly. Of course, there is no host name in file: URIs. If we add it, as 
would be needed for Firebird, for example, I can see why dj-database-url ended 
up with four slashes: This is a full path:

db:firebird://example.com//path/to/db

So I guess these should be equivalent:

db:firebird:path/to/db
db:firebird:/path/to/db

The first has an empty string for the network location, the second has no 
network location part at all. I’ve just committed a fix to support this.

  
https://github.com/theory/uri-db/commit/cd60c7ac7e02572e5db3b39d2acb08b4a7fcfefe

Best,

David



Re: RFI: Database URIs

2013-11-26 Thread Tim Bunce
On Tue, Nov 26, 2013 at 08:49:43PM +0100, Gisle Aas wrote:
 
The scheme really should just be named after the protocol, not the kind of 
 product you happen to find at
the other end.
It would certainly be preferable if there was a single name for each 
 protocol.

ODBC complicates that further.

Tim.


Re: RFI: Database URIs

2013-11-26 Thread Gisle Aas
On Tue, Nov 26, 2013 at 8:32 PM, David E. Wheeler da...@justatheory.comwrote:

 On Nov 26, 2013, at 11:26 AM, Gisle Aas gi...@activestate.com wrote:

  I do find the db: prefix ugly.  If you want users to see these strings
 I would think they find this prefix to be clutter too.

 Yeah. But I would thin, that if it *was* a standard, there would be just
 one scheme defined. That’s a guess, though. Maybe no one would care, and
 would not mind an unlimited number of new schemes (one for every db
 engine)? I don’t know.


There is also precedence for using + in scheme names.  Something like
db+postgresql: then.  It's still cluttered, and not really compatible
with what other have used.  Or x-postgres: while it's still experimental.
 Naming stuff is hard.


  You seem to be alone in calling it pg:.  For the other examples out
 there I see postgresql: or postgres:.  Should all different ways be
 allowed and lead to the same thing?  I've seen both mysql: and mysql2:.
 What about mariadb:?

 Yeah, I was going to add postgres and postgresql subclasses. Maybe
 pgsql, too, I dunno. Never seen mysql2; what’s that?


I don't know.  Version 2 of the mysql protocol perhaps.  I saw it at
https://github.com/kennethreitz/dj-database-url/blob/master/dj_database_url.py#L17


 Maria is the same as MySQL, really; I should add it, though, and have it
 inherit from db:mysql, just as vertica currently inherits from db:pg.


The scheme really should just be named after the protocol, not the kind of
product you happen to find at the other end.

It would certainly be preferable if there was a single name for each
protocol.


  For sqlite:-URLs people seem to disagree on how to specify relative vs
 absolute path names.

 Really? To me, an absolute path starts with /. Anything else is relative.


Is sqlite:///path relative or absolute then?  What about sqlite:/path?

dj-database-url claims 4 slashes is the way to go; sqlite:path

The ftp: standard had the same problem.  It was once specified that
ftp://server/%2Fpath was to be required to make the path absolute, while
ftp://server/path was supposed to be relative to the home directory you
ended up in after logging in to the ftp account.  This was very confusing
to users so most browers just implemented the path to always be absolute,
with no way to access relative paths. That's at least how I remembered it.

--Gisle


Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 26, 2013, at 12:43 PM, Tim Bunce tim.bu...@pobox.com wrote:

 ODBC complicates that further.

Indeed. I want to avoid the protocol.

I've now written up my proposal as a blog post:

  http://theory.so/rfc/2013/11/26/toward-a-database-uri-standard/

Thanks,

David