Re: RFI: Database URIs

2013-11-26 Thread Tim Bunce
On Fri, Nov 22, 2013 at 05:13:53PM -0800, David E. Wheeler wrote:
 DBI Folks  Gisle,
 
 I want to add support for specifying database connections as URIs to Sqitch, 
 my DB change management system. I started working on it today, following the 
 examples of JDBC and PostgreSQL. Before I release, though, I’d like a bit of 
 feedback on a couple of things.
 
 First, I'm using the database name as the scheme in the URL. Some examples:
 
 postgresql://user@localhost:5433/dbname
 sqlite:///path/to/foo.db
 
 This is to make it easy to tell one DB from another. But I'm wondering if I 
 should follow the example of JDBC more closely, and prepend db: or 
 something to them. More like DBI DSNs, too. However, it would require a bit 
 more work, as URI.pm does not currently recognize multiple colon-delimited 
 strings as scheme names AFAICT. :-(

Why not define a direct translation from a URL to a DBI DSN?
A translation that doesn't require knowledge of any driver-specifics.

Tim.


Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 25, 2013, at 11:02 PM, Jens Rehsack rehs...@gmail.com wrote:

   
 db:csv?f_dir=dataf_dir_search=foof_dir_search=barf_ext=.csv/rf_lock=2f_encoding=utf8csv_eol=%0D%0Acsv_sep_char=,csv_quote_char=%22csv_escape_char=%22csv+class=Text::CSV_XScsv_null=1RaiseError=1PrintError=1FetchHashKeyName=NAME_lccsv_tables=%7B%22info%22%3A%20%7B%22f_file%22%3A%22info_csv%22%7D%7D
 
 Happy hacking, when you want type that on command line. DBD::CSV (and other
 Pure-Perl drivers) is designed for flexibility and quick setup, not for
 expensive configuration and long-term running.

Actually, I take that back. I am only focused on having a URI format for the 
DSN, not the attribute hash --- at least for now. So I would revise that URI to 
just dbi:csv:. Unless SQLEngine was modified to support additional parameters 
in the DSN, all the other stuff would have to be passed via the attribute hash.

 You can’t - this is a clear point in DBI::DBD::SqlEngine at the moment.

Is that something that might change in the future? I tend to think of the DSN 
parts as being specific to a connection and specifiable by an end user --- such 
as the database name, server, username, password, etc. I tend to think of the 
attribute hash as for the developer, such as transaction and error handling. 
They are also more specific to the language -- I don't need a PrintError option 
in my Java app; it has some other API.

 All I say to you: if you want DBI supporting that - keep in mind the
 whole requirements. If you just want to do another DBI extension, with
 limited coverage and usage - feel free.

I don't know if I want the DBI to support it or not. I would like to get 
feedback on how best to represent database connection information in URIs, and 
what I might be overlooking. I am not sure what to do about attribute hashes, 
and so am ignoring them for now.

 But this discussion orthogonal to my original questions, which were:
 
 * Should I use a hierarchical scheme like JDBC? I’m leaning heavily toward 
 this now, just need a decent prefix, I'm thinking dbms, e.g., 
 dbms:csv:foo“.
 
 You can’t - because there is no „foo“ database for DBD::CSV at the moment.
 Conceptual issue. In current state of development, those pure-perl DBMS
 simulating drivers have no persistent metadata as mysql and postgresql have.

Well, that's just a matter of how you define that. Maybe foo is the name of 
the directory holding CSV files, and the file names are table names. I realize 
this is not how DBD::CSV currently works, just pointing out that these things 
are a matter of definition and implementation.

 I think you miss my point - I don’t say that you’re wrong, I say you
 lack a lot of requirements. When you don’t intend to support those
 pure-perl drivers, you’re fine and please go ahead. When you intend
 full DBI driver support, DBDI (https://github.com/timbunce/DBDI) could
 enlighten you. Following that concept would allow you a migration to p6
 one fine day ;)

I don't intend to support any drivers at all. I want to define a common URI 
standard for connecting to database engines without regard to any programming 
language or API at all.

Best,

David



Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 25, 2013, at 11:21 PM, H.Merijn Brand h.m.br...@xs4all.nl wrote:

 As I always use 2. when writing scripts, mostly because I use quite a
 few useful attributes in the 4th argument already, it is the most
 logical place: easy to maintain, easy to read and easy to extend.

Also very much specific to the DBI and DBDs, and a useful place for you as the 
developer to define how the DBI should interface with the database.

 One of the huge disadvantages of putting everything in the DSN is the
 you have to rethink if and when character like :, ;, , and ' should be
 escaped and how and that values like undef are pretty darn impossible
 (unless the driver has taken the trouble to support that).

And as far as I can tell, the DBI does not currently support any escaping at 
all in the DSN. I created a database with a comma in its name, and was unable 
to figure out how to get the DBI to connect to it.

 The first method however has a big advantage when the script (or a
 one-liner) is completely driven by environment settings.
 
   my $dbh = DBI-connect ($ENV{DBI_DSN});
 
 which happens to be the (underdocumented) default for connect and thus
 equivalent to
 
   my $dbh = DBI-connect;

I dislike this, personally. As a developer, I *always* exercise a certain 
amount of control over what gets passed in the attribute hash. It would be a 
rare condition where I would want to let an end-user do it. A multi-database 
command-line tool like dbish would be one of those rare exceptions.

Best,

David



Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 26, 2013, at 12:42 AM, Tim Bunce tim.bu...@pobox.com wrote:

 Why not define a direct translation from a URL to a DBI DSN?
 A translation that doesn't require knowledge of any driver-specifics.

Because I want to the onus of connecting to the database to be on the 
developer, not the end-user. I personally don't want to have to look up the 
format of the URI for every database I connect to, and why the DBI DSN and JDBC 
URL formats annoy me. I would rather have one solid, standard way to handle 
most of the stuff needed for a connection string (host, port, username, 
password, database name) and then pass additional stuff in a standard way (a 
GET query string).

So I have implemented a `dbi_dsn` method on the database URI classes I’ve 
written. The base class uses the most commonly-used format, which DBD::Pg, for 
one, relies on:

$ perl-MURI -E 'say 
URI-new(db:pg://me:sec...@example.com/foo.db)-dbi_dsn'
dbi:Pg:dbname=me:secret.com/foo.db

And then in subclasses I overrode appropriate methods to change the format. 
Here’s Ingres, for example:

$ perl -MURI -E 'say 
URI-new(db:ingres://me:sec...@example.com/foo.db)-dbi_dsn'
dbi:Ingres:foo.db

And here’s SQL Server:

$ perl -Ilib -MURI -E 'say 
URI-new(db:sqlserver://me:sec...@example.com/foo.db)-dbi_dsn' 
dbi:ODBC:Driver={SQL Server};Server=me:secret.com;Database=foo.db

Note that the URI is the same for all of these, except for the engine part of 
the scheme. Additional Engine-specific parameters can be passed in the GET 
string, and will simply be passed, through, e.g.,:

$ perl -Ilib -MURI -E 'say 
URI-new(db:pg://me:sec...@example.com/foo.db??connect_timeout=10application_name=myapp)-dbi_dsn'

dbi:Pg:host=me:secret.com;dbname=foo.db;?connect_timeout=10;application_name=myapp

This puts the most of the onus on translating from the URL to the DBI DSN on 
the developer of the URI class (me in this case) rather than on the user.

Best,

David

Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 26, 2013, at 10:02 AM, David E. Wheeler da...@justatheory.com wrote:

$ perl-MURI -E 'say 
 URI-new(db:pg://me:sec...@example.com/foo.db)-dbi_dsn'
dbi:Pg:dbname=me:secret.com/foo.db

Well, I can see I have a bug or two to work out. That should be:

$ perl -MURI -Ilib -E 'say 
URI-new(db:pg://me:sec...@example.com/foo.db)-dbi_dsn'
dbi:Pg:host=example.com;dbname=foo.db

Thanks,

David



Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 26, 2013, at 10:53 AM, David E. Wheeler da...@justatheory.com wrote:

 Well, I can see I have a bug or two to work out. That should be:
 
$ perl -MURI -Ilib -E 'say 
 URI-new(db:pg://me:sec...@example.com/foo.db)-dbi_dsn'
dbi:Pg:host=example.com;dbname=foo.db

Oh silly me not escaping the @. Let’s try that again:

$ perl -Ilib -MURI -E 'say URI-new(shift)-dbi_dsn' 
db:pg://me:sec...@example.com/foo_db   
dbi:Pg:host=example.com;dbname=foo_db

$ perl -Ilib -MURI -E 'say URI-new(shift)-dbi_dsn' 
db:ingres://me:sec...@example.com/foo_db
dbi:Ingres:foo_db

$ perl -Ilib -MURI -E 'say URI-new(shift)-dbi_dsn' 
db:sqlserver://me:sec...@example.com/foo_db 
dbi:ODBC:Driver={SQL Server};Server=example.com;Database=foo_db

Much saner. :-)

Best,

David

Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 26, 2013, at 11:26 AM, Gisle Aas gi...@activestate.com wrote:

 I do find the db: prefix ugly.  If you want users to see these strings I 
 would think they find this prefix to be clutter too.

Yeah. But I would thin, that if it *was* a standard, there would be just one 
scheme defined. That’s a guess, though. Maybe no one would care, and would not 
mind an unlimited number of new schemes (one for every db engine)? I don’t know.

 You seem to be alone in calling it pg:.  For the other examples out there I 
 see postgresql: or postgres:.  Should all different ways be allowed and 
 lead to the same thing?  I've seen both mysql: and mysql2:. What about 
 mariadb:?

Yeah, I was going to add postgres and postgresql subclasses. Maybe pgsql, 
too, I dunno. Never seen mysql2; what’s that? Maria is the same as MySQL, 
really; I should add it, though, and have it inherit from db:mysql, just as 
vertica currently inherits from db:pg.

 For sqlite:-URLs people seem to disagree on how to specify relative vs 
 absolute path names.

Really? To me, an absolute path starts with /. Anything else is relative.

 I wish there actually was _a_ standard for this stuff.  :-)

I’m planning to write it up as a blog post and talk to some other communities 
about it.

Thanks,

David



Re: RFI: Database URIs

2013-11-26 Thread Gisle Aas
To me the value of database urls would be compatibility with other
implementations of this obvious idea.  Some examples I found by quick
googling:

 http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls
 http://docs.stackato.com/3.0/user/services/data-services.html#database-url
 https://github.com/kennethreitz/dj-database-url
 https://github.com/glenngillen/rails-database-url
 http://www.jguru.com/faq/view.jsp?EID=690

http://www.postgresql.org/docs/9.3/static/libpq-connect.html#LIBPQ-CONNSTRING

It would be useful for somebody to survey these and then provide a perl
module that's compatible with the consensus among these.  It would also be
a good place to document what the state of database urls currently is.

--Gisle



On Tue, Nov 26, 2013 at 9:42 AM, Tim Bunce tim.bu...@pobox.com wrote:

 On Fri, Nov 22, 2013 at 05:13:53PM -0800, David E. Wheeler wrote:
  DBI Folks  Gisle,
 
  I want to add support for specifying database connections as URIs to
 Sqitch, my DB change management system. I started working on it today,
 following the examples of JDBC and PostgreSQL. Before I release, though,
 I’d like a bit of feedback on a couple of things.
 
  First, I'm using the database name as the scheme in the URL. Some
 examples:
 
  postgresql://user@localhost:5433/dbname
  sqlite:///path/to/foo.db
 
  This is to make it easy to tell one DB from another. But I'm wondering
 if I should follow the example of JDBC more closely, and prepend db: or
 something to them. More like DBI DSNs, too. However, it would require a bit
 more work, as URI.pm does not currently recognize multiple colon-delimited
 strings as scheme names AFAICT. :-(

 Why not define a direct translation from a URL to a DBI DSN?
 A translation that doesn't require knowledge of any driver-specifics.

 Tim.



Re: RFI: Database URIs

2013-11-26 Thread Gisle Aas
I do find the db: prefix ugly.  If you want users to see these strings I
would think they find this prefix to be clutter too.

You seem to be alone in calling it pg:.  For the other examples out there
I see postgresql: or postgres:.  Should all different ways be allowed
and lead to the same thing?  I've seen both mysql: and mysql2:. What
about mariadb:?

For sqlite:-URLs people seem to disagree on how to specify relative vs
absolute path names.

I wish there actually was _a_ standard for this stuff.  :-)

--Gisle



On Tue, Nov 26, 2013 at 7:14 PM, David E. Wheeler da...@justatheory.comwrote:

 On Nov 26, 2013, at 9:32 AM, Gisle Aas gi...@activestate.com wrote:

  To me the value of database urls would be compatibility with other
 implementations of this obvious idea.  Some examples I found by quick
 googling:
 
   http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#database-urls
 
 http://docs.stackato.com/3.0/user/services/data-services.html#database-url
   https://github.com/kennethreitz/dj-database-url
   https://github.com/glenngillen/rails-database-url
   http://www.jguru.com/faq/view.jsp?EID=690
 
 http://www.postgresql.org/docs/9.3/static/libpq-connect.html#LIBPQ-CONNSTRING
 
  It would be useful for somebody to survey these and then provide a perl
 module that's compatible with the consensus among these.  It would also be
 a good place to document what the state of database urls currently is.

 Thanks for those links They are all very similar (except for the rails
 one, which lacks docs). And they are similar to what I've come up with,
 detailed here:

   https://github.com/theory/uri-db/blob/master/lib/URI/db.pm

 Basically, a database format is actually two URIs. The first is an opaque
 URI, db:(.+). The second URL is inside the (.+), and can take two forms:

 $engine://$user:$pass@$host:$port$dbname?$params
 $engine:$dbname?$params

 Some examples from the test suite:

 db:
 db:pg:
 db:pg://localhost
 db:pg://localhost:33
 db:pg://foo:123/try?foo=1foo=2lol=yes
 db:sqlite:
 db:sqlite:foo.db
 db:sqlite:/path/foo.db
 db:sqlite:///path/foo.db
 db:cubrid://localhost:33/foo
 db:db2://localhost:33/foo
 db:firebird://localhost:33/foo
 db:informix:foo.db
 db:informix:foo.db?foo=1
 db:ingres:foo.db
 db:ingres:foo.db?foo=1
 db:interbase://localhost:33/foo
 db:maxdb://localhost:33/foo
 db:maxdb://localhost/foo
 db:monetdb://localhost:1222?foo=1
 db:monetdb://localhost/lolz
 db:mysql://localhost:33/foo
 db:oracle://localhost:33/foo
 db:sqlserver://localhost:33/foo
 db:sybase://localhost:33/foo
 db:teradata://localhost
 db:teradata://localhost:33/foo?hi=1
 db:unify:foo.db
 db:unify:
 db:unify:?foo=1bar=2
 db:vertica:
 db:vertica://localhost
 db:vertica://localhost:33
 db:vertica://foo:123/try?foo=1foo=2lol=yes

 I think it’s pretty simple and familiar, and plan to put it (or something
 similar, based on this and other discussions) in Sqitch.

 Thoughts?

 David




Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 26, 2013, at 11:49 AM, Gisle Aas gi...@activestate.com wrote:

 There is also precedence for using + in scheme names.  Something like 
 db+postgresql: then.  It's still cluttered, and not really compatible with 
 what other have used.  Or x-postgres: while it's still experimental.  
 Naming stuff is hard.

Oh, I thought URI didn’t like +, but it turns out it is okay with it.

$ perl -MURI -E 'say URI-new(x-ssh+foo://example.com)-scheme'
x-ssh+foo

Cool. Downside: Each DB engine requires its own scheme, which would make formal 
registration rather a PITA. I do see some registered “hierarchical” schemes 
using a dot, though, e.g., iris.beep, iris.xpc, etc.:

  http://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml

Surely someone has done this before, though. :-(

Yeah, naming is hard. Probably why it’s one of my favorite things to do. :-)

 I don't know.  Version 2 of the mysql protocol perhaps.  I saw it at 
 https://github.com/kennethreitz/dj-database-url/blob/master/dj_database_url.py#L17

Well, if it’s a real thing, I’m happy to add it.

 The scheme really should just be named after the protocol, not the kind of 
 product you happen to find at the other end.

But scheme != protocol. [Wikipedia](http://en.wikipedia.org/wiki/URI_scheme) 
says:

 URI schemes are frequently and incorrectly referred to as protocols, or 
 specifically as URI protocols or URL protocols, since most were originally 
 designed to be used with a particular protocol, and often have the same name. 
 The http scheme, for instance, is generally used for interacting with web 
 resources using HyperText Transfer Protocol. Today, URIs with that scheme are 
 also used for other purposes, such as RDF resource identifiers and XML 
 namespaces, that are not related to the protocol. Furthermore, some URI 
 schemes are not associated with any specific protocol (e.g. file) and many 
 others do not use the name of a protocol as their prefix (e.g. news).

To me it makes sense not to tie it to a particular protocol. I want to connect 
to a database, and don’t much care about the protocol. The PostgreSQL libpq URL 
starts with postgresql://, not libpq://.

 Is sqlite:///path relative or absolute then?  What about sqlite:/path?

Both are absolute. sqlite://path and sqlite:path are both relative. This is how 
my implementation handles them.

 dj-database-url claims 4 slashes is the way to go; sqlite:path

Wow, is that ugly.

I was relying on the standard for file: URLs as the precedent.

 The ftp: standard had the same problem.  It was once specified that 
 ftp://server/%2Fpath was to be required to make the path absolute, while 
 ftp://server/path was supposed to be relative to the home directory you ended 
 up in after logging in to the ftp account.  This was very confusing to users 
 so most browers just implemented the path to always be absolute, with no way 
 to access relative paths. That's at least how I remembered it.

%2f is ugly. Of course, there is no host name in file: URIs. If we add it, as 
would be needed for Firebird, for example, I can see why dj-database-url ended 
up with four slashes: This is a full path:

db:firebird://example.com//path/to/db

So I guess these should be equivalent:

db:firebird:path/to/db
db:firebird:/path/to/db

The first has an empty string for the network location, the second has no 
network location part at all. I’ve just committed a fix to support this.

  
https://github.com/theory/uri-db/commit/cd60c7ac7e02572e5db3b39d2acb08b4a7fcfefe

Best,

David



Re: RFI: Database URIs

2013-11-26 Thread Tim Bunce
On Tue, Nov 26, 2013 at 08:49:43PM +0100, Gisle Aas wrote:
 
The scheme really should just be named after the protocol, not the kind of 
 product you happen to find at
the other end.
It would certainly be preferable if there was a single name for each 
 protocol.

ODBC complicates that further.

Tim.


Re: RFI: Database URIs

2013-11-26 Thread Gisle Aas
On Tue, Nov 26, 2013 at 8:32 PM, David E. Wheeler da...@justatheory.comwrote:

 On Nov 26, 2013, at 11:26 AM, Gisle Aas gi...@activestate.com wrote:

  I do find the db: prefix ugly.  If you want users to see these strings
 I would think they find this prefix to be clutter too.

 Yeah. But I would thin, that if it *was* a standard, there would be just
 one scheme defined. That’s a guess, though. Maybe no one would care, and
 would not mind an unlimited number of new schemes (one for every db
 engine)? I don’t know.


There is also precedence for using + in scheme names.  Something like
db+postgresql: then.  It's still cluttered, and not really compatible
with what other have used.  Or x-postgres: while it's still experimental.
 Naming stuff is hard.


  You seem to be alone in calling it pg:.  For the other examples out
 there I see postgresql: or postgres:.  Should all different ways be
 allowed and lead to the same thing?  I've seen both mysql: and mysql2:.
 What about mariadb:?

 Yeah, I was going to add postgres and postgresql subclasses. Maybe
 pgsql, too, I dunno. Never seen mysql2; what’s that?


I don't know.  Version 2 of the mysql protocol perhaps.  I saw it at
https://github.com/kennethreitz/dj-database-url/blob/master/dj_database_url.py#L17


 Maria is the same as MySQL, really; I should add it, though, and have it
 inherit from db:mysql, just as vertica currently inherits from db:pg.


The scheme really should just be named after the protocol, not the kind of
product you happen to find at the other end.

It would certainly be preferable if there was a single name for each
protocol.


  For sqlite:-URLs people seem to disagree on how to specify relative vs
 absolute path names.

 Really? To me, an absolute path starts with /. Anything else is relative.


Is sqlite:///path relative or absolute then?  What about sqlite:/path?

dj-database-url claims 4 slashes is the way to go; sqlite:path

The ftp: standard had the same problem.  It was once specified that
ftp://server/%2Fpath was to be required to make the path absolute, while
ftp://server/path was supposed to be relative to the home directory you
ended up in after logging in to the ftp account.  This was very confusing
to users so most browers just implemented the path to always be absolute,
with no way to access relative paths. That's at least how I remembered it.

--Gisle


Re: RFI: Database URIs

2013-11-26 Thread David E. Wheeler
On Nov 26, 2013, at 12:43 PM, Tim Bunce tim.bu...@pobox.com wrote:

 ODBC complicates that further.

Indeed. I want to avoid the protocol.

I've now written up my proposal as a blog post:

  http://theory.so/rfc/2013/11/26/toward-a-database-uri-standard/

Thanks,

David



Re: RFI: Database URIs

2013-11-25 Thread David E. Wheeler
On Nov 25, 2013, at 3:50 AM, Jens Rehsack rehs...@gmail.com wrote:

DBI-connect($dsn, $user, $passwd, \%attr)
 
 4th argument is wasted in your current proposal.

Er, well, I failed to provide a complete set of examples. Here’s one from the 
PostgreSQL docs:

  postgresql://other@localhost/otherdb?connect_timeout=10application_name=myapp

All the attributes are provided by the GET query string.

Best,

David



Re: RFI: Database URIs

2013-11-25 Thread Jens Rehsack

Am 25.11.2013 um 18:00 schrieb David E. Wheeler da...@justatheory.com:

 On Nov 25, 2013, at 3:50 AM, Jens Rehsack rehs...@gmail.com wrote:
 
   DBI-connect($dsn, $user, $passwd, \%attr)
 
 4th argument is wasted in your current proposal.
 
 Er, well, I failed to provide a complete set of examples. Here’s one from the 
 PostgreSQL docs:
 
  
 postgresql://other@localhost/otherdb?connect_timeout=10application_name=myapp
 
 All the attributes are provided by the GET query string.

Let’s go - shoot:

# specify most possible flags via driver flags
$dbh = DBI-connect (dbi:CSV:, undef, undef, {
f_schema = undef,
f_dir= data,
f_dir_search = [],
f_ext= .csv/r,
f_lock   = 2,
f_encoding   = utf8,

csv_eol  = \r\n,
csv_sep_char = ,,
csv_quote_char   = '',
csv_escape_char  = '',
csv_class= Text::CSV_XS,
csv_null = 1,
csv_tables   = {
info = { f_file = info.csv }
},

RaiseError   = 1,
PrintError   = 1,
FetchHashKeyName = NAME_lc,
}) or die $DBI::errstr;

And keep in mind, csv_tables can be more complex and there’re other
attributes like it eg. in DBD::AnyData.

Cheers
-- 
Jens Rehsack
rehs...@gmail.com







Re: RFI: Database URIs

2013-11-25 Thread David E. Wheeler
On Nov 25, 2013, at 11:08 AM, Jens Rehsack rehs...@gmail.com wrote:

 Let’s go - shoot:
 
# specify most possible flags via driver flags
$dbh = DBI-connect (dbi:CSV:, undef, undef, {
f_schema = undef,
f_dir= data,
f_dir_search = [],
f_ext= .csv/r,
f_lock   = 2,
f_encoding   = utf8,
 
csv_eol  = \r\n,
csv_sep_char = ,,
csv_quote_char   = '',
csv_escape_char  = '',
csv_class= Text::CSV_XS,
csv_null = 1,
csv_tables   = {
info = { f_file = info.csv }
},
 
RaiseError   = 1,
PrintError   = 1,
FetchHashKeyName = NAME_lc,
}) or die $DBI::errstr;
 
 And keep in mind, csv_tables can be more complex and there’re other
 attributes like it eg. in DBD::AnyData.

Well, how you would want to handle params would be up to you. No, they are not 
great for hashes, but do-able.


db:csv?f_dir=dataf_dir_search=foof_dir_search=barf_ext=.csv/rf_lock=2f_encoding=utf8csv_eol=%0D%0Acsv_sep_char=,csv_quote_char=%22csv_escape_char=%22csv+class=Text::CSV_XScsv_null=1RaiseError=1PrintError=1FetchHashKeyName=NAME_lccsv_tables=%7B%22info%22%3A%20%7B%22f_file%22%3A%22info_csv%22%7D%7D

So yeah, one would need to do some sort of parsting of nested data (JSON in the 
csv_tables example here), though arrays work okay (e.g., f_dir_search).

OTOH, can you specify all this stuff in a DSN parseable by parse_dsn(), either?

But my point is not to replace the connect() API, but to create a standard of 
sorts for representing connection info in a URL, to make it easier to specify 
how to connect to things on the command-line. Yeah, if you want to do 
everything, it will require more work, but that would be up to the code that 
handles each driver, which is to say the URI subclass for a particular DBMS.

But this discussion orthogonal to my original questions, which were:

* Should I use a hierarchical scheme like JDBC? I’m leaning heavily toward this 
now, just need a decent prefix, I'm thinking dbms, e.g., dbms:csv:foo.

* Do I have the metadata wrong for any of the DBMSes I have so far added 
support for? Right now that’s just the ports they listen on by default:

 https://github.com/theory/uri-db/blob/master/t/db.t#L9

Best,

David



Re: RFI: Database URIs

2013-11-25 Thread Jens Rehsack

Am 25.11.2013 um 20:42 schrieb David E. Wheeler da...@justatheory.com:

 On Nov 25, 2013, at 11:08 AM, Jens Rehsack rehs...@gmail.com wrote:
 
 Let’s go - shoot:
 
   # specify most possible flags via driver flags
   $dbh = DBI-connect (dbi:CSV:, undef, undef, {
   f_schema = undef,
   f_dir= data,
   f_dir_search = [],
   f_ext= .csv/r,
   f_lock   = 2,
   f_encoding   = utf8,
 
   csv_eol  = \r\n,
   csv_sep_char = ,,
   csv_quote_char   = '',
   csv_escape_char  = '',
   csv_class= Text::CSV_XS,
   csv_null = 1,
   csv_tables   = {
   info = { f_file = info.csv }
   },
 
   RaiseError   = 1,
   PrintError   = 1,
   FetchHashKeyName = NAME_lc,
   }) or die $DBI::errstr;
 
 And keep in mind, csv_tables can be more complex and there’re other
 attributes like it eg. in DBD::AnyData.
 
 Well, how you would want to handle params would be up to you. No, they are 
 not great for hashes, but do-able.
 

 db:csv?f_dir=dataf_dir_search=foof_dir_search=barf_ext=.csv/rf_lock=2f_encoding=utf8csv_eol=%0D%0Acsv_sep_char=,csv_quote_char=%22csv_escape_char=%22csv+class=Text::CSV_XScsv_null=1RaiseError=1PrintError=1FetchHashKeyName=NAME_lccsv_tables=%7B%22info%22%3A%20%7B%22f_file%22%3A%22info_csv%22%7D%7D

Happy hacking, when you want type that on command line. DBD::CSV (and other
Pure-Perl drivers) is designed for flexibility and quick setup, not for
expensive configuration and long-term running.

 So yeah, one would need to do some sort of parsting of nested data (JSON in 
 the csv_tables example here), though arrays work okay (e.g., f_dir_search).
 
 OTOH, can you specify all this stuff in a DSN parseable by parse_dsn(), 
 either?

You can’t - this is a clear point in DBI::DBD::SqlEngine at the moment.

 But my point is not to replace the connect() API, but to create a standard of 
 sorts for representing connection info in a URL, to make it easier to specify 
 how to connect to things on the command-line. Yeah, if you want to do 
 everything, it will require more work, but that would be up to the code that 
 handles each driver, which is to say the URI subclass for a particular DBMS.

All I say to you: if you want DBI supporting that - keep in mind the
whole requirements. If you just want to do another DBI extension, with
limited coverage and usage - feel free.

 But this discussion orthogonal to my original questions, which were:
 
 * Should I use a hierarchical scheme like JDBC? I’m leaning heavily toward 
 this now, just need a decent prefix, I'm thinking dbms, e.g., 
 dbms:csv:foo“.

You can’t - because there is no „foo“ database for DBD::CSV at the moment.
Conceptual issue. In current state of development, those pure-perl DBMS
simulating drivers have no persistent metadata as mysql and postgresql have.

 * Do I have the metadata wrong for any of the DBMSes I have so far added 
 support for? Right now that’s just the ports they listen on by default:
 
 https://github.com/theory/uri-db/blob/master/t/db.t#L9

I think you miss my point - I don’t say that you’re wrong, I say you
lack a lot of requirements. When you don’t intend to support those
pure-perl drivers, you’re fine and please go ahead. When you intend
full DBI driver support, DBDI (https://github.com/timbunce/DBDI) could
enlighten you. Following that concept would allow you a migration to p6
one fine day ;)

Cheers
-- 
Jens Rehsack
rehs...@gmail.com







Re: RFI: Database URIs

2013-11-25 Thread H.Merijn Brand
On Tue, 26 Nov 2013 08:02:45 +0100, Jens Rehsack rehs...@gmail.com
wrote:

 
 Am 25.11.2013 um 20:42 schrieb David E. Wheeler da...@justatheory.com:
 
  On Nov 25, 2013, at 11:08 AM, Jens Rehsack rehs...@gmail.com wrote:
  
  Let’s go - shoot:
  
# specify most possible flags via driver flags
$dbh = DBI-connect (dbi:CSV:, undef, undef, {
f_schema = undef,
f_dir= data,
f_dir_search = [],
f_ext= .csv/r,
f_lock   = 2,
f_encoding   = utf8,
  
csv_eol  = \r\n,
csv_sep_char = ,,
csv_quote_char   = '',
csv_escape_char  = '',
csv_class= Text::CSV_XS,
csv_null = 1,
csv_tables   = {
info = { f_file = info.csv }
},
  
RaiseError   = 1,
PrintError   = 1,
FetchHashKeyName = NAME_lc,
}) or die $DBI::errstr;
  
  And keep in mind, csv_tables can be more complex and there’re other
  attributes like it eg. in DBD::AnyData.
  
  Well, how you would want to handle params would be up to you. No, they are 
  not great for hashes, but do-able.
  
 
  db:csv?f_dir=dataf_dir_search=foof_dir_search=barf_ext=.csv/rf_lock=2f_encoding=utf8csv_eol=%0D%0Acsv_sep_char=,csv_quote_char=%22csv_escape_char=%22csv+class=Text::CSV_XScsv_null=1RaiseError=1PrintError=1FetchHashKeyName=NAME_lccsv_tables=%7B%22info%22%3A%20%7B%22f_file%22%3A%22info_csv%22%7D%7D
 
 Happy hacking, when you want type that on command line. DBD::CSV (and other
 Pure-Perl drivers) is designed for flexibility and quick setup, not for
 expensive configuration and long-term running.
 
  So yeah, one would need to do some sort of parsting of nested data (JSON in 
  the csv_tables example here), though arrays work okay (e.g., f_dir_search).
  
  OTOH, can you specify all this stuff in a DSN parseable by parse_dsn(), 
  either?
 
 You can’t - this is a clear point in DBI::DBD::SqlEngine at the moment.
 
  But my point is not to replace the connect() API, but to create a standard 
  of sorts for representing connection info in a URL, to make it easier to 
  specify how to connect to things on the command-line. Yeah, if you want to 
  do everything, it will require more work, but that would be up to the code 
  that handles each driver, which is to say the URI subclass for a particular 
  DBMS.
 
 All I say to you: if you want DBI supporting that - keep in mind the
 whole requirements. If you just want to do another DBI extension, with
 limited coverage and usage - feel free.
 
  But this discussion orthogonal to my original questions, which were:
  
  * Should I use a hierarchical scheme like JDBC? I’m leaning heavily toward 
  this now, just need a decent prefix, I'm thinking dbms, e.g., 
  dbms:csv:foo“.
 
 You can’t - because there is no „foo“ database for DBD::CSV at the moment.
 Conceptual issue. In current state of development, those pure-perl DBMS
 simulating drivers have no persistent metadata as mysql and postgresql have.
 
  * Do I have the metadata wrong for any of the DBMSes I have so far added 
  support for? Right now that’s just the ports they listen on by default:
  
  https://github.com/theory/uri-db/blob/master/t/db.t#L9
 
 I think you miss my point - I don’t say that you’re wrong, I say you
 lack a lot of requirements. When you don’t intend to support those
 pure-perl drivers, you’re fine and please go ahead. When you intend
 full DBI driver support, DBDI (https://github.com/timbunce/DBDI) could
 enlighten you. Following that concept would allow you a migration to p6
 one fine day ;)

You are both right

ATM there are two ways to define the DSN:

1. Using DSN

   my $dbh = DBI-connect (dbi:Pg:dbname=foo;…);

2. Using driver + attributes

   my $dbh = DBI-connect (dbi:Pg:, $u, $p, { dbname = foo, … });

As I always use 2. when writing scripts, mostly because I use quite a
few useful attributes in the 4th argument already, it is the most
logical place: easy to maintain, easy to read and easy to extend.

One of the huge disadvantages of putting everything in the DSN is the
you have to rethink if and when character like :, ;, , and ' should be
escaped and how and that values like undef are pretty darn impossible
(unless the driver has taken the trouble to support that).

The first method however has a big advantage when the script (or a
one-liner) is completely driven by environment settings.

   my $dbh = DBI-connect ($ENV{DBI_DSN});

which happens to be the (underdocumented) default for connect and thus
equivalent to

   my $dbh = DBI-connect;

I don't know if having yet another way to define the DSN is going to
add a lot of trouble or a lot of pleasure. What I *do* know is that if
this will catch on, the DSN parts in DBI - including the docs - will be
better tested :)

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.19   porting perl5 on HP-UX, AIX, and 

Re: RFI: Database URIs

2013-11-23 Thread David E. Wheeler
On Nov 22, 2013, at 8:48 PM, Darren Duncan dar...@darrenduncan.net wrote:

 postgresql://user@localhost:5433/dbname
 sqlite:///path/to/foo.db
 
 By database name do you mean DBMS name?  Because I'd say the database 
 name is what's on the right-hand side of the //, not what's on the left.

Yes, correct.

 Another thing I was going to say is, if you wanted some standardization, you 
 should distinguish the parts that are necessary to connect to a database from 
 parts that just select a default schema in the database for interacting with.

Getting a bit OT from my original questions here. But anyway, I’m following the 
JDBC and PostgreSQL examples here, where, frankly, are almost identical to most 
other URLs. This is a very good thing.

 By that I mean, remember that a PostgreSQL database and a MySQL database 
 aren't actually the same concept.  A PostgreSQL DBMS server gives access to 
 multiple disjoint databases where you must name one to connect, and then 
 separately from that is the optional concept of the current schema that you 
 can select.  A MySQL DBMS server gives access to exactly 1 database, which 
 you can connect to without specifying a database name, and selecting a 
 current schema (what they call a database) is optional for using MySQL.

While I agree that a PostgresSQL database and a MySQL database are not the same 
thing, the MySQL community still calls it a database and tends to think of it 
that way, and JDBC has set the precedent for relying on what the community 
calls a “database” to be the “database” part of the URL:

  
http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html

 The reason I say this is that DBI's uri scheme uses the same syntax for both 
 database even though how they're treated is actually very different.  And 
 you don't have to do the same.

I don’t have to, but that ship has long since sailed, so I will treat them the 
same as the DBI and JDBC.

Best,

David



RFI: Database URIs

2013-11-22 Thread David E. Wheeler
DBI Folks  Gisle,

I want to add support for specifying database connections as URIs to Sqitch, my 
DB change management system. I started working on it today, following the 
examples of JDBC and PostgreSQL. Before I release, though, I’d like a bit of 
feedback on a couple of things.

First, I'm using the database name as the scheme in the URL. Some examples:

postgresql://user@localhost:5433/dbname
sqlite:///path/to/foo.db

This is to make it easy to tell one DB from another. But I'm wondering if I 
should follow the example of JDBC more closely, and prepend db: or something 
to them. More like DBI DSNs, too. However, it would require a bit more work, as 
URI.pm does not currently recognize multiple colon-delimited strings as scheme 
names AFAICT. :-(

Next, I've added a bunch of URI subclasses for various database engines. I’m 
not to familiar with some of them, so if you see any listed here where the name 
(to be used in the scheme) and the default port is wrong, please let me know:

  https://github.com/theory/uri-db/blob/master/t/db.t#L9

Thanks!

David

PS: Is this something the DBI might want to use?



Re: RFI: Database URIs

2013-11-22 Thread Darren Duncan

On 2013.11.22 5:13 PM, David E. Wheeler wrote:

DBI Folks  Gisle,

I want to add support for specifying database connections as URIs to Sqitch, my 
DB change management system. I started working on it today, following the 
examples of JDBC and PostgreSQL. Before I release, though, I’d like a bit of 
feedback on a couple of things.

First, I'm using the database name as the scheme in the URL. Some examples:

 postgresql://user@localhost:5433/dbname
 sqlite:///path/to/foo.db

This is to make it easy to tell one DB from another. But I'm wondering if I should follow 
the example of JDBC more closely, and prepend db: or something to them. More 
like DBI DSNs, too. However, it would require a bit more work, as URI.pm does not 
currently recognize multiple colon-delimited strings as scheme names AFAICT. :-(

Next, I've added a bunch of URI subclasses for various database engines. I’m 
not to familiar with some of them, so if you see any listed here where the name 
(to be used in the scheme) and the default port is wrong, please let me know:

   https://github.com/theory/uri-db/blob/master/t/db.t#L9

Thanks!

David

PS: Is this something the DBI might want to use?






Re: RFI: Database URIs

2013-11-22 Thread Darren Duncan

Sorry, I think I hit send without adding a reply message; here it is.

On 2013.11.22 5:13 PM, David E. Wheeler wrote:

First, I'm using the database name as the scheme in the URL. Some examples:

 postgresql://user@localhost:5433/dbname
 sqlite:///path/to/foo.db


By database name do you mean DBMS name?  Because I'd say the database name 
is what's on the right-hand side of the //, not what's on the left.


Another thing I was going to say is, if you wanted some standardization, you 
should distinguish the parts that are necessary to connect to a database from 
parts that just select a default schema in the database for interacting with.


By that I mean, remember that a PostgreSQL database and a MySQL database 
aren't actually the same concept.  A PostgreSQL DBMS server gives access to 
multiple disjoint databases where you must name one to connect, and then 
separately from that is the optional concept of the current schema that you can 
select.  A MySQL DBMS server gives access to exactly 1 database, which you can 
connect to without specifying a database name, and selecting a current schema 
(what they call a database) is optional for using MySQL.


The reason I say this is that DBI's uri scheme uses the same syntax for both 
database even though how they're treated is actually very different.  And you 
don't have to do the same.


-- Darren Duncan