Hi there, today I gathered some people in irc://irc.perl.org/#dbi to discuss the planned extensions to DBD::File.
There were several goals to reach:

1) Add a readonly mode to DBI::DBD::SqlEngine
   ==> will be solved by adding a new attribute sql_readonly to DBI::DBD::SqlEngine and, if required, fixing DBI::SQL::Nano and SQL::Statement and bumping the SQL::Statement requirement in *::Nano

2) Add support for other I/O layers to DBD::File
   a) add support for streams (PerlIO)
   b) add support for other kinds of fetch_row/push_row processing

Goal (a) came from recent projects; goal (b) is a longer-standing one I first identified two years ago when refactoring DBD::File to its current version. During that refactoring task I noticed the complexity of dealing with DBD::AnyData. Sven Dowideit now maintains AnyData and DBD::AnyData and ran into the same problem.

Proposed solution: DBD::File will implement an abstract I/O strategy which will access concrete implementations for directory scanning ($dbh->get_tables()) and table I/O (fetch_row, push_row, ...).

==> DBD::File will get two new (default) attributes:
    * f_dir_backend (or: f_dir_strategy)
    * f_stream_backend (or: f_stream_strategy)
==> Backends will provide the required methods ('perlio :via', get_line, tell, seek, ...) for data parsers.
==> Data parsers might have requirements not satisfied by every backend (think of DBD::DBM *gg*)

Additional improvements along this way:
( ) We could easily re-implement DBD::RAM (provides RAM tables)
( ) We could do the groundwork for a future where one $dbh mixes CSV tables with DBM tables with AnyData tables with Sys tables ...
( ) We could add a "clone" of DBD::ExampleP doing what it does using the default Dir/Stream backends

I would prefer to add the default backends below the DBD::File namespace, e.g. DBD::File::Backend::PerlIO or DBD::File::Backend::Filesystem.

Any comments?

Best regards,
Jens
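To make the proposal a bit more concrete, here is a rough pure-Perl sketch of the strategy split described above: a stream backend exposing the listed methods (get_line, tell, seek), with a data parser coded only against that interface so another backend (tar, RAM, DBM, ...) could be dropped in. All class and method names here are illustrative assumptions, not a committed DBD::File API.

```perl
#!/usr/bin/perl
# Illustrative sketch of the proposed backend split. The class name
# StreamBackend::PerlIO and its methods are assumptions, not real
# DBD::File API; a real default backend might live below
# DBD::File::Backend:: as suggested above.
use strict;
use warnings;

package StreamBackend::PerlIO;
sub new      { my ($class, $fh) = @_; bless { fh => $fh }, $class }
sub get_line { my $fh = $_[0]{fh}; scalar <$fh> }
sub tell     { my $fh = $_[0]{fh}; CORE::tell $fh }
sub seek     { my ($self, $pos, $whence) = @_; CORE::seek $self->{fh}, $pos, $whence }

package main;
# A parser coded only against the backend interface, never against
# a file name - so the storage behind it becomes swappable.
open my $fh, "<", \"a,1\nb,2\n" or die $!;   # in-memory file for the demo
my $be = StreamBackend::PerlIO->new($fh);
while (defined(my $line = $be->get_line)) {
    chomp $line;
    my @fields = split /,/, $line;
    print "row: @fields\n";
}
```

The point of the sketch is only the shape of the interface: the parser calls get_line/tell/seek on an object and never open()s anything itself.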
<@[Sno]> well, I invited mst to discuss the stream support/API for DBD::File (and DBD::CSV ...) for f_file and f_dir
<@Tux> guess it was caused by the number of netsplits last week
<@Tux> I'll try to stay alive when I can. must visit a customer tomorrow
<@[Sno]> what's already in the queue: improve DBI::DBD::SqlEngine with an attribute named 'sql_readonly' ==> throw an exception when open_table is called for write access
<@Tux> DBD::File also has f_readonly, causing an early fail on open for writing, but that has not (yet) been committed
<@Tux> I have been playing with the next step, but ran out of time
<@[Sno]> Tux: I moved that f_readonly to SqlEngine
<@[Sno]> because we can support more pure-perl DBDs that way
<@Tux> then it won't work in Nano
<@[Sno]> and it's not released
<@[Sno]> why shouldn't it work in Nano?
<@[Sno]> Nano is bundled and I just have to fix it there
<@Tux> and I/we have to change DBD::CSV to recognize sql_* as valid options too
<@Tux> "fix it there" is ok for me
<@[Sno]> doesn't it already? we have a lot of sql_* options already active
<@Tux> I would have to check
<@[Sno]> I expect it does - I use them in SQL::Statement tests :D
<@[Sno]> e.g. sql_quoted_identifier_case and sql_identifier_case
<@[Sno]> try setting $dbh->{sql_identifier_case} = SQL_IC_UPPER and see what happens
<@[Sno]> ok, back to stream (or alike) support
<@[Sno]> currently the simplest way would be to check with Scalar::Util (isvstring, reftype) or something like that whether the given f_file attribute is a string or a file handle
<@[Sno]> same for f_dir
<@[Sno]> but this is probably a bit short-sighted from several perspectives
<@[Sno]> 1) maybe we want to set the f_dir attribute to a tar archive (or an Archive::Tar instance?) and the tables to open shall come from the tar archive (readonly, of course)
<@[Sno]> 2) SvenDowideit now works on AnyData / DBD::AnyData which supports more storage backends than simple files
<@[Sno]> 3) DBD::DBM could be improved by not hacking around the open() call in DBD::File
<@[Sno]> the OO programmers' hammer here seems to cry: use roles
<@[Sno]> but timbunce_ will say: no additional dependencies for DBI (and I agree with that statement), which means role management has to be implemented by hand
<@[Sno]> Tux, mst, SvenDowideit - did I cover it so far?
<SvenDowideit> are you guys only thinking about using archives as input?
<@Tux> a) I don't want *any* new dep in DBD::File (that would add deps to DBI)
<@Tux> so we'd need a backward compat way to pass a "stream" to DBD::File
<SvenDowideit> i find the idea of sql querying data that comes in on a tcp stream (for eg) interesting
<@Tux> what Sno and I thought of was to set f_dir to undef and pass the "stream" in f_file or somesuch
<@Tux> that was what I worked on, but I found a snakepit
<@[Sno]> Tux: that was our first quick shot
<@[Sno]> the idea with Archive::Tar and Archive::Zip came later
<SvenDowideit> but wrt archives, it's complicated when you have a zip of dirs and files - though I like that too
<@[Sno]> and then came SvenDowideit - and he's doing it again ;)
<SvenDowideit> grin, i'm 95% trouble
<@Tux> Archive::* should be used from DBD::CSV and other backends
<@[Sno]> but better trouble now than when finished
<@Tux> sure
<@[Sno]> I don't want a reimplementation of I/O backends in any derived DBD!
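The "simplest way" mentioned above - checking with Scalar::Util whether an f_file-style attribute holds a plain path string or an already-open handle - can be sketched like this. The function name classify_f_file is invented for illustration; this is not the actual DBD::File code.

```perl
#!/usr/bin/perl
# Sketch (hypothetical helper, not DBD::File internals): distinguish a
# plain path string from a file handle using only core Scalar::Util,
# as suggested in the discussion above.
use strict;
use warnings;
use Scalar::Util qw(reftype);

sub classify_f_file {
    my ($f_file) = @_;
    return "undef" unless defined $f_file;
    my $rt = reftype $f_file;              # undef for plain strings
    return "path"      unless $rt;         # ordinary file name
    return "handle"    if $rt eq "GLOB";   # \*FH, lexical $fh, IO::Handle objects
    return "scalarref" if $rt eq "SCALAR"; # in-memory file via open \$buf
    return "unknown";
}

open my $fh, "<", \"demo data\n" or die $!;
print classify_f_file("table.csv"), "\n";  # path
print classify_f_file($fh), "\n";          # handle
```

As the chat notes, this quick check is too short-sighted for archives (Archive::Tar instances etc.), which is what motivates the backend objects instead.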
<@Tux> the only diff in DBD::File is that in the guts it does not need to *open* a file, but just use the stream passed
<@[Sno]> for f_file being a stream, yes
<@Tux> which proved harder to do than I thought it would be
<@[Sno]> but for f_dir being something different?
<@[Sno]> which is business as usual for SvenDowideit in AnyData
<SvenDowideit> darnit :)
<@[Sno]> SvenDowideit: I have no objections when an I/O role provides abilities to write data to anything ...
<SvenDowideit> tbh - defining a demarcation between DBD::File stuff and DBD::Other would be wise
<SvenDowideit> that will stop someone like me bikeshedding
<@[Sno]> SvenDowideit: DBD::File is an abstract base class which handles basic I/O for derived DBDs like DBD::CSV, DBD::DBM etc.
<SvenDowideit> DBD::Other - like DBD::Dir
<SvenDowideit> or DBD::TCPStream
<SvenDowideit> or DBD::Stream
<@[Sno]> and it looks to me like it should become a wrapper for "external" I/O implementations plus 1..n default implementations for string and file handle
<@[Sno]> SvenDowideit: not 10 additional DBD::* base classes like in g_object - that sucks
<@[Sno]> billions of cross dependencies and in the end you need them all
<SvenDowideit> you want to implement all those io types in one magical place?
<@[Sno]> I want a DBD::File - as now
<@[Sno]> that's required for backward compatibility anyway
<@[Sno]> I might be open for a discussion about a class between DBI::DBD::SqlEngine and DBD::File which is more complex, where DBD::File instruments that class to behave as it does now
<@[Sno]> but that would restrict the benefits to DBD::CSV
<@[Sno]> so I'd prefer a DBD::File with some more attributes (f_dir_backend, f_stream_backend, ...)
<@[Sno]> and instead of doing an opendir() - it calls its $self->{dir_backend}->open()
<@[Sno]> similar for f_file: open and f_stream_backend
<SvenDowideit> gotcha
<SvenDowideit> roles, but homemade
<@[Sno]> that's why I asked mst to join - he might have ideas ... ;)
<@[Sno]> the gang of four named that the "facade pattern" or so
<SvenDowideit> facade - gosh, what's the german
<@[Sno]> :P
<@[Sno]> anyway - typical roles won't work either, 'cause they're injected once and we need it per $dbh/$table instance
<SvenDowideit> ok, so as I'm reading code still (well done indeed)
<@[Sno]> Tux: what do you think about those f_dir_strategy / f_stream_strategy attributes, instantiated with some DBD::File intelligence as *::File (opendir, open, tell, seek, ...) or *::Stream?
<SvenDowideit> can you explain why one might need a different dir_backend from file_backend for any one dbh/table ?
<@[Sno]> to use Archive::Tar ...
<@[Sno]> SvenDowideit: dir_backend is currently per $dbh
<SvenDowideit> if i'm using Archive::Tar, then one class that gives both dir and file info would work right?
<@[Sno]> we planned a future for DBD::File where it will be possible to mix between DBD::File and DBD::DBM etc.
<@[Sno]> SvenDowideit: probably - depends on the finally decided implementation
<@[Sno]> but the dir_backend can return a default stream_backend - and then: yes
<@Tux> mje: http://pasta.test-smoke.org/329
<SvenDowideit> mmm, ok, there's a point: if you have 2 zips, you need a way to combine the streams
<@mst> $dbh has some sort of object that returns an object representing a table perhaps?
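The delegation described above - opendir() in DBD::File replaced by a call on a dir backend object - could look roughly like this. DirBackend::Filesystem and its list_tables() method are invented names for illustration, not DBD::File API; the point is that an Archive::Tar- or RAM-based class with the same methods would be swappable.

```perl
#!/usr/bin/perl
# Hypothetical default dir backend: does exactly what DBD::File does
# inline today (opendir + readdir), but behind a facade so another
# backend (tar archive, RAM, ...) could replace it per-$dbh.
use strict;
use warnings;

package DirBackend::Filesystem;

sub new { my ($class, %args) = @_; bless { dir => $args{dir} }, $class }

# Return the candidate table files in the directory, as
# $dbh->get_tables() would need them.
sub list_tables {
    my ($self) = @_;
    opendir my $dh, $self->{dir} or die "opendir $self->{dir}: $!";
    my @tables = grep { !/^\./ && -f "$self->{dir}/$_" } readdir $dh;
    closedir $dh;
    return sort @tables;
}

package main;
my $backend = DirBackend::Filesystem->new(dir => ".");
print "$_\n" for $backend->list_tables;
```

A tar-backed sibling class would implement the same two methods against an Archive::Tar object instead of opendir, which is the whole point of the facade.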
<@Tux> Sno, sane up to tell/seek, as those prove extremely unreliable in XS
<@mje> Tux, I've tried asking Yanick to fix that TODO test - I will try again - thanks again
<@Tux> as you might have no idea what the underlying mechanism is: perlio, standard IO, scalario etc etc
<@[Sno]> mst: yes, the dbh has that sort of object
<@Tux> I tried really hard to fix that in Text::CSV_XS but after looking deeper with leont, I gave up and reverted
<@[Sno]> so, would everyone be happy with f_dir_strategy/f_dir_backend and f_stream_strategy/f_stream_backend?
<SvenDowideit> mmm, so these backends are basically parallel to something Jeff was doing in AnyData - though he only began to extract the code to show it
<@[Sno]> 'cause $dbh->get_tables() is not restricted to a $sth, it might be difficult (not clever?) to have a dir backend per table
<SvenDowideit> and then he separated it further to make the parser of the f_stream_backend pluggable
<@[Sno]> SvenDowideit: yes, that's the intention
<SvenDowideit> excellent, that addresses my niggling feeling that AnyData should be redundant
<@[Sno]> SvenDowideit: probably it provides exactly the missing backends
<SvenDowideit> only in a proto-mess form
<@[Sno]> or it will be a bundle of separately available backends
<@[Sno]> SvenDowideit: see http://search.cpan.org/~timb/DBI-1.622/lib/DBD/File.pm#f_schema for how f_dir_backend would be added, like f_dir or similar
<SvenDowideit> so, to take things to a maddening extreme
<SvenDowideit> what should happen when i point DBD::File at a dir containing 12 zips
<SvenDowideit> that each contain a mix of csv, and other file types
<@[Sno]> that's what happens now, too (when using the default backend)
<SvenDowideit> when thinking about the future when you can mix DBD::File, DBD::DBM and more
<@[Sno]> see f_ext for details ;)
<SvenDowideit> i'm thinking of the bold future when the facades allow magic
<@[Sno]> the future is unwritten - we thought about some kind of data dictionary
<@[Sno]> and of course, you're right, DBDs would be "reduced" to configuration providers
<SvenDowideit> i'm kind of wondering if having DBD::File contain the dir code is suboptimal
<SvenDowideit> compared to having a DBD::Dir that you mix with a DBD::File and a DBD::Parser (for want of a better name)
<@[Sno]> I don't get your point
<SvenDowideit> ignore that - I'm hung up on the name of the class
<SvenDowideit> i keep thinking that DBD::File is about files, and really, it's not (just)
<@[Sno]> not anymore ;)
<SvenDowideit> and the 2 strategies provide that separation - it ought to be possible for an f_dir_backend to return either another f_dir_backend or an f_stream
<SvenDowideit> if it has the smarts to do it (which would be unlikely to be coded in the DBI core backends, except for trivial cases)
<@[Sno]> another f_dir_backend? what should be done with that other instance?
<@[Sno]> I thought about both - as the table objects are now - as a flyweight
<SvenDowideit> keep iterating until you get nothing, or something useful
<@[Sno]> like DBD::ExampleP ?
* SvenDowideit goes look :)
<SvenDowideit> dunno :) there's no docco! :p
<@[Sno]> Tux: can you re-read mst's statement with the idea of multiple table types in one $dbh?
<@[Sno]> Tux: in preparation of the data dictionary
<@Tux> I see only one line from mst and that does not deal with your question
<@[Sno]> Tux: tables are Flyweights meanwhile (it's all held in the shared f_meta structure)
<@[Sno]> yeah - but it gave me an idea :D
<@Tux> I like the opportunity of having a mixed env: no restrictions to what /might/ be useful, but within doable bounds
<@[Sno]> Tux: instead of $class =~ s/::Statement/::Table/; we could use f_meta->class //= ...
<@Tux> yes
<@[Sno]> which would allow basic DBD mixing ...
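The idea at the end - replacing the hard-coded $class =~ s/::Statement/::Table/ rename with a per-table class kept in the shared f_meta structure - can be sketched as follows. The f_meta layout and the "table_class" key are assumptions for illustration, not the committed design.

```perl
#!/usr/bin/perl
# Sketch of per-table class selection via a shared meta structure,
# instead of a fixed s/::Statement/::Table/ rename. The %f_meta layout
# and the "table_class" key are hypothetical.
use strict;
use warnings;

my %f_meta = (
    planes => { table_class => "DBD::CSV::Table" },  # explicit override
    counts => {},                                    # no override
);

sub table_class_for {
    my ($statement_class, $table) = @_;
    # fall back to the old name-mangling convention ...
    my $default = $statement_class;
    $default =~ s/::Statement/::Table/;
    # ... unless the meta structure already names a class (//= is the
    # defined-or assignment [Sno] refers to), enabling basic DBD mixing
    return $f_meta{$table}{table_class} //= $default;
}

print table_class_for("DBD::File::Statement", "planes"), "\n"; # DBD::CSV::Table
print table_class_for("DBD::File::Statement", "counts"), "\n"; # DBD::File::Table
```

With this shape, one $dbh could hold tables served by different table classes, which is the "basic DBD mixing" mentioned above.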
<@[Sno]> we need to do more - because of mixing "dbm_*" and "csv_*" attributes in $dbh
<@[Sno]> timbunce_: any wishes about the namespace for the default f_dir/f_stream backends of DBD::File?
<SvenDowideit> ooo, you mean you're going to implement it?
<@[Sno]> not today, probably tomorrow or Wednesday
<@timbunce_> [Sno] I've not been following along. I'd need a summary and I'm heading out for a while now. So I'll pull my usual trick of asking for an email to dbi-dev :)
<@[Sno]> hehe
<@[Sno]> I would attach the chat log if no one objects
<@timbunce_> Edited heavily *please*!
<@[Sno]> mst: you're dismissed - you can reduce the number of your channels
<@[Sno]> timbunce_: I'll write a summary - attaching the log for those who want to know all the details
<@timbunce_> cool, thanks.