Hi, I have a requirement to read LOBs via DBD::ODBC in chunks, but there is currently no way of doing this. I see DBI defines blob_read something like this:
  blob_read => { U =>[4,5,'$field, $offset, $len [, \\$buf [,$bufoffset]]'] }

which I'm guessing means: read the LOB at column $field from offset $offset for $len bytes into $buf, and if $bufoffset is specified, read it into $buf at that point. I'm presuming $bufoffset is there to support reading the LOB in chunks and appending to $buf? However, I cannot find any DBD which actually uses it, nor can I find any documentation on it; e.g., DBD::Oracle has ora_lob_read and DBD::Pg has pg_lo_read, neither of which uses blob_read. I also have a minor issue with it being called blob_read rather than lob_read, as my requirement is actually to read large objects which are strings, not binary objects.

Some background. There are a few peculiarities in ODBC:

1. DBD::ODBC currently binds columns in ODBC using SQLBindCol. It allocates enough space for each column (or LongReadLen), binds the column as the defaulted or requested type, then calls SQLFetch - all the columns for a row are retrieved in that one call. DBD::ODBC does not currently use SQLGetData, which allows the caller to retrieve column data in chunks. The SQLGetData interface does not specifically tell you that you have retrieved all of the column: it works by filling your buffer each time and telling you how many bytes it could have returned (had you passed a bigger buffer). It returns SQL_SUCCESS_WITH_INFO on an incomplete call, and SQL_SUCCESS once the column is drained, at which point the bytes returned will be equal to or less than your buffer length. Fortunately, it does allow a further call to SQLGetData after all of the column has been retrieved, which returns SQL_NO_DATA, so you can detect the end without comparing the returned byte count with the requested one.

2. You can only read LOBs via SQLGetData from the start to the end (or part way through); you cannot jump in at an offset - although, of course, a DBD could throw away the unwanted bits.

3. Some databases (e.g., MS SQL Server) do not allow columns to be read via SQLGetData out of order when other columns are bound, i.e., you can bind columns 1-5 then use SQLGetData for column 6, but you cannot bind columns 1, 2, 4 and 5 and use SQLGetData to read column 3 - that would be out of order.

4. You cannot partially read column N with SQLGetData then switch to read part of column N+1; you have to read all (or part) of column N, then move on to column N+1 (and so on), and you cannot go back to column N.

In addition, the $bufoffset argument to DBI's blob_read causes me a problem since I don't know whether we are using character lengths or byte lengths. Some explanation is required. ODBC does not really do UTF-8 encoding but Perl does. ODBC has SQLWxxx functions which take and return wide characters (UTF-16 or UCS2 encoded). When DBD::ODBC is built with unicode support (mostly on Windows, but an increasing number of people on UNIX are using it too) it calls the SQLWxxx functions or binds columns as SQL_WCHAR, and converts the returned data to UTF-8 in the bound Perl scalars (which can be bigger than the supplied buffer length, as UTF-8 encoding can take as much as 6 bytes per character). Obviously, if you call blob_read with a length of N and that means N bytes, I have a problem, since there is no way of knowing what the returned wide characters will translate into - up to 6 bytes each, UTF-8 encoded. Calling blob_read with a buffer length of 1000 could require 1000/2*6 = 3000 bytes to store the UTF-8 encoded result.

I appreciate that the $bufoffset in blob_read allows someone to retrieve LOBs in chunks and have them appended to the current scalar, but that can also be achieved by concatenation. Also, blob_read does not allow a type to be specified. This is perhaps not important if blob_read is JUST for BLOBs and not CLOBs, but my case is mostly CLOBs. In ODBC, if you request a BLOB as SQL_CHAR you get the BLOB data returned as hex-encoded bytes, so chr(32) becomes "0x20" (4 bytes as a string).
As a result of the above, I don't see any way of supporting the existing blob_read in DBI, and as it is undocumented anyway and no DBDs seem to use it, I've put a test implementation together as:

  odbc_lob_read(sth, colno, buf, length, \%attrs)

where \%attrs can contain TYPE => SQL_type. By default the column is retrieved as it is defined in the database (if the column is a binary type), as SQL_WCHAR (wide characters translated into UTF-8 in Perl) in a unicode build, and as SQL_CHAR otherwise. You can override the type in %attrs.

If you build with unicode support then length means the size in bytes of the buffer supplied to SQLGetData, so at most length/2 characters are retrieved per call; if those characters UTF-8 encode to more than length bytes, the buf scalar is grown. i.e., in a unicode build length effectively bounds the number of characters, and more bytes than length may be returned.

It currently does not support retrieving a chunk of a LOB which is appended/concatenated to an existing scalar. Part of the reason for this is that there is nothing to stop someone calling lob_read with different types as they retrieve chunks of the column, and column data cannot be both UTF-8 encoded and not UTF-8 encoded (it can actually, but it does not make sense to me for it to be).

I appreciate this RFC is quite long and I'm guessing few people will get to the end. Increasingly, I'm working with several DBDs, and the differences between them (beyond SQL differences) are proving VERY frustrating. It would be nice to try and sort out LOB reading and, whilst I'm on a roll, SQLMoreResults/more_results, last_insert_id, lob_write, binding with a specific type, etc., but I guess since some of these have been covered before and we didn't get too far in the past, perhaps there is not a lot of interest.
I'd like to fit DBD::ODBC with DBI's blob_read, but I don't think it works for DBD::ODBC as it stands, and I note other DBDs have provided their own, substantially different, LOB-reading methods, so I'm guessing they found that too. With Perl 6 perhaps on the horizon and Tim's current interest in JDBC, I think we are well advised to sort some of this out in Perl 5 first. It makes little difference to me right now, as I'll implement what I need and move on, but I'm prepared to work to a DBI interface for the common goal. Some of this is not going to go away - you only have to watch Perl Monks to see the frustration with last_insert_id, LOB reading, binding columns with types, etc., across DBDs to know this is proving troublesome for the average user.

Martin
--
Martin J. Evans
Easysoft Limited
http://www.easysoft.com