Re: Question about TableFunctions in Derby

Chris Goodacre Mon, 20 Jul 2009 11:39:19 -0700

Rick, thanks for your suggestions.   Perhaps I am being obtuse, but when you 
say ...

"Since you have only asked for 3 columns, that's all that Derby will
request from the ResultSet instantiated by your table function. That
is, Derby is only going to call ResultSet.getXXX() on the house_number,
street, and city columns. That should behave efficiently provided that
your ResultSet is smart enough to only fault-in columns for which a
getXXX() is called."

Does that mean that I make a separate request to the legacy system each time 
getXXX() is called - i.e. lazily initialize each column in the result set?    I 
think this has to be the only way to do it, since I don't know which columns 
will be requested at the time the read() method of my tablefunction is invoked. 

Making (in this case) 3 calls to the legacy system to get 1 column for N rows 
is certainly better than making 1 call to the legacy system to get 1000 columns 
for N rows and then throwing away 997*N values/cells, but still not quite as 
nice as I'd like.  

If I were making a wish - I'd wish for some sort of parsed representation of 
the query get passed to the read method (or to some other method - similar to, 
or even as part of, the query optimization interface).    Ideally, this 
structured representation would have the list of columns belonging to the table 
function from the select list, and the where clause components specific to the 
table function only (i.e. mytablefunction.price > 50000 but NOT 
mytablefunction.price < myrealtable.value).

In the absence of this, when the VTIResultSet class passes the ActivationHolder 
to the derby class which invokes the read() method reflectively, why can't that 
class pass the activation context (it knows it is dealing with a derby table 
function, it knows the class name, it has access to the result set descriptor, 
if not the where clause) pass this information along to the user's table 
function class?   I would happily implement an interface in this class (not 
sure why read() has to be static) to get this information prior to resultset 
instantiation.

-chris

----- Original Message ----
From: Rick Hillegas <[email protected]>
To: Derby Discussion <[email protected]>
Sent: Monday, July 20, 2009 10:55:33 AM
Subject: Re: Question about TableFunctions in Derby

Hi Chris,

Some comments inline...

Chris Goodacre wrote:
> I've read the Derby developer's guide and Rick Hillegas's informative white 
> paper 
> (http://developers.sun.com/javadb/reference/whitepapers/sampleTableFunctions/doc/TableFunctionsWhitePaper.html)
>  on Table Functions, but am still struggling with the following issue:
> 
> I am trying to create an RDB abstraction for a large CICS/VSAM-based legacy 
> system and blend it with our newer, RDB-based tier.  This seems like a good 
> application of TableFunctions.  The VSAM data is made available to me via an 
> IP-based proprietary messaging interface.  There are lots of different files 
> here, but due to some historical forces, most of the data I'm interested in 
> resides in 4 VSAM files.
> 
> Unfortunately, each of those VSAM files has over a 1000 fields in it.
> 
> Now eventually, it might be possible to fully model a single VSAM file into 
> (for the sake of argument) 50 tables; each table/row representing a small 
> slice of a single VSAM record.
> 
> In the meantime, for both this proof-of-concept and as a migration path to 
> our existing clients, I'd like to represent each VSAM file as a table 
> (subject to the 1024 column SQL limitation per table).  This will be a 
> highly-denormalized and decidedly non-relational view of the data, but it 
> will be easy to demonstrate and immediately recognizable to our customers.
> 
> However, I can't seem to get around the problem of data granularity.  
> For example, if my customer executes:
> 
> select house_number, street, city from table (legacy_realty_data()) where 
> price < 500000
>  
Since you have only asked for 3 columns, that's all that Derby will request 
from the ResultSet instantiated by your table function. That is, Derby is only 
going to call ResultSet.getXXX() on the house_number, street, and city columns. 
That should behave efficiently provided that your ResultSet is smart enough to 
only fault-in columns for which a getXXX() is called.

The WHERE clause is a little trickier. You are right, Derby will read all rows 
from the ResultSet and throw away the rows which don't satisfy the WHERE 
clause. What you want to do is push the qualification through the table 
function to the external data source. I don't see any way to do this other than 
adding some more arguments to your table function. For instance, if you could 
push the qualification through to the external data source, then you could get 
efficient behavior from something like the following:

select house_number, street, city
from table( legacy_realty_data( 500000 ) ) s;

Hope this helps,
-Rick

> I don't appear to have any visibility to the actual query inside my 
> legacy_realty_data TableFunction, so I have to go get all 1000 fields for 
> however many listings are present where price< 500000 even though only three 
> columns will be requested.  Am I missing something?  Aside from having the 
> user repeat the columns as parameters to the table function (which looks 
> awkward to say the least), I can't see a way around this based on my limited 
> knowledge of Derby.
> 
> Is there a way to only retrieve the columns that the user is querying for?
> 
> Looking forward to your help/advice.
> 
> -chris
> 
>

Re: Question about TableFunctions in Derby

Reply via email to