[ 
https://issues.apache.org/jira/browse/TRAFODION-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Zeller resolved TRAFODION-51.
----------------------------------
       Resolution: Implemented
    Fix Version/s:     (was: 1.1 (pre-incubation))
                   2.0-incubating

This was function-complete with TRAFODION-38.

> LP Blueprint: cmp-tmudf-compile-time-interface - Complete work for TMUDF 
> compile time interface
> -----------------------------------------------------------------------------------------------
>
>                 Key: TRAFODION-51
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-51
>             Project: Apache Trafodion
>          Issue Type: New Feature
>          Components: sql-cmp
>            Reporter: Hans Zeller
>            Assignee: Hans Zeller
>             Fix For: 2.0-incubating
>
>
> Suresh and others implemented much of a compile time interface for TMUDFs 
> (Table-Mapping UDFs). Such an interface allows a TMUDF to be polymorphic 
> (input and output columns decided at compile time, not at create time) and to 
> do optimizations like elimination of unneeded columns, pushing predicates 
> below or into the TMUDF, getting better cardinality and cost estimates, and 
> using sort order and partitioning of input tables.
> A TMUDF can have zero or more input tables. Using more than one input table 
> is neither tested nor supported at the moment, but the design should allow it.
> The interface will be a C++ class that the TMUDF writer can derive from. 
> Implementing a compiler
> interface for a TMUDF is completely optional and is done by overriding 
> virtual methods of the
> default implementation. In a later step, we also want to replace the C 
> runtime interface, designed a few years ago, with this C++ interface. We 
> should be able to define a Java compile time interface that's fairly 
> similar to the one in C++. A TMUDF 
> writer only needs to
> override those interfaces that need to be different from the default 
> implementation. For example,
> a TMUDF could define its output columns through the compiler interface, but 
> it might not
> support pushing predicates into the TMUDF. Another example could be a TMUDF 
> that only
> implements the compiler interface that determines the degree of parallelism.
> There are three main classes in this interface, all defined in file 
> core/sql/sqludr/sqludr.h:
> UDRInvocationInfo: This is similar to SQLUDR_TMUDFINFO in the C interface. It 
> describes
> the metadata of the TMUDF, scalar input parameters, the table-valued result, 
> PARTITION BY
> and ORDER BY clauses specified for table-valued inputs, etc. There is one of 
> these for every
> TMUDF invocation in a query. In some cases, the compiler may create additional
> UDRInvocationInfo objects when it transforms the TMUDF, for example by 
> placing it under
> a nested join, with a different set of predicates to be pushed down. There 
> are additional
> classes to describe parameters, table-valued inputs and outputs, data types, 
> similar to
> the existing C interface.
> UDRPlanInfo: There are zero or more UDRPlanInfo objects for every 
> UDRInvocationInfo
> object. The optimizer creates one for every optimization goal (context) where 
> it needs to
> call the TMUDF interface.
> TMUDRInterface: This class represents the code associated with a TMUDF. The 
> class itself
> represents the default behavior of a TMUDF without a compile time interface. 
> UDF writers
> can define a derived class and implement virtual methods to customize the 
> optimizer
> interface. Trafodion tries to find a C function named
> <UDF external function name>_CreateCompilerInterfaceObject
> in the UDF library. If that function exists, it is assumed to return an 
> object of a class derived from TMUDRInterface, and the compiler will call 
> its virtual methods. Some of these may be defined in the derived class and 
> some in the base class, and a derived class method may also call the base 
> class method to do part of the work.
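
The override pattern described above can be sketched with stand-in classes. This is only an illustration, not the code in sqludr.h: the namespace `sketch`, the types `InvocationInfo`, `Interface` and `MyUdf`, both method names, and the helper `factorySymbol` are all hypothetical. Only the `_CreateCompilerInterfaceObject` suffix comes from the description.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the real classes in sqludr.h.
namespace sketch {

struct InvocationInfo {
  std::vector<std::string> outputColumns;  // result columns known at compile time
};

// Base class: default behavior of a TMUDF without a compile time interface.
class Interface {
public:
  virtual ~Interface() {}
  // Default: keep whatever columns are already recorded (nothing to add).
  virtual void describeColumns(InvocationInfo &info) { (void)info; }
  // Default: 0 stands for "let the compiler decide" in this sketch.
  virtual int degreeOfParallelism(const InvocationInfo &) { return 0; }
};

// A UDF writer overrides only the methods that should differ from the default.
class MyUdf : public Interface {
public:
  void describeColumns(InvocationInfo &info) override {
    Interface::describeColumns(info);            // base class does part of the work
    info.outputColumns.push_back("SESSION_ID");  // then add a UDF-specific column
  }
  // degreeOfParallelism() is not overridden, so the base default applies.
};

// Builds the factory function name from the UDF's external function name.
std::string factorySymbol(const std::string &externalName) {
  assert(!externalName.empty());
  return externalName + "_CreateCompilerInterfaceObject";
}

}  // namespace sketch
```

Calling both methods through a base class reference shows the mix of derived and default behavior: `describeColumns` runs the override, while `degreeOfParallelism` falls back to the base implementation.
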
> Here are the methods we plan to support:
> - Validate scalar input parameters, possibly allow those parameters to 
> deviate from the
>   parameter list declared at DDL time.
> - Allow the compiler interface to look at constant values that are passed in 
> as input
>   parameters.
> - Define the table-valued result columns, based on scalar parameters and 
> column
>   layout of the input (child) table(s).
> - Eliminate unneeded columns from the TMUDF result and also from the input 
> tables.
> - Allow predicates to be pushed down through the TMUDF operator to the child 
> table(s).
> - Allow predicates to be absorbed into the TMUDF.
> - Return a cost estimate of the TMUDF, based on information available at 
> compile time.
> - Influence the degree of parallelism chosen for the TMUDF.
> - Make use of natural partitioning and sort order of input (child) tables to 
> produce
>   partitioned and sorted results.
> - Gather the necessary information based on the compile time interaction that 
> is
>   needed at runtime.
> - At any time in the process, the compile time interface can raise an 
> exception. If
>   it does, the compilation will fail and an error message provided by the 
> TMUDF
>   writer will be returned in the diagnostics area.
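
As one illustration of the list above, eliminating unneeded result columns could look roughly like the following. This is a sketch under stated assumptions: the `Column` struct, its `usedByParent` flag and the function `eliminateUnneededColumns` are hypothetical names, and the real interface may expose column usage quite differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model for illustration; not part of the Trafodion interface.
namespace sketch {

struct Column {
  std::string name;
  bool usedByParent;  // assumed flag: does the rest of the query reference it?
};

// Returns only the columns that are still needed, preserving their order.
std::vector<Column> eliminateUnneededColumns(const std::vector<Column> &cols) {
  std::vector<Column> kept;
  for (std::size_t i = 0; i < cols.size(); ++i) {
    if (cols[i].usedByParent)
      kept.push_back(cols[i]);
  }
  return kept;
}

}  // namespace sketch
```

The same filtering idea would apply to input table columns that the TMUDF itself never reads.
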
> Some more design choices:
> The C++ interface uses its own C++ namespace, to avoid naming collisions and
> to make it look more similar to Java. Objects are allocated on the system 
> heap and
> are deleted after statement compilation is finished. The interface does not 
> use any
> of the Trafodion objects like NAHeap, ComDiagsArea, etc. The interface does 
> use
> C++ STL for strings and collection templates, again with the goal to stay 
> close
> to Java.
> NAString ==> std::string
> NAHeap ==> C++ system heap
> ComDiagsArea ==> Throw exception with an attached SQLSTATE and error message
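
The last mapping, an exception carrying a SQLSTATE and a message instead of a ComDiagsArea, might be sketched as below. The class `UdfException` and the function `checkTableInputs` are hypothetical; the real UDRException in sqludr.h may differ, and only the printf-style call shape is taken from the example in this issue.

```cpp
#include <cstdarg>
#include <cstdio>
#include <exception>
#include <string>

// Hypothetical sketch; not the actual UDRException from sqludr.h.
namespace sketch {

class UdfException : public std::exception {
public:
  // printf-style constructor: SQLSTATE value first, then format and arguments.
  UdfException(int sqlState, const char *fmt, ...) : sqlState_(sqlState) {
    char buf[256];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buf, sizeof(buf), fmt, args);  // truncates overly long messages
    va_end(args);
    message_ = buf;
  }
  int sqlState() const { return sqlState_; }
  const char *what() const noexcept override { return message_.c_str(); }

private:
  int sqlState_;
  std::string message_;  // std::string rather than NAString
};

// Example check that fails compilation with a UDF-writer-provided message.
void checkTableInputs(int numTableInputs) {
  if (numTableInputs != 1)
    throw UdfException(38001, "Expecting one table-valued input, got %d",
                       numTableInputs);
}

}  // namespace sketch
```

A caller that catches `UdfException` can read both `sqlState()` and `what()`, which corresponds to the information the diagnostics area would otherwise carry.
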
> In this first implementation, we only support a C++ interface and the 
> compiler will
> call that interface directly, without going through the tdm_udrserv process. 
> We may
> need a special privilege to allow a user to define code that's executed in the
> Trafodion process (the privilege to create a TMUDF that has a compile time 
> interface).
> In the longer term we hope to support the following flavors:
> - C++ and Java interfaces for the TMUDF, both at compile time and run time.
> - Trusted and isolated modes, both at compile and run time.
> Example code for a "sessionize" TMUDF that expects a single table input and
> passes all columns through to the result, in addition to the session id column
> (assume that the session id column is defined as the only output column in 
> the DDL):
> class SessionizeUDFInterface : public TMUDRInterface
> {
> public:
>   // override any methods where the UDF author would
>   // like to change the default behavior
>   virtual void describeParamsAndColumns(UDRInvocationInfo &info);
> };
> void SessionizeUDFInterface::describeParamsAndColumns(
>      UDRInvocationInfo &info)
> {
>   // sessionize is intended to work with a single input table
>   if (info.getNumTableInputs() != 1)
>     throw UDRException(38001,
>                        "Expecting one table-valued input, got %d",
>                        info.getNumTableInputs());
>   // add all input table columns as output columns
>   info.addPassThruColumns(0);
> }
> extern "C" TMUDRInterface * SESSIONIZE_CreateCompilerInterfaceObject(
>      const UDRInvocationInfo *info)
> {
>   return new SessionizeUDFInterface();
> }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
