[ https://issues.apache.org/jira/browse/TRAFODION-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hans Zeller resolved TRAFODION-51.
----------------------------------
Resolution: Implemented
Fix Version/s: 2.0-incubating (was: 1.1 (pre-incubation))
This was function-complete with TRAFODION-38.
> LP Blueprint: cmp-tmudf-compile-time-interface - Complete work for TMUDF compile time interface
> -----------------------------------------------------------------------------------------------
>
> Key: TRAFODION-51
> URL: https://issues.apache.org/jira/browse/TRAFODION-51
> Project: Apache Trafodion
> Issue Type: New Feature
> Components: sql-cmp
> Reporter: Hans Zeller
> Assignee: Hans Zeller
> Fix For: 2.0-incubating
>
>
> Suresh and others implemented much of a compile time interface for TMUDFs
> (Table-Mapping UDFs). Such an interface allows a TMUDF to be polymorphic
> (input and output columns decided at compile time, not at create time) and to
> do optimizations like elimination of unneeded columns, pushing predicates
> below or into the TMUDF, getting better cardinality and cost estimates, and
> using sort order and partitioning of input tables.
> A TMUDF can have zero or more input tables. Using more than one input table
> is currently neither tested nor supported, but the design should allow it.
> The interface will be a C++ class that the TMUDF writer can derive from.
> Implementing a compiler interface for a TMUDF is completely optional and is
> done by overriding virtual methods of the default implementation. In a later
> step, we also want to replace the C runtime interface, designed a few years
> ago, with this C++ interface. We should be able to define a Java compile time
> interface that's fairly similar to the one in C++. A TMUDF writer only needs
> to override those methods whose behavior should differ from the default
> implementation. For example, a TMUDF could define its output columns through
> the compiler interface but not support pushing predicates into the TMUDF.
> Another example could be a TMUDF that only implements the part of the
> compiler interface that determines the degree of parallelism.
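A minimal sketch of the override pattern described above, using simplified stand-in classes. `CompilerInterfaceSketch`, `InvocationInfoSketch`, `describeDesiredDegreeOfParallelism` and `runCompilerHooks` are hypothetical names chosen for illustration, not the sqludr.h API. The derived class overrides only the parallelism hook and inherits the default column behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the invocation metadata (not the sqludr.h class).
struct InvocationInfoSketch {
  std::vector<std::string> outputColumns;  // starts with the DDL columns
  int degreeOfParallelism = 0;             // 0: let the optimizer decide
};

// Base class: the default behavior of a TMUDF without a compile time interface.
class CompilerInterfaceSketch {
 public:
  virtual ~CompilerInterfaceSketch() = default;

  // Default: keep the output columns declared at create time.
  virtual void describeParamsAndColumns(InvocationInfoSketch &) {}

  // Default: no preference, leave the choice to the optimizer.
  virtual void describeDesiredDegreeOfParallelism(InvocationInfoSketch &info) {
    info.degreeOfParallelism = 0;
  }
};

// A TMUDF that customizes only the degree of parallelism.
class SerialUDFInterface : public CompilerInterfaceSketch {
 public:
  void describeDesiredDegreeOfParallelism(InvocationInfoSketch &info) override {
    info.degreeOfParallelism = 1;  // request serial execution
  }
};

// Compiler side: calls every hook; virtual dispatch picks the override if any.
void runCompilerHooks(CompilerInterfaceSketch &udf, InvocationInfoSketch &info) {
  udf.describeParamsAndColumns(info);
  udf.describeDesiredDegreeOfParallelism(info);
}
```

Because the compiler calls every hook through the base class, a TMUDF that overrides nothing behaves exactly like one without a compile time interface.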
> There are three main classes in this interface, all defined in file
> core/sql/sqludr/sqludr.h:
> UDRInvocationInfo: This is similar to SQLUDR_TMUDFINFO in the C interface. It
> describes the metadata of the TMUDF, scalar input parameters, the
> table-valued result, PARTITION BY and ORDER BY clauses specified for
> table-valued inputs, etc. There is one of these for every TMUDF invocation in
> a query. In some cases, the compiler may create additional UDRInvocationInfo
> objects when it transforms the TMUDF, for example by placing it under a
> nested join, with a different set of predicates to be pushed down. There are
> additional classes to describe parameters, table-valued inputs and outputs,
> and data types, similar to the existing C interface.
> UDRPlanInfo: There are zero or more UDRPlanInfo objects for every
> UDRInvocationInfo object. The optimizer creates one for every optimization
> goal (context) where it needs to call the TMUDF interface.
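The one-to-many relationship between an invocation and its plans could be modeled roughly as follows. This is a sketch under assumptions: `InvocationInfoSketch`, `PlanInfoSketch`, `contextId` and `planForContext` are hypothetical, and a plan object is created lazily, only for contexts in which the optimizer actually consults the TMUDF.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-context plan information.
struct PlanInfoSketch {
  int contextId = 0;           // optimization goal this plan belongs to
  double estimatedCost = 0.0;  // e.g. filled in by a cost hook of the TMUDF
};

// Hypothetical per-invocation information owning zero or more plans.
struct InvocationInfoSketch {
  std::string udfName;
  std::vector<std::unique_ptr<PlanInfoSketch>> plans;

  // Returns the plan for a context, creating it the first time the optimizer
  // needs to call the TMUDF interface for that context.
  PlanInfoSketch &planForContext(int contextId) {
    for (auto &plan : plans)
      if (plan->contextId == contextId) return *plan;
    std::unique_ptr<PlanInfoSketch> plan(new PlanInfoSketch());
    plan->contextId = contextId;
    plans.push_back(std::move(plan));
    return *plans.back();
  }
};
```

Holding the plans through `std::unique_ptr` keeps a reference to an existing plan valid when later contexts append more plans to the vector.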
> TMUDRInterface: This class represents the code associated with a TMUDF. The
> class itself represents the default behavior of a TMUDF without a compile
> time interface. UDF writers can define a derived class and implement virtual
> methods to customize the optimizer interface. Trafodion tries to find a C
> function named
> <UDF external function name>_CreateCompilerInterfaceObject
> If that function exists in the UDF library, it is assumed to return an object
> of a class derived from TMUDRInterface, and the compiler will call its virtual
> methods. Some of these may be defined in the derived class and others in the
> base class, and a derived class method may also call the base class method to
> do part of the work.
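The name-based lookup can be sketched with a map standing in for the symbols a UDF library exports; a real implementation would presumably ask the dynamic loader instead (for example `dlsym` on POSIX). Apart from the `_CreateCompilerInterfaceObject` suffix, every name here (`exportedSymbols`, `createCompilerInterface`, the `*Sketch` classes) is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for UDRInvocationInfo and TMUDRInterface.
struct InvocationInfoSketch {
  std::string externalName;  // UDF external function name from the DDL
};

class CompilerInterfaceSketch {
 public:
  virtual ~CompilerInterfaceSketch() = default;
  virtual std::string describe() const { return "default behavior"; }
};

using FactoryFn = CompilerInterfaceSketch *(*)(const InvocationInfoSketch *);

// Stand-in for the symbols exported by the UDF library.
std::map<std::string, FactoryFn> &exportedSymbols() {
  static std::map<std::string, FactoryFn> symbols;
  return symbols;
}

// Look up <external name>_CreateCompilerInterfaceObject; fall back to the base
// class (the default behavior) when the library does not export it.
std::unique_ptr<CompilerInterfaceSketch> createCompilerInterface(
    const InvocationInfoSketch &info) {
  const std::string symbol = info.externalName + "_CreateCompilerInterfaceObject";
  auto it = exportedSymbols().find(symbol);
  if (it != exportedSymbols().end())
    return std::unique_ptr<CompilerInterfaceSketch>(it->second(&info));
  return std::unique_ptr<CompilerInterfaceSketch>(new CompilerInterfaceSketch());
}

// UDF library side: a derived class and its factory function. A real library
// would declare the factory extern "C" so that its name is not mangled.
class SessionizeInterfaceSketch : public CompilerInterfaceSketch {
 public:
  std::string describe() const override { return "sessionize"; }
};

CompilerInterfaceSketch *SESSIONIZE_CreateCompilerInterfaceObject(
    const InvocationInfoSketch *) {
  return new SessionizeInterfaceSketch();
}
```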
> Here are the methods we plan to support:
> - Validate scalar input parameters, and possibly allow those parameters to
>   deviate from the parameter list declared at DDL time.
> - Allow the compiler interface to look at constant values that are passed in
>   as input parameters.
> - Define the table-valued result columns, based on scalar parameters and the
>   column layout of the input (child) table(s).
> - Eliminate unneeded columns from the TMUDF result and also from the input
>   tables.
> - Allow predicates to be pushed down through the TMUDF operator to the child
>   table(s).
> - Allow predicates to be absorbed into the TMUDF.
> - Return a cost estimate of the TMUDF, based on information available at
>   compile time.
> - Influence the degree of parallelism chosen for the TMUDF.
> - Make use of natural partitioning and sort order of input (child) tables to
>   produce partitioned and sorted results.
> - Gather the information, based on the compile time interaction, that is
>   needed at runtime.
> - At any time in the process, the compile time interface can raise an
>   exception. If it does, the compilation will fail and an error message
>   provided by the TMUDF writer will be returned in the diagnostics area.
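As an illustration of the column-elimination item, here is a sketch for a TMUDF whose output consists of computed columns plus pass-through columns. It assumes a hypothetical column descriptor recording which input column, if any, each output column passes through; an input column that is neither read by the UDF's own logic nor passed through to an output column the query references can be dropped from the child table. `OutputColumnSketch` and `neededInputColumns` are illustrative names only.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical output column descriptor: passThruInputCol is the index of the
// input-table column this output column copies, or -1 if the TMUDF computes it.
struct OutputColumnSketch {
  int passThruInputCol;
};

// Input columns still needed after column elimination: those the UDF logic
// reads itself, plus those passed through to output columns the query uses.
std::set<int> neededInputColumns(const std::vector<OutputColumnSketch> &outputs,
                                 const std::set<int> &usedOutputs,
                                 const std::set<int> &readByUDFLogic) {
  std::set<int> needed = readByUDFLogic;
  for (int out : usedOutputs) {
    int in = outputs.at(out).passThruInputCol;
    if (in >= 0) needed.insert(in);  // referenced pass-through column
  }
  return needed;
}
```

For a sessionize-like TMUDF with outputs (SESSION_ID, four pass-through columns), a query using only SESSION_ID and the second pass-through column, and UDF logic reading input columns 0 and 1, input column 3 is no longer needed.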
> Some more design choices:
> The C++ interface uses its own C++ namespace, to avoid naming collisions and
> to make it look more similar to Java. Objects are allocated on the system
> heap and are deleted after statement compilation is finished. The interface
> does not use any of the Trafodion objects like NAHeap, ComDiagsArea, etc.
> Instead, it uses the C++ STL for strings and collection templates, again with
> the goal of staying close to Java.
> NAString ==> std::string
> NAHeap ==> C++ system heap
> ComDiagsArea ==> Throw exception with an attached SQLSTATE and error message
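The last mapping could look roughly like this: an exception carrying a SQLSTATE and a printf-style message, matching the shape of the `UDRException(38001, "...%d", ...)` call in the sessionize example, plus a compiler-side wrapper that turns it into diagnostic text. `UDRExceptionSketch` and `callUDFHook` are hypothetical; only the idea of an exception with an attached SQLSTATE and message comes from the text above.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <exception>
#include <string>

// Hypothetical exception type: a SQLSTATE plus a printf-style formatted message.
class UDRExceptionSketch : public std::exception {
 public:
  UDRExceptionSketch(int sqlState, const char *format, ...) : sqlState_(sqlState) {
    char buf[512] = "";
    va_list args;
    va_start(args, format);
    std::vsnprintf(buf, sizeof(buf), format, args);  // truncates long messages
    va_end(args);
    message_ = buf;
  }
  int getSQLState() const { return sqlState_; }
  const char *what() const noexcept override { return message_.c_str(); }

 private:
  int sqlState_;
  std::string message_;
};

// Hypothetical compiler-side wrapper: runs one interface hook and converts an
// exception into diagnostic text; a false return means compilation fails.
template <typename Hook>
bool callUDFHook(Hook hook, std::string &diagnostic) {
  try {
    hook();
    return true;
  } catch (const UDRExceptionSketch &e) {
    diagnostic = "SQLSTATE " + std::to_string(e.getSQLState()) + ": " + e.what();
    return false;
  }
}
```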
> In this first implementation, we only support a C++ interface, and the
> compiler will call that interface directly, without going through the
> tdm_udrserv process. We may need a special privilege to allow a user to
> define code that's executed in the Trafodion process (the privilege to
> create a TMUDF that has a compile time interface).
> In the longer term we hope to support the following flavors:
> - C++ and Java interfaces for the TMUDF, both at compile time and run time.
> - Trusted and isolated modes, both at compile and run time.
> Example code for a "sessionize" TMUDF that expects a single table input and
> passes all columns through to the result, in addition to the session id
> column (assume that the session id column is defined as the only output
> column in the DDL):
> #include "sqludr.h"  // core/sql/sqludr/sqludr.h
>
> class SessionizeUDFInterface : public TMUDRInterface
> {
> public:
>   // override any methods where the UDF author would
>   // like to change the default behavior
>   virtual void describeParamsAndColumns(UDRInvocationInfo &info);
> };
>
> void SessionizeUDFInterface::describeParamsAndColumns(
>     UDRInvocationInfo &info)
> {
>   // sessionize is intended to work with a single input table
>   if (info.getNumTableInputs() != 1)
>     throw UDRException(38001,
>                        "Expecting one table-valued input, got %d",
>                        info.getNumTableInputs());
>
>   // add all input table columns as output columns
>   info.addPassThruColumns(0);
> }
>
> // factory function the compiler looks up by name in the UDF library
> extern "C" TMUDRInterface * SESSIONIZE_CreateCompilerInterfaceObject(
>     const UDRInvocationInfo *info)
> {
>   return new SessionizeUDFInterface();
> }
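To see what `describeParamsAndColumns` does to the output columns, here is a self-contained mock of the two `UDRInvocationInfo` methods it calls. `InvocationInfoMock`, `ColumnSketch` and `describeSessionize` are hypothetical stand-ins, and `std::runtime_error` replaces `UDRException` so that the sketch compiles without sqludr.h.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical column descriptor (name and SQL type as text).
struct ColumnSketch {
  std::string name;
  std::string type;
};

// Hypothetical mock of the parts of UDRInvocationInfo used by sessionize.
struct InvocationInfoMock {
  std::vector<std::vector<ColumnSketch>> tableInputs;  // columns per input table
  std::vector<ColumnSketch> outputColumns;             // starts with DDL columns

  int getNumTableInputs() const { return static_cast<int>(tableInputs.size()); }

  // Append all columns of input table `tableIndex` to the output columns.
  void addPassThruColumns(int tableIndex) {
    const std::vector<ColumnSketch> &cols = tableInputs.at(tableIndex);
    outputColumns.insert(outputColumns.end(), cols.begin(), cols.end());
  }
};

// Same logic as SessionizeUDFInterface::describeParamsAndColumns.
void describeSessionize(InvocationInfoMock &info) {
  if (info.getNumTableInputs() != 1)
    throw std::runtime_error("Expecting one table-valued input, got " +
                             std::to_string(info.getNumTableInputs()));
  info.addPassThruColumns(0);
}
```

With the DDL declaring only SESSION_ID, the result becomes SESSION_ID followed by every input column, in input order.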
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)