[jira] [Commented] (CALCITE-1581) UDTF like in hive

Xiaoyong Deng (JIRA) Wed, 18 Jan 2017 00:19:30 -0800

    [ 
https://issues.apache.org/jira/browse/CALCITE-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827594#comment-15827594
 ]


Xiaoyong Deng commented on CALCITE-1581:
----------------------------------------

Thanks, Julian.

My use case is a stream-processing scenario:

A camera continuously generates a stream of binary data with metadata, and we 
want to extract car information from the video stream with a query like this:

{code}
select stream
  udtf_analyze(binary_data, meta_data) as (car_color, car_speed, car_plate, ...)
from video_stream;
{code}

One input record (binary_data, meta_data) may contain zero or many cars, so each 
record in produces a set of records out. That is why we call it a UDTF.
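
The one-to-many behaviour described here amounts to a flat-map over the input 
records. A minimal Java sketch of that semantics follows. `Frame`, `Car`, 
`analyze` and `analyzeAll` are hypothetical names, not Calcite or Hive APIs, 
and the "detection" is faked by treating each byte of the input as one car.

```java
import java.util.ArrayList;
import java.util.List;

class OneToManySketch {
    // Hypothetical input record: one (binary_data, meta_data) pair.
    static final class Frame {
        final byte[] binaryData;
        final String metaData;

        Frame(byte[] binaryData, String metaData) {
            this.binaryData = binaryData;
            this.metaData = metaData;
        }
    }

    // Hypothetical output row: (car_color, car_speed, car_plate).
    static final class Car {
        final String color;
        final int speed;
        final String plate;

        Car(String color, int speed, String plate) {
            this.color = color;
            this.speed = speed;
            this.plate = plate;
        }
    }

    // Stand-in for udtf_analyze. Assumption for illustration only: every
    // byte of binaryData is one detected car and its value is the speed.
    static List<Car> analyze(Frame frame) {
        List<Car> cars = new ArrayList<>();
        for (int i = 0; i < frame.binaryData.length; i++) {
            cars.add(new Car("unknown", frame.binaryData[i], frame.metaData + "-" + i));
        }
        return cars;
    }

    // Applies analyze to every frame and concatenates the results, so a
    // frame without cars contributes no output rows at all.
    static List<Car> analyzeAll(List<Frame> frames) {
        List<Car> out = new ArrayList<>();
        for (Frame frame : frames) {
            out.addAll(analyze(frame));
        }
        return out;
    }
}
```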

We tried to use a TableFunction to implement it, but found that a table 
function is queried like this:

{code}
select
 *
from table("s"."GenerateStrings"(5)) as t(n, c);
{code}

And we have no idea how to pass the columns of our table "video_stream" to it.

In our design, the UDTF is the only item in the "select" clause, so the 
following query is invalid:

{code}
select
  c2,
  func(c0, c1) as (f0, f1, f2)
from table_name;
{code}


{quote}
Can you give an example of an implementation of such a function? Would it be a 
Java method returning a class?
{quote}

Do you mean how to define the function? The following code could serve as a 
reference:

{code}
public abstract class UDTF {
    // Specifies the input and output parameters: takes the input
    // ObjectInspectors and returns an output struct.
    public abstract StructObjectInspector initialize(ObjectInspector[] var1);


    // here we process an input record and write out any resulting records
    public abstract void process(Object[] var1);


    // Called to notify the UDTF that there are no more rows to process.
    // Clean-up code or additional output can be produced here.
    public abstract void close();


    protected final void forward(Object o) {
        // collect the result
    }
}

public class ExampleUDTF extends UDTF {
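    public StructObjectInspector initialize(ObjectInspector[] var1) {
        // Check the argument inspectors in var1, then build and return the
        // inspector describing the output struct (left unimplemented here).
        throw new UnsupportedOperationException("not implemented in this sketch");
    }

    public void process(Object[] var1) {
        // One input record may yield zero or more output rows.
        java.util.List<Object[]> results = analyze(var1);
        for (Object[] row : results) {
            forward(row);
        }
    }

    public void close() {
        // Nothing to clean up in this sketch.
    }

    // Hypothetical helper standing in for the real analysis logic.
    private java.util.List<Object[]> analyze(Object[] args) {
        return java.util.Collections.emptyList();
    }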
}
{code}
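
To make the lifecycle above concrete, here is a self-contained sketch of how an 
engine might drive such a function: it wires forward() to a collector, calls 
process() once per input row, and then calls close(). This is only an 
illustration under simplifying assumptions. `SketchUdtf`, `SplitUdtf`, 
`UdtfDriver` and `setCollector` are hypothetical names, the ObjectInspector 
machinery is left out, and none of this is existing Calcite API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified stand-in for the abstract UDTF class (no ObjectInspector types).
abstract class SketchUdtf {
    // Rows are dropped until the driver installs a real collector.
    private Consumer<Object[]> collector = row -> { };

    void setCollector(Consumer<Object[]> collector) {
        this.collector = collector;
    }

    // Processes one input row and forwards zero or more output rows.
    abstract void process(Object[] args);

    // Called once after the last input row; may forward additional rows.
    void close() { }

    protected final void forward(Object[] row) {
        collector.accept(row);
    }
}

// Example function: splits its first argument on commas and emits one
// (token, length) row per token; an empty string emits no rows.
class SplitUdtf extends SketchUdtf {
    @Override
    void process(Object[] args) {
        String s = (String) args[0];
        if (s.isEmpty()) {
            return;
        }
        for (String token : s.split(",")) {
            forward(new Object[] {token, token.length()});
        }
    }
}

class UdtfDriver {
    // Runs the function over all input rows and gathers the forwarded rows.
    static List<Object[]> run(SketchUdtf udtf, List<Object[]> input) {
        List<Object[]> out = new ArrayList<>();
        udtf.setCollector(out::add);
        for (Object[] row : input) {
            udtf.process(row);
        }
        udtf.close();
        return out;
    }
}
```

Pushing rows through forward() rather than returning a list lets one input row 
produce any number of output rows without the function buffering them itself.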

> UDTF like in hive
> -----------------
>
>                 Key: CALCITE-1581
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1581
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Xiaoyong Deng
>            Assignee: Julian Hyde
>              Labels: udtf
>
> Support one row in and multi-column/multi-row out (one-to-many mapping), just 
> like a UDTF in Hive.
> The query would look like this:
> {code}
> select
>   func(c0, c1) as (f0, f1, f2)
> from table_name;
> {code}
> c0 and c1 are columns of 'table_name'. f0, f1 and f2 are newly generated columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
