[jira] [Commented] (DRILL-6074) Corrections to UDF tutorial documentation page

Paul Rogers (JIRA) Sat, 06 Jan 2018 23:10:43 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315077#comment-16315077
 ]


Paul Rogers commented on DRILL-6074:
------------------------------------

Also in the code:

{code}
    String maskValue = 
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder(mask);
    String stringValue = 
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(input.start,
 input.end, input.buffer);
{code}

The second statement is more verbose than it needs to be: the function in the 
first statement simply calls the function in the second, passing the holder's 
fields. Replace the second statement with:

{code}
    String stringValue = 
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder(input);
{code}
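
As a rough illustration of why the holder-based call is enough, here is a minimal sketch in plain Java. It does not use Drill's classes: {{FakeVarCharHolder}} is a hypothetical stand-in that imitates the {{start}}, {{end}} and {{buffer}} fields of VarCharHolder, with a java.nio.ByteBuffer in place of DrillBuf. The two static methods only mirror the delegation relationship described above; they are not Drill's actual implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class HolderHelperSketch {

    /** Hypothetical stand-in for VarCharHolder: a byte range inside a buffer. */
    static final class FakeVarCharHolder {
        int start;
        int end;
        ByteBuffer buffer;
    }

    /** Decodes the UTF-8 bytes in [start, end) of the buffer into a String. */
    static String toStringFromUTF8(int start, int end, ByteBuffer buffer) {
        byte[] bytes = new byte[end - start];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = buffer.get(start + i); // absolute read, buffer position untouched
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Convenience wrapper: unpacks the holder fields and delegates. */
    static String getStringFromVarCharHolder(FakeVarCharHolder holder) {
        return toStringFromUTF8(holder.start, holder.end, holder.buffer);
    }

    /** Test helper (an assumption of this sketch): wraps text in a holder over direct memory. */
    static FakeVarCharHolder holderOf(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        FakeVarCharHolder holder = new FakeVarCharHolder();
        holder.buffer = ByteBuffer.allocateDirect(bytes.length);
        holder.buffer.put(bytes);
        holder.start = 0;
        holder.end = bytes.length;
        return holder;
    }

    public static void main(String[] args) {
        FakeVarCharHolder input = holderOf("drill");
        String viaHolder = getStringFromVarCharHolder(input);
        String viaFields = toStringFromUTF8(input.start, input.end, input.buffer);
        // Both paths decode the same bytes, so the shorter call loses nothing.
        System.out.println(viaHolder.equals(viaFields));
    }
}
```

Since both calls decode the same byte range, using the holder-based form for both parameters keeps the tutorial code consistent.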

To clarify a point in the main description, replace this line:

{code}
    @Param
    NullableVarCharHolder input;
{code}

With this:

{code}
    @Param
    VarCharHolder input;
{code}

Also, I did a search of the code. It looks like the {{constant}} attribute of 
\@Param is never used anywhere. (Plus, I have seen evidence that Drill figures 
out for itself which parameters are constant.) So, perhaps replace these lines:

{code}
    @Param(constant = true)
    VarCharHolder mask;

    @Param(constant = true)
    IntHolder toReplace;
{code}

With these:

{code}
    @Param
    VarCharHolder mask;

    @Param
    IntHolder toReplace;
{code}

(In any event, it is not for the function to tell the query how to use the 
arguments; the code will work just fine if the above two arguments come from 
query columns.)
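
To make that last point concrete, here is a small plain-Java sketch of the masking logic with the mask and the count varying per row, as they would if they came from columns. This is an approximation for illustration only: the {{mask}} method, the sample rows, the repeat loop and the clamp for negative counts are assumptions of the sketch, not code from the tutorial.

```java
final class MaskSketch {

    /** Replaces the first toReplace characters of input with repeated copies of mask. */
    static String mask(String input, String mask, int toReplace) {
        // Clamp the count to [0, input.length()] so substring() never fails.
        int count = Math.max(0, Math.min(toReplace, input.length()));
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < count; i++) {
            out.append(mask);
        }
        return out.append(input.substring(count)).toString();
    }

    public static void main(String[] args) {
        // Each row supplies its own mask and count, as if read from query columns.
        String[][] rows = {
            {"4111111111111111", "*", "12"},
            {"secret", "#", "3"},
            {"ab", "x", "5"},
        };
        for (String[] row : rows) {
            System.out.println(mask(row[0], row[1], Integer.parseInt(row[2])));
        }
    }
}
```

Nothing in the logic depends on the second and third arguments being the same for every row, which is why a {{constant}} marker adds nothing here.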

> Corrections to UDF tutorial documentation page
> ----------------------------------------------
>
>                 Key: DRILL-6074
>                 URL: https://issues.apache.org/jira/browse/DRILL-6074
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Documentation
>            Reporter: Paul Rogers
>            Assignee: Bridget Bevens
>            Priority: Minor
>              Labels: doc-impacting
>
> Consider the [UDF 
> Tutorial|http://drill.apache.org/docs/tutorial-develop-a-simple-function/]. 
> Some of the details are a bit off.
> Step 3:
> bq. The function will be generated dynamically, as you can see in the 
> DrillSimpleFuncHolder, and the input parameters and output holders are 
> defined using holders by annotations. Define the parameters using the \@Param 
> annotation.
> Better: Drill uses your function template to in-line your function code into 
> Drill's own generated code. The \@Param annotation identifies the input 
> arguments. The order of the annotated fields indicates the order of the 
> function parameters. Each parameter field must be one of Drill's holder types.
> bq. Use a holder classes to provide a buffer to manage larger objects in an 
> efficient way: VarCharHolder or NullableVarCharHolder.
> Better: Our function template tells Drill to handle nulls, so all three of 
> our arguments can be declared using the non-nullable holder types 
> (VarCharHolder for the two strings, IntHolder for the count).
> (Then, fix the code to use that type. The bit about larger objects is 
> probably obsolete: holders are the only way to work with any value: large or 
> otherwise.)
> bq. NOTE: Drill doesn’t actually use the Java heap for data being processed 
> in a query but instead keeps this data off the heap and manages the 
> life-cycle for us without using the Java garbage collector.
> Better: NOTE: VARCHAR data is stored in direct memory. The DrillBuf object in 
> the VarCharHolder provides access to the data for the VARCHAR.
> (For context: simple types, such as INT, are stored on the heap when passed 
> to a UDF, so we don't want to make a blanket statement.)
> Step 4.
> bq. Also, using the \@Output annotation, define the returned value as 
> VarCharHolder type. Because you are manipulating a VarChar, you also have to 
> inject a buffer that Drill uses for the output.
> Better: Identify the function's return value using the \@Output annotation. 
> Like parameters, the output must be a holder type. Drill, however, does not 
> provide the output buffer; we have to request one using the \@Inject 
> annotation. The injected field must be of type DrillBuf. Then, in our code, 
> we set the output holder to point to the injected buffer.
> Step 5. The code is inefficient and not a good example. Replace this:
> {code}
>     out.end = outputValue.getBytes().length;
>     buffer.setBytes(0, outputValue.getBytes());
> {code}
> With this:
> {code}
>     byte[] result = outputValue.getBytes();
>     out.end = result.length;
>     buffer.setBytes(0, result);
> {code}
> While we are at it, we might as well make another line a bit more readable.
> {code}
>     String outputValue = (new 
> StringBuilder(maskSubString)).append(stringValue.substring(numberOfCharToReplace)).toString();
> {code}
> Should be rewritten as:
> {code}
>     String outputValue = new StringBuilder(maskSubString)
>         .append(stringValue.substring(numberOfCharToReplace))
>         .toString();
> {code}
> Then in the list of steps:
> bq. Gets the number of character to replace
> The word "character" should be "characters" (plural)
> And:
> bq. Creates and populates the output buffer
> Better:
> * Copies the new string into the temporary DrillBuf
> * Sets up the output holder to point to the data in the DrillBuf
> Then:
> bq. Even to a seasoned Java developer, the eval() method might look a bit 
> strange because Drill generates the final code on the fly to fulfill a query 
> request. This technique leverages Java’s just-in-time (JIT) compiler for 
> maximum speed.
> Better: Even to a seasoned Java developer, the eval() method might look a bit 
> strange. It is best to think of the UDF declaration as a Domain-Specific 
> Language (DSL) that Drill uses to describe the function. Drill uses the 
> declaration to in-line your function into generated code. That is, Drill does 
> not call your function code; instead Drill extracts the code and copies it 
> into Drill's own generated code.
> (Note: the bit about the JIT compiler is plain wrong. Drill's code generation 
> has nothing to do with Java's JIT compiler.)
> Basic Coding Rules
> bq. To leverage Java’s just-in-time (JIT) compiler for maximum speed, you 
> need to adhere to some basic rules.
> Better: Drill's code generation mechanism supports a restricted subset of 
> Java, meaning that you must adhere to some basic rules.
> bq. Do not use imports. Instead, use the fully qualified class name as 
> required by the Google Guava API packaged in Apache Drill and as shown in 
> "Step 3: Declare input parameters".
> (This mixes up a couple of ideas.) Better: Do not use imports. Instead, use 
> the fully qualified class name.
> bq. Manipulate the ValueHolders classes, for example VarCharHolder and 
> IntHolder, as structs by calling helper methods, such as 
> getStringFromVarCharHolder and toStringFromUTF8 as shown in "Step 5: 
> Implement the eval() function".
> bq. Do not call methods such as toString because this causes serious problems.
> Better: Do not call any methods on the holder classes. The holders will be 
> optimized away by Drill's scalar replacement mechanism.
> Some additional restrictions:
> * All class fields (member variables) must be preceded by one of the three 
> annotations discussed above (\@Param, \@Output or \@Inject), or by the 
> \@Workspace annotation which identifies internal temporary fields. (If you 
> omit the annotations, then queries using your function will fail at runtime.)
> * Do not use static fields (for example, to declare constants). If you must 
> declare constants, declare them in a class other than the UDF class.
> Prepare the Package
> bq. Because Drill generates the source, ...
> Better: Because Drill copies your code into its own generated code, ...
> Basic Coding Rules
> Build and Deploy the Function
> Test the New Function
> The above three lines should probably each be a heading; they currently 
> appear as normal text.
> bq. Add the JAR files to Drill, by copying them to the following location: 
> <Drill installation directory>/jars/3rdparty
> Perhaps add the following: Be sure to copy the jars into the above folder 
> each time you rebuild, reinstall or upgrade Drill. If running in a cluster, 
> copy the jars to the Drill installation on every node.
> As an alternative, you can create a site directory as described (need link. 
> Do we describe this anywhere except in the Drill-on-YARN PR?) Copy your files 
> into the {{$DRILL_SITE/jars}} folder. This way, you need not remember to copy 
> the jars each time you reinstall Drill.
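
For Step 5 of the description quoted above, the encode-once pattern and the two bullet points about the output buffer can be sketched in plain Java. This is only an approximation: {{FakeVarCharHolder}} is a hypothetical stand-in for the output holder, a direct java.nio.ByteBuffer stands in for the injected DrillBuf, and the explicit UTF-8 charset is a choice made for this sketch (the quoted snippet calls getBytes() with no argument).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class OutputHolderSketch {

    /** Hypothetical stand-in for the output VarCharHolder. */
    static final class FakeVarCharHolder {
        int start;
        int end;
        ByteBuffer buffer;
    }

    /** Encodes the value once, copies it into the buffer, and points the holder at it. */
    static void writeOutput(String outputValue, ByteBuffer buffer, FakeVarCharHolder out) {
        // Single encoding pass; the array is reused for both the length and the copy.
        byte[] result = outputValue.getBytes(StandardCharsets.UTF_8);
        // Copy the new string into the temporary buffer, starting at offset 0.
        for (int i = 0; i < result.length; i++) {
            buffer.put(i, result[i]);
        }
        // Set up the output holder to point to the data in the buffer.
        out.buffer = buffer;
        out.start = 0;
        out.end = result.length;
    }

    public static void main(String[] args) {
        // The sketch assumes the buffer is already large enough for the value.
        ByteBuffer buffer = ByteBuffer.allocateDirect(64);
        FakeVarCharHolder out = new FakeVarCharHolder();
        writeOutput("****5678", buffer, out);
        System.out.println(out.end - out.start);
    }
}
```

A real UDF would also have to make sure the injected buffer is big enough before copying; the sketch sidesteps that by allocating generously.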



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
