[
https://issues.apache.org/jira/browse/DRILL-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315077#comment-16315077
]
Paul Rogers commented on DRILL-6074:
------------------------------------
Also in the code:
{code}
String maskValue =
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder(mask);
String stringValue =
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(input.start,
input.end, input.buffer);
{code}
The second statement is needlessly verbose, since the helper used in the first
statement simply calls the helper used in the second. Replace the second statement with:
{code}
String stringValue =
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder(input);
{code}
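For illustration, here is a minimal, self-contained sketch of why the two calls are interchangeable. It does not use Drill's classes: {{FakeVarCharHolder}} and {{holderFor}} are hypothetical stand-ins, a plain byte array replaces the DrillBuf, and the two helper methods only mimic the names of the real helpers in StringFunctionHelpers. The assumption, taken from the point above, is that the holder-based helper just passes the holder's start, end and buffer to the UTF-8 helper.

```java
import java.nio.charset.StandardCharsets;

class HolderHelperSketch {

    // Hypothetical stand-in for VarCharHolder: the value occupies bytes
    // [start, end) of buffer. Drill's holder points at a DrillBuf instead.
    static final class FakeVarCharHolder {
        int start;
        int end;
        byte[] buffer;
    }

    // Mimics the low-level helper: decode bytes [start, end) as UTF-8.
    static String toStringFromUTF8(int start, int end, byte[] buffer) {
        return new String(buffer, start, end - start, StandardCharsets.UTF_8);
    }

    // Mimics the holder-based helper: it only unpacks the holder and delegates.
    static String getStringFromVarCharHolder(FakeVarCharHolder holder) {
        return toStringFromUTF8(holder.start, holder.end, holder.buffer);
    }

    // Hypothetical helper for this sketch: stores value after some unrelated
    // bytes so that start is not zero.
    static FakeVarCharHolder holderFor(String padding, String value) {
        byte[] pad = padding.getBytes(StandardCharsets.UTF_8);
        byte[] val = value.getBytes(StandardCharsets.UTF_8);
        FakeVarCharHolder holder = new FakeVarCharHolder();
        holder.buffer = new byte[pad.length + val.length];
        System.arraycopy(pad, 0, holder.buffer, 0, pad.length);
        System.arraycopy(val, 0, holder.buffer, pad.length, val.length);
        holder.start = pad.length;
        holder.end = pad.length + val.length;
        return holder;
    }

    public static void main(String[] args) {
        FakeVarCharHolder input = holderFor("xx", "1234567890");
        String verbose = toStringFromUTF8(input.start, input.end, input.buffer);
        String concise = getStringFromVarCharHolder(input);
        // Both paths decode the same byte range, so the strings should match.
        System.out.println(verbose.equals(concise));
    }
}
```

Since the holder-based form carries the same three values, the shorter call reads better in the tutorial and treats {{mask}} and {{input}} the same way.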
To clarify a point in the main description, replace this line:
{code}
@Param
NullableVarCharHolder input;
{code}
With this:
{code}
@Param
VarCharHolder input;
{code}
Also, I did a search of the code. It looks like the {{constant}} attribute of
\@Param is never used anywhere. (Plus, I have seen evidence that Drill figures
out for itself which parameters are constant.) So, perhaps replace these lines:
{code}
@Param(constant = true)
VarCharHolder mask;
@Param(constant = true)
IntHolder toReplace;
{code}
With these:
{code}
@Param
VarCharHolder mask;
@Param
IntHolder toReplace;
{code}
(In any event, it is not for the function to tell the query how to use the
arguments; the code will work just fine if the above two arguments come from
query columns.)
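To make that last point concrete, here is a plain-Java sketch of the masking logic, with no Drill classes involved. The class {{MaskSketch}} and the sample rows are hypothetical, and the way the mask is repeated is my assumption, since the code quoted in this comment does not show that step. The variable names follow the tutorial's. The point is only that nothing in the logic requires the mask or the count to be the same for every row.

```java
class MaskSketch {

    // Replaces the first toReplace characters of stringValue with maskValue,
    // repeated once per replaced character.
    static String mask(String stringValue, String maskValue, int toReplace) {
        // Clamp the count to [0, length]. The lower bound is an addition in
        // this sketch to keep substring() from throwing on negative counts.
        int numberOfCharToReplace =
            Math.max(0, Math.min(toReplace, stringValue.length()));
        StringBuilder outputValue = new StringBuilder();
        for (int i = 0; i < numberOfCharToReplace; i++) {
            outputValue.append(maskValue);
        }
        return outputValue
            .append(stringValue.substring(numberOfCharToReplace))
            .toString();
    }

    public static void main(String[] args) {
        // Each row supplies its own input, mask and count, as if all three
        // arguments came from query columns rather than constants.
        String[][] rows = {
            {"1234567890", "*", "4"},
            {"abcdef", "#", "2"},
            {"xyz", "-", "10"},
        };
        for (String[] row : rows) {
            System.out.println(mask(row[0], row[1], Integer.parseInt(row[2])));
        }
    }
}
```

In the real UDF the same per-row evaluation happens inside Drill's generated code, with holders in place of the String and int arguments used here.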
> Corrections to UDF tutorial documentation page
> ----------------------------------------------
>
> Key: DRILL-6074
> URL: https://issues.apache.org/jira/browse/DRILL-6074
> Project: Apache Drill
> Issue Type: Bug
> Components: Documentation
> Reporter: Paul Rogers
> Assignee: Bridget Bevens
> Priority: Minor
> Labels: doc-impacting
>
> Consider the [UDF
> Tutorial|http://drill.apache.org/docs/tutorial-develop-a-simple-function/].
> Some of the details are a bit off.
> Step 3:
> bq. The function will be generated dynamically, as you can see in the
> DrillSimpleFuncHolder, and the input parameters and output holders are
> defined using holders by annotations. Define the parameters using the \@Param
> annotation.
> Better: Drill uses your function template to in-line your function code into
> Drill's own generated code. The \@Param annotation identifies the input
> arguments. The order of the annotated fields indicates the order of the
> function parameters. Each parameter field must be one of Drill's holder types.
> bq. Use a holder classes to provide a buffer to manage larger objects in an
> efficient way: VarCharHolder or NullableVarCharHolder.
> Better: Our function template tells Drill to handle nulls, so all three of
> our arguments can be declared using the VarCharHolder type.
> (Then, fix the code to use that type. The bit about larger objects is
> probably obsolete: holders are the only way to work with any value: large or
> otherwise.)
> bq. NOTE: Drill doesn’t actually use the Java heap for data being processed
> in a query but instead keeps this data off the heap and manages the
> life-cycle for us without using the Java garbage collector.
> Better: NOTE: VARCHAR data is stored in direct memory. The DrillBuf object in
> the VarCharHolder provides access to the data for the VARCHAR.
> (For context: simple types, such as INT, are stored on the heap when passed
> to a UDF, so we don't want to make a blanket statement.)
> Step 4.
> bq. Also, using the \@Output annotation, define the returned value as
> VarCharHolder type. Because you are manipulating a VarChar, you also have to
> inject a buffer that Drill uses for the output.
> Better: Identify the function's return value using the \@Output annotation.
> Like parameters, the output must be a holder type. Drill, however, does not
> provide the output buffer; we have to request one using the \@Inject
> annotation. The injected field must be of type DrillBuf. Then, in our code,
> we set the output holder to point to the injected buffer.
> Step 5. The code is inefficient and not a good example. Replace this:
> {code}
> out.end = outputValue.getBytes().length;
> buffer.setBytes(0, outputValue.getBytes());
> {code}
> With this:
> {code}
> byte result[] = outputValue.getBytes();
> out.end = result.length;
> buffer.setBytes(0, result);
> {code}
> While we are at it, we might as well make another line a bit more readable.
> {code}
> String outputValue = (new
> StringBuilder(maskSubString)).append(stringValue.substring(numberOfCharToReplace)).toString();
> {code}
> Should be rewritten as:
> {code}
> String outputValue = new StringBuilder(maskSubString)
> .append(stringValue.substring(numberOfCharToReplace))
> .toString();
> {code}
> Then in the list of steps:
> bq. Gets the number of character to replace
> The word "character" should be "characters" (plural)
> And:
> bq. Creates and populates the output buffer
> Better:
> * Copies the new string into the temporary DrillBuf
> * Sets up the output holder to point to the data in the DrillBuf
> Then:
> bq. Even to a seasoned Java developer, the eval() method might look a bit
> strange because Drill generates the final code on the fly to fulfill a query
> request. This technique leverages Java’s just-in-time (JIT) compiler for
> maximum speed.
> Better: Even to a seasoned Java developer, the eval() method might look a bit
> strange. It is best to think of the UDF declaration as a Domain-Specific
> Language (DSL) that Drill uses to describe the function. Drill uses the
> declaration to in-line your function into generated code. That is, Drill does
> not call your function code; instead Drill extracts the code and copies it
> into Drill's own generated code.
> (Note: the bit about the JIT compiler is plain wrong. Drill's code generation
> has nothing to do with Java's JIT compiler.)
> Basic Coding Rules
> bq. To leverage Java’s just-in-time (JIT) compiler for maximum speed, you
> need to adhere to some basic rules.
> Better: Drill's code generation mechanism supports a restricted subset of
> Java, meaning that you must adhere to some basic rules.
> bq. Do not use imports. Instead, use the fully qualified class name as
> required by the Google Guava API packaged in Apache Drill and as shown in
> "Step 3: Declare input parameters".
> (This mixes up a couple of ideas.) Better: Do not use imports. Instead, use
> the fully qualified class name.
> bq. Manipulate the ValueHolders classes, for example VarCharHolder and
> IntHolder, as structs by calling helper methods, such as
> getStringFromVarCharHolder and toStringFromUTF8 as shown in "Step 5:
> Implement the eval() function".
> bq. Do not call methods such as toString because this causes serious problems.
> Better: Do not call any methods on the holder classes. The holders will be
> optimized away by Drill's scalar replacement mechanism.
> Some additional restrictions:
> * All class fields (member variables) must be preceded by one of the three
> annotations discussed above (\@Param, \@Output or \@Inject), or by the
> \@Workspace annotation which identifies internal temporary fields. (If you
> omit the annotations, then queries using your function will fail at runtime.)
> * Do not use static fields (for example, to declare constants). If you must
> declare constants, declare them in a class other than the UDF class.
> Prepare the Package
> bq. Because Drill generates the source, ...
> Better: Because Drill copies your code into its own generated code, ...
> Basic Coding Rules
> Build and Deploy the Function
> Test the New Function
> Each of the above three lines should probably be a heading; each currently
> appears as normal text.
> bq. Add the JAR files to Drill, by copying them to the following location:
> <Drill installation directory>/jars/3rdparty
> Perhaps add the following: Be sure to copy the jars into the above folder
> each time you rebuild, reinstall or upgrade Drill. If running in a cluster,
> copy the jars to the Drill installation on every node.
> As an alternative, you can create a site directory as described (need link.
> Do we describe this anywhere except in the Drill-on-YARN PR?) Copy your files
> into the {{$DRILL_SITE/jars}} folder. This way, you need not remember to copy
> the jars each time you reinstall Drill.