[https://issues.apache.org/jira/browse/STORM-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108824#comment-15108824]
ASF GitHub Bot commented on STORM-1214:
---------------------------------------
Github user revans2 commented on a diff in the pull request:
https://github.com/apache/storm/pull/1029#discussion_r50276631
--- Diff: storm-core/src/jvm/org/apache/storm/trident/operation/Function.java ---
@@ -19,6 +19,73 @@
import org.apache.storm.trident.tuple.TridentTuple;
+import java.util.Map;
+
+/**
+ * A function takes in a set of input fields and emits zero or more tuples as output. The fields of the output tuple
+ * are appended to the original input tuple in the stream. If a function emits no tuples, the original input tuple is
+ * filtered out. Otherwise, the input tuple is duplicated for each output tuple.
+ *
+ * For example, if you have the following function:
+ *
+ * ```java
+ * public class MyFunction extends BaseFunction {
+ *     public void execute(TridentTuple tuple, TridentCollector collector) {
+ *         for (int i = 0; i < tuple.getInteger(0); i++) {
+ *             collector.emit(new Values(i));
+ *         }
+ *     }
+ * }
+ * ```
+ *
+ * Now suppose you have a stream in the variable `mystream` with the fields `["a", "b", "c"]` and the following tuples:
+ *
+ * ```
+ * [1, 2, 3]
+ * [4, 1, 6]
+ * [3, 0, 8]
+ * ```
+ * If you had the following code in your topology definition:
+ *
+ * ```java
+ * mystream.each(new Fields("b"), new MyFunction(), new Fields("d"));
+ * ```
+ *
+ * The resulting tuples would have the fields `["a", "b", "c", "d"]` and look like this:
+ *
+ * ```
+ * [1, 2, 3, 0]
+ * [1, 2, 3, 1]
+ * [4, 1, 6, 0]
+ * ```
+ *
+ * In this case, the parameter `new Fields("b")` tells Trident that you would like to select the field "b" as input
+ * to the function, and it will be the only field in the tuple passed to the `execute()` method. The value of "b" in
+ * the first tuple (2) causes the `for` loop to execute twice, so 2 tuples are emitted. Similarly, the second tuple causes
+ * one tuple to be emitted. For the third tuple, the value of 0 causes the `for` loop to be skipped, so nothing is
+ * emitted and the incoming tuple is filtered out of the stream.
+ *
+ * ### Configuration
+ * If your `Function` implementation has configuration requirements, you will typically want to extend
+ * {@link storm.trident.operation.BaseFunction} and override the
+ * {@link storm.trident.operation.Operation#prepare(Map, TridentOperationContext)} method to perform your custom
+ * initialization.
+ *
+ * ### Performance Considerations
+ * Because Trident Functions perform logic on individual tuples -- as opposed to batches -- it is advisable
+ * to avoid expensive operations such as database calls in a Function, if possible. For data store interactions
+ * it is better to use a {@link storm.trident.state.State} or {@link storm.trident.state.QueryFunction} implementation,
+ * since Trident states operate on batch partitions and can perform bulk updates to a database.
+ *
+ *
+ */
--- End diff --
The links here need the `org.apache` prefix too.
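To sanity-check the worked example in the Javadoc, here is a minimal plain-Java model of what `mystream.each(new Fields("b"), new MyFunction(), new Fields("d"))` does. It is a sketch, not Trident's implementation: it uses no Storm classes, represents tuples as `List<Integer>`, and assumes field "b" sits at index 1. The class and method names (`EachSemantics`, `myFunction`, `each`) are hypothetical. The key point is that the documented behavior is exactly `Stream.flatMap`: each output is appended to a copy of the input, and an input with zero outputs disappears.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class EachSemantics {
    // Hypothetical stand-in for MyFunction: for input value n, emit 0, 1, ..., n-1.
    static List<Integer> myFunction(int b) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < b; i++) {
            out.add(i);
        }
        return out;
    }

    // Model of each(new Fields("b"), new MyFunction(), new Fields("d")):
    // "b" is assumed to be at index 1; each emitted value is appended as "d",
    // so the input is duplicated per output and dropped when there are none.
    static List<List<Integer>> each(List<List<Integer>> stream) {
        return stream.stream()
            .flatMap(tuple -> myFunction(tuple.get(1)).stream().map(d -> {
                List<Integer> result = new ArrayList<>(tuple);
                result.add(d);
                return result;
            }))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> input = Arrays.asList(
            Arrays.asList(1, 2, 3),
            Arrays.asList(4, 1, 6),
            Arrays.asList(3, 0, 8));
        // Prints [1, 2, 3, 0], [1, 2, 3, 1], [4, 1, 6, 0], one per line.
        each(input).forEach(System.out::println);
    }
}
```

The third input tuple `[3, 0, 8]` produces no output rows, matching the Javadoc's statement that it is filtered out of the stream.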
> Trident API Improvements
> ------------------------
>
> Key: STORM-1214
> URL: https://issues.apache.org/jira/browse/STORM-1214
> Project: Apache Storm
> Issue Type: Bug
> Reporter: P. Taylor Goetz
> Assignee: P. Taylor Goetz
>
> There are a few idiosyncrasies in the Trident API that can sometimes trip
> developers up (e.g. when and how to set the parallelism of components). There
> are also a few areas where the API could be made slightly more intuitive
> (e.g. add Java 8 Streams-style methods such as {{filter()}}, {{map()}},
> {{flatMap()}}, etc.).
> Some of these concerns can be addressed through documentation, and some by
> altering the API. Since we are approaching a 1.0 release, it would be good to
> address any API changes before a major release.
> The goal of this JIRA is to identify specific areas of improvement and
> formulate an implementation that addresses them.
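The Streams-style methods mentioned in the description relate closely to the existing "emit zero or more" contract of a Trident `Function`: `map` is the case of exactly one output and `filter` the case of zero or one. The sketch below shows this with plain `java.util.stream` only; it does not show, and should not be read as, any proposed Trident method signature. The names `FlatMapAsPrimitive`, `mapViaFlatMap`, and `filterViaFlatMap` are hypothetical. One difference to keep in mind: a Trident `Function`, per the Javadoc in the diff, appends its output fields to the input tuple, whereas these stream operations replace each element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapAsPrimitive {
    // map: exactly one output per input.
    static <T, R> List<R> mapViaFlatMap(List<T> in, Function<T, R> f) {
        return in.stream()
            .flatMap(x -> Stream.of(f.apply(x)))
            .collect(Collectors.toList());
    }

    // filter: zero or one output per input (the input itself when it passes).
    static <T> List<T> filterViaFlatMap(List<T> in, Predicate<T> p) {
        return in.stream()
            .flatMap(x -> p.test(x) ? Stream.of(x) : Stream.<T>empty())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> bs = Arrays.asList(2, 1, 0);
        System.out.println(mapViaFlatMap(bs, b -> b * 10));   // [20, 10, 0]
        System.out.println(filterViaFlatMap(bs, b -> b > 0)); // [2, 1]
    }
}
```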
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)