kkonstantine commented on a change in pull request #8839: URL: https://github.com/apache/kafka/pull/8839#discussion_r439227501
########## File path: docs/connect.html ##########
@@ -180,13 +182,80 @@ <h4><a id="connect_transforms" href="#connect_transforms">Transformations</a></h
 <li>SetSchemaMetadata - modify the schema name or version</li>
 <li>TimestampRouter - Modify the topic of a record based on original topic and timestamp. Useful when using a sink that needs to write to different tables or indexes based on timestamps</li>
 <li>RegexRouter - modify the topic of a record based on original topic, replacement string and a regular expression</li>
+ <li>Filter - Removes messages from all further processing. This is used with a <a href="#connect_predicates">predicate</a> to selectively filter certain messages.</li>
 </ul>
 <p>Details on how to configure each transformation are listed below:</p>
 <!--#include virtual="generated/connect_transforms.html" -->
+
+ <h5><a id="connect_predicates" href="#connect_predicates">Predicates</a></h5>
+
+ <p>Transformations can be configured with prediates so that the transformation is applied only to messages which satisfy some condition. In particular, when combined with the <b>Filter</b> transformation predicates can be used to selectively filter out certain messages.</p>

Review comment:
```suggestion
 <p>Transformations can be configured with predicates so that the transformation is applied only to messages which satisfy some condition. In particular, when combined with the <b>Filter</b> transformation predicates can be used to selectively filter out certain messages.</p>
```

########## File path: docs/connect.html ##########
@@ -180,13 +182,80 @@ <h4><a id="connect_transforms" href="#connect_transforms">Transformations</a></h
 <li>SetSchemaMetadata - modify the schema name or version</li>
 <li>TimestampRouter - Modify the topic of a record based on original topic and timestamp. Useful when using a sink that needs to write to different tables or indexes based on timestamps</li>
 <li>RegexRouter - modify the topic of a record based on original topic, replacement string and a regular expression</li>
+ <li>Filter - Removes messages from all further processing. This is used with a <a href="#connect_predicates">predicate</a> to selectively filter certain messages.</li>
 </ul>
 <p>Details on how to configure each transformation are listed below:</p>
 <!--#include virtual="generated/connect_transforms.html" -->
+
+ <h5><a id="connect_predicates" href="#connect_predicates">Predicates</a></h5>
+
+ <p>Transformations can be configured with prediates so that the transformation is applied only to messages which satisfy some condition. In particular, when combined with the <b>Filter</b> transformation predicates can be used to selectively filter out certain messages.</p>
+
+ <p>Predicates are specified in the connector configuration.</p>
+
+ <ul>
+ <li><code>predicates</code> - Set of aliases for the predicates to be applied to some of the transformations.</li>
+ <li><code>predicates.$alias.type</code> - Fully qualified class name for the predicate.</li>
+ <li><code>predicates.$alias.$predicateSpecificConfig</code> - Configuration properties for the predicate.</li>
+ </ul>
+
+ <p>All transformations have the implicit config properties <code>predicate</code> and <code>negate</code>. A predicular predicate is associated with a transformation by setting the transformation's <code>predicate</code> config to the predicate's alias. The predicate's value can be reversed using the <code>negate</code> configuration property.</p>
+
+ <p>For example, suppose you have a source connector which produces messages to many different topics and you want to:</p>
+ <ul>
+ <li>filter out the messages in the 'foo' topic entirely</li>
+ <li>apply the ExtractField transformation with the field name 'other_field' to records in all topics <i>except</i> the topic 'bar'</li>
+ </ul>
+
+ <p>To do this we need to first to filter out the records destined for the topic 'foo'. The Filter transformation removes records from further processing, and can use the TopicNameMatches predicate to apply the transformation only to records in topics which match a certain regular expression. TopicNameMatches's only configuration property is <code>pattern</code> which is a Java regular expression for matching against the topic name. The configuration would look like this:</p>
+
+ <pre class="brush: text;">
+ transforms=Filter
+ transforms.Filter.type=org.apache.kafka.connect.transforms.Filter
+ transforms.Filter.predicate=IsFoo
+
+ predicates=IsFoo
+ predicates.IsFoo.type=org.apache.kafka.connect.predicates.TopicNameMatches
+ predicates.IsFoo.pattern=foo
+ </pre>
+
+ <p>Next we need to apply ExtractField only when the topic name of the record is not 'bar'. We can't just use TopicNameMatches directly, because that would apply the transformation to matching topic names, not topic names which do <i>not</i> match. The transformation's implicit <code>negate</code> config properties allows us to invert the set of records which a predicate matches. Adding the configuration for this to the previous example we arrive at:</p>
+
+ <pre class="brush: text;">
+ transforms=Filter,Extract
+ transforms.Filter.type=org.apache.kafka.connect.transforms.Filter
+ transforms.Filter.predicate=IsFoo
+
+ transforms.Extract.type=org.apache.kafka.connect.transforms.ExtractField$Key
+ transforms.Extract.field=other_field
+ transforms.Extract.predicate=IsBar
+ transforms.Extract.negate=true
+
+ predicates=IsFoo,IsBar
+ predicates.IsFoo.type=org.apache.kafka.connect.predicates.TopicNameMatches
+ predicates.IsFoo.pattern=foo
+
+ predicates.IsBar.type=org.apache.kafka.connect.predicates.TopicNameMatches
+ predicates.IsBar.pattern=bar
+ </pre>
+
+ <p>Kafka Connect includes the following predicates:</p>
+
+ <ul>
+ <li><code>TopicNameMatches</code> - matches records in a topic with a name matching a particular Java regular expression.</li>
+ <li><code>HasHeaderKey</code> - matches records which have a header with the given key.</li>
+ <li><code>RecordIsTombstone</code> - matches tombstone records, that is, those will a null value.</li>

Review comment:
```suggestion
 <li><code>RecordIsTombstone</code> - matches tombstone records, that is records with a null value.</li>
```

########## File path: docs/connect.html ##########
@@ -180,13 +182,80 @@ <h4><a id="connect_transforms" href="#connect_transforms">Transformations</a></h
 <li>SetSchemaMetadata - modify the schema name or version</li>
 <li>TimestampRouter - Modify the topic of a record based on original topic and timestamp. Useful when using a sink that needs to write to different tables or indexes based on timestamps</li>
 <li>RegexRouter - modify the topic of a record based on original topic, replacement string and a regular expression</li>
+ <li>Filter - Removes messages from all further processing. This is used with a <a href="#connect_predicates">predicate</a> to selectively filter certain messages.</li>
 </ul>
 <p>Details on how to configure each transformation are listed below:</p>
 <!--#include virtual="generated/connect_transforms.html" -->
+
+ <h5><a id="connect_predicates" href="#connect_predicates">Predicates</a></h5>
+
+ <p>Transformations can be configured with prediates so that the transformation is applied only to messages which satisfy some condition. In particular, when combined with the <b>Filter</b> transformation predicates can be used to selectively filter out certain messages.</p>
+
+ <p>Predicates are specified in the connector configuration.</p>
+
+ <ul>
+ <li><code>predicates</code> - Set of aliases for the predicates to be applied to some of the transformations.</li>
+ <li><code>predicates.$alias.type</code> - Fully qualified class name for the predicate.</li>
+ <li><code>predicates.$alias.$predicateSpecificConfig</code> - Configuration properties for the predicate.</li>
+ </ul>
+
+ <p>All transformations have the implicit config properties <code>predicate</code> and <code>negate</code>. A predicular predicate is associated with a transformation by setting the transformation's <code>predicate</code> config to the predicate's alias. The predicate's value can be reversed using the <code>negate</code> configuration property.</p>
+
+ <p>For example, suppose you have a source connector which produces messages to many different topics and you want to:</p>
+ <ul>
+ <li>filter out the messages in the 'foo' topic entirely</li>
+ <li>apply the ExtractField transformation with the field name 'other_field' to records in all topics <i>except</i> the topic 'bar'</li>
+ </ul>
+
+ <p>To do this we need to first to filter out the records destined for the topic 'foo'. The Filter transformation removes records from further processing, and can use the TopicNameMatches predicate to apply the transformation only to records in topics which match a certain regular expression. TopicNameMatches's only configuration property is <code>pattern</code> which is a Java regular expression for matching against the topic name. The configuration would look like this:</p>

Review comment:
```suggestion
 <p>To do this we need first to filter out the records destined for the topic 'foo'. The Filter transformation removes records from further processing, and can use the TopicNameMatches predicate to apply the transformation only to records in topics which match a certain regular expression. TopicNameMatches's only configuration property is <code>pattern</code> which is a Java regular expression for matching against the topic name. The configuration would look like this:</p>
```
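
For readers checking the <code>predicate</code>/<code>negate</code> wording in the hunk, here is a minimal sketch of the gating behaviour the text describes: a transformation runs only when its predicate's result differs from <code>negate</code>. Everything in it is a hypothetical stand-in written for illustration (`Rec`, `Step`, `topicNameMatches`, `exampleChain`, and the `"extracted:"` marker are not Kafka Connect classes or output), and it assumes that a dropped record is represented as `null` and that the pattern must match the whole topic name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Illustrative stand-ins only: none of these types are Kafka Connect classes.
final class PredicateGatingSketch {

    /** Hypothetical minimal record carrying only a topic and a value. */
    static final class Rec {
        final String topic;
        final String value;

        Rec(String topic, String value) {
            this.topic = topic;
            this.value = value;
        }
    }

    /** One transformation plus the predicate/negate pair that guards it. */
    static final class Step {
        final UnaryOperator<Rec> transform;
        final Predicate<Rec> predicate;
        final boolean negate;

        Step(UnaryOperator<Rec> transform, Predicate<Rec> predicate, boolean negate) {
            this.transform = transform;
            this.predicate = predicate;
            this.negate = negate;
        }

        Rec apply(Rec rec) {
            // Apply the transformation only when the (possibly negated) predicate holds.
            boolean applies = predicate.test(rec) != negate;
            return applies ? transform.apply(rec) : rec;
        }
    }

    /** Stand-in for TopicNameMatches; assumes the whole topic name must match. */
    static Predicate<Rec> topicNameMatches(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return rec -> pattern.matcher(rec.topic).matches();
    }

    /** Mirrors the example config: Filter on 'foo', then Extract on all topics except 'bar'. */
    static List<Step> exampleChain() {
        UnaryOperator<Rec> filter = rec -> null; // dropping is modelled as returning null
        UnaryOperator<Rec> extract = rec -> new Rec(rec.topic, "extracted:" + rec.value);
        return Arrays.asList(
            new Step(filter, topicNameMatches("foo"), false),
            new Step(extract, topicNameMatches("bar"), true));
    }

    /** Runs a record through the chain; returns null if a step dropped it. */
    static Rec run(List<Step> chain, Rec rec) {
        for (Step step : chain) {
            if (rec == null) {
                return null;
            }
            rec = step.apply(rec);
        }
        return rec;
    }

    public static void main(String[] args) {
        for (String topic : new String[] {"foo", "bar", "baz"}) {
            Rec out = run(exampleChain(), new Rec(topic, "v"));
            System.out.println(topic + " -> " + (out == null ? "dropped" : out.value));
        }
    }
}
```

Running `main` prints `foo -> dropped`, `bar -> v`, and `baz -> extracted:v`, which matches the two goals listed in the docs example: 'foo' is filtered out entirely, and the extraction applies to every topic except 'bar'.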