Hi whsoul, Thanks for the updates. I have a few more thoughts but this is looking pretty good:
1. The name "ToStructByRegexTransform" is a little unwieldy. What do you think about something shorter like ParseStruct, ParseRegex, or ParseStructByRegex? 2. What happens if a line that the SMT sees doesn't match the regex supplied by the user? Will it throw an exception, silently skip the message and return it as-is, allow the user to configure it to do either, something else entirely? (I'd personally lean towards just throwing an exception since users can configure regexes to be lenient via the optional quantifier, i.e. "?") 3. The description for the "regex" property still includes the "( with :{TYPE} )" snippet; should that be removed? 4. Is it worth adding support to this SMT to operate on an individual field of a message? I.e., you specify a "field" of "log_line" and the SMT expects to see a struct or a map with a field/key of "log_line" and parses that instead of the entire message. If so, it might be worth specifying that this property would follow any precedents set by KIP-821 [1] so that nested fields could be accessed instead of just top-level fields. 5. What happens if the user specifies more names in the "mapping" property than there are groups in the "regex" property? 6. (Nit) I don't think the GroupRegexValidator class needs to be called out as part of the changes to public interface if it's just going to be used by this transform. [1] - https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures Cheers, Chris On Tue, Jun 7, 2022 at 4:47 AM gyejun choi <whsou...@gmail.com> wrote: > Hi Chris, > > Thank you for your positive feed back, > And sorry about late reply ( I didn’t recognize your reply email… TT ) > > And I update KIP and PR with your review > > https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex > > 1. typecast function removed > 2. struct.field removed > 3. Proposed Interface changed > > > I will wait for you second feed back > > Thanks~ > > On 2021/07/15 15:57:14 Chris Egerton wrote: > > Hi whsoul, > > > > Thanks for the KIP. The two use cases you identified seem quite > appealing, > > especially the first one: parsing structured log messages. > > > > Here are my initial thoughts on the proposed design: > > > > 1. I wonder if it's necessary to include support for type casting with > this > > SMT. We already have a Cast SMT ( > > > > https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/Cast.java > ) > > that can parse multiple fields of a structured record value with > differing > > types. Would it be enough for your new SMT to only produce string values > > for its structured data, and then allow users to perform casting logic > > using the Cast SMT afterward? > > > > 2. It seems like the "struct.field" property is similar; based on the > > examples, it looks like when the SMT is configured with a value for that > > property, it will first pull out a field from a structured record value > > (for example, it would pull out the value " > > https://kafka.apache.org/documentation/#connect" from a map of {"url": " > > https://kafka.apache.org/documentation/#connect"}), then parse that > field's > > value, and replace the entire record value (or key) with the result of > the > > parsing stage. It seems like this could be accomplished using the > > ExtractField SMT ( > > > > https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/ExtractField.java > ) > > as a preliminary step before passing it to your new SMT. Is this correct? > > And if so, could we simplify the interface for your SMT by removing the > > "struct.field" property in favor of the existing ExtractField SMT? > > > > 3. (Nit) I believe the property names in the table beneath the "Proposed > > Interfaces" section should have the "transforms.RegexTransform." prefix > > removed, since users will be able to configure the SMT with any name, not > > just "RegexTransform". This keeps in line with similar KIPs such as > > KIP-437, which added a new property to the MaskField SMT ( > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-437%3A+Custom+replacement+for+MaskField+SMT > > ). > > > > Cheers, > > > > Chris > > > > On Thu, Jul 15, 2021 at 7:56 AM gyejun choi <wh...@gmail.com> wrote: > > > > > is there someone who any feed back about this? > > > > > > 2020년 10월 23일 (금) 오후 2:56, gyejun choi <wh...@gmail.com>님이 작성: > > > > > > > Hi, > > > > > > > > I've opened KIP-678 which is intended to provide a new SMT in Kafka > > > Connect. > > > > I'd be grateful for any > > > > feedback: > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex== > > > > > > > > thanks, > > > > > > > > whsoul > > > > > > > > > > > > > >