Hello Chris,

Long time no feed back about this,

do I have any work left to do to proceed with discuss and vote?

I will waiting for your guide,

thanks, always

whsoul

2022년 6월 14일 (화) 오후 9:29, gyejun choi <whsou...@gmail.com>님이 작성:

> Hi Chris
>
> I updated KIP document
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex
> )
> about discussion and changes before,
>
> and commited some code according to your review.
>
> https://github.com/apache/kafka/pull/12219/commits/1541a538eaff676441f125c400a7807d17f2e138
>
> I understand you mean the simplicity is more important with SMT ( because
> it is Simple Message Trasform.. )
> so I removed about "message" structured format input support
>
> and also move GroupRegexValidator to inner private class.
>
> thanks, always
>
> whsoul
>
> 2022년 6월 14일 (화) 오전 11:30, Chris Egerton <fearthecel...@gmail.com>님이 작성:
>
>> Hi whsoul,
>>
>> Would you mind updating the KIP document (
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex
>> )
>> with all of these changes? We tend to discuss and vote on what's included
>> in the Wiki as our source of truth, as opposed to pull requests.
>>
>> RE 4:
>> > it seems the chained transform with extractField SMT + plainText value
>> ParseStructByRegex make the same result with struct value
>> ParseSturctByRegex, but it will drop collector meta data during
>> extractField ( I think.. )
>> This is exactly right--using ExtractField + ParseStructByRegex will
>> replace
>> the record key/value with the parsed struct, and drop everything else. Do
>> you think it'd increase the implementation complexity significantly to add
>> support for users to specify a field to operate on, which would preserve
>> the rest of the record key/value as-is? I'm not so sure that everyone is
>> going to want to drop all other metadata from their messages, but if
>> there's something that makes this particularly difficult to implement (as
>> compared to the other SMTs that are already included out of the box with
>> Kafka Connect), then we could probably leave it out for now.
>>
>> RE 6:
>> It's a tricky distinction, but what I mean is that, although we might add
>> a
>> class named GroupRegexValidator and use that class in our SMT library,
>> unless it's part of the public interface we're trying to change, it's an
>> implementation detail and we don't have to call it out in the KIP. The
>> advantage to leaving it out is that it makes the KIP more concise and
>> therefore easier to review, you can choose to rename, remove, decompose,
>> etc. the class in your PR without having to worry about sticking to the
>> plan in the KIP that everyone reviewed and voted on. It's a minor detail
>> though, I'm noting it here more because it may be useful when writing
>> future KIPs than because it's necessary to adhere to strictly in this one.
>>
>> Cheers,
>>
>> Chris
>>
>> On Fri, Jun 10, 2022 at 8:13 AM gyejun choi <whsou...@gmail.com> wrote:
>>
>> > Hi Chris,
>> >
>> > I applied some code fix according your second reviews.
>> >
>> >
>> https://github.com/apache/kafka/pull/12219/commits/f673ea2eae0d907502e44c0ecd53b616386627bf
>> >
>> >
>> > 1. [applied] changed name as ParseStructByRegex
>> >
>> > 2. [applied] throw DataException, when a line that the SMT sees doesn't
>> > match the regex...
>> > originally, it will be skipped if no data match with regex,
>> > but change code to throw DataException according to your review
>> >
>> > 3. [already applied]
>> > I already delete code about desc ":{TYPE}" with commit below
>> >
>> >
>> https://github.com/apache/kafka/pull/12219/commits/534d995b3e6371c37443eb72eee03884cb23c85d
>> >
>> > 4. [need discuss]
>> > In my use case about log data collection,
>> > I configured the pipeline below
>> > nginx => filebeat => kafka => kafka connect es connector => es
>> >
>> > filebeat ( or most other log collector ) usually send value as struct (
>> not
>> > plaintext ) with collector meta data,
>> > and the key name as "message" ( in case filebeat )
>> >
>> > I think there are more use cases log message wrapped struct value than
>> > plain text value,
>> > and it seems the chained transform with extractField SMT + plainText
>> value
>> > ParseStructByRegex make the same result with struct value
>> > ParseSturctByRegex,
>> > but it will drop collector meta data during extractField ( I think.. )
>> > and also in almost case, users will use ParseStructByRegex SMT with
>> > extractField SMT
>> >
>> > 5. [applied] throw DataException, when there are difference between
>> group
>> > size and mapping names size
>> >
>> >
>> > 6. [ question ]
>> > you mean, use RegexValidator class already exist?
>> > without group "(.*)" pattern check, it will not provide early detection
>> > about regex config mistake,
>> > If you think it is enough as runtime DataException detection?
>> >
>> > always thanks.
>> >
>> > whsoul
>> >
>> > 2022년 6월 8일 (수) 오전 9:46, Chris Egerton <fearthecel...@gmail.com>님이 작성:
>> >
>> > > Hi whsoul,
>> > >
>> > > Thanks for the updates. I have a few more thoughts but this is looking
>> > > pretty good:
>> > >
>> > > 1. The name "ToStructByRegexTransform" is a little unwieldy. What do
>> you
>> > > think about something shorter like ParseStruct, ParseRegex, or
>> > > ParseStructByRegex?
>> > >
>> > > 2. What happens if a line that the SMT sees doesn't match the regex
>> > > supplied by the user? Will it throw an exception, silently skip the
>> > message
>> > > and return it as-is, allow the user to configure it to do either,
>> > something
>> > > else entirely? (I'd personally lean towards just throwing an exception
>> > > since users can configure regexes to be lenient via the optional
>> > > quantifier, i.e. "?")
>> > >
>> > > 3. The description for the "regex" property still includes the "( with
>> > > :{TYPE} )" snippet; should that be removed?
>> > >
>> > > 4. Is it worth adding support to this SMT to operate on an individual
>> > field
>> > > of a message? I.e., you specify a "field" of "log_line" and the SMT
>> > expects
>> > > to see a struct or a map with a field/key of "log_line" and parses
>> that
>> > > instead of the entire message. If so, it might be worth specifying
>> that
>> > > this property would follow any precedents set by KIP-821 [1] so that
>> > nested
>> > > fields could be accessed instead of just top-level fields.
>> > >
>> > > 5. What happens if the user specifies more names in the "mapping"
>> > property
>> > > than there are groups in the "regex" property?
>> > >
>> > > 6. (Nit) I don't think the GroupRegexValidator class needs to be
>> called
>> > out
>> > > as part of the changes to public interface if it's just going to be
>> used
>> > by
>> > > this transform.
>> > >
>> > > [1] -
>> > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
>> > >
>> > > Cheers,
>> > >
>> > > Chris
>> > >
>> > > On Tue, Jun 7, 2022 at 4:47 AM gyejun choi <whsou...@gmail.com>
>> wrote:
>> > >
>> > > > Hi Chris,
>> > > >
>> > > > Thank you for your positive feed back,
>> > > > And sorry about late reply ( I didn’t recognize your reply email…
>> TT )
>> > > >
>> > > > And I update KIP and PR with your review
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex
>> > > >
>> > > > 1. typecast function removed
>> > > > 2. struct.field removed
>> > > > 3. Proposed Interface changed
>> > > >
>> > > >
>> > > > I will wait for you second feed back
>> > > >
>> > > > Thanks~
>> > > >
>> > > > On 2021/07/15 15:57:14 Chris Egerton wrote:
>> > > > > Hi whsoul,
>> > > > >
>> > > > > Thanks for the KIP. The two use cases you identified seem quite
>> > > > appealing,
>> > > > > especially the first one: parsing structured log messages.
>> > > > >
>> > > > > Here are my initial thoughts on the proposed design:
>> > > > >
>> > > > > 1. I wonder if it's necessary to include support for type casting
>> > with
>> > > > this
>> > > > > SMT. We already have a Cast SMT (
>> > > > >
>> > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/Cast.java
>> > > > )
>> > > > > that can parse multiple fields of a structured record value with
>> > > > differing
>> > > > > types. Would it be enough for your new SMT to only produce string
>> > > values
>> > > > > for its structured data, and then allow users to perform casting
>> > logic
>> > > > > using the Cast SMT afterward?
>> > > > >
>> > > > > 2. It seems like the "struct.field" property is similar; based on
>> the
>> > > > > examples, it looks like when the SMT is configured with a value
>> for
>> > > that
>> > > > > property, it will first pull out a field from a structured record
>> > value
>> > > > > (for example, it would pull out the value "
>> > > > > https://kafka.apache.org/documentation/#connect"; from a map of
>> > > {"url": "
>> > > > > https://kafka.apache.org/documentation/#connect"}), then parse
>> that
>> > > > field's
>> > > > > value, and replace the entire record value (or key) with the
>> result
>> > of
>> > > > the
>> > > > > parsing stage. It seems like this could be accomplished using the
>> > > > > ExtractField SMT (
>> > > > >
>> > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/ExtractField.java
>> > > > )
>> > > > > as a preliminary step before passing it to your new SMT. Is this
>> > > correct?
>> > > > > And if so, could we simplify the interface for your SMT by
>> removing
>> > the
>> > > > > "struct.field" property in favor of the existing ExtractField SMT?
>> > > > >
>> > > > > 3. (Nit) I believe the property names in the table beneath the
>> > > "Proposed
>> > > > > Interfaces" section should have the "transforms.RegexTransform."
>> > prefix
>> > > > > removed, since users will be able to configure the SMT with any
>> name,
>> > > not
>> > > > > just "RegexTransform". This keeps in line with similar KIPs such
>> as
>> > > > > KIP-437, which added a new property to the MaskField SMT (
>> > > > >
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-437%3A+Custom+replacement+for+MaskField+SMT
>> > > > > ).
>> > > > >
>> > > > > Cheers,
>> > > > >
>> > > > > Chris
>> > > > >
>> > > > > On Thu, Jul 15, 2021 at 7:56 AM gyejun choi <wh...@gmail.com>
>> wrote:
>> > > > >
>> > > > > > is there someone who any feed back about this?
>> > > > > >
>> > > > > > 2020년 10월 23일 (금) 오후 2:56, gyejun choi <wh...@gmail.com>님이 작성:
>> > > > > >
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > I've opened KIP-678 which is intended to provide a new SMT in
>> > Kafka
>> > > > > > Connect.
>> > > > > > > I'd be grateful for any
>> > > > > > > feedback:
>> > > > > >
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP+678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex==
>> > > > > > >
>> > > > > > > thanks,
>> > > > > > >
>> > > > > > > whsoul
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Reply via email to