whsoul opened a new pull request, #12219:
URL: https://github.com/apache/kafka/pull/12219

   Re-branched resubmission of #7965, with fixes for the review comments from Chris Egerton
   (https://lists.apache.org/thread/xb57l7j953k8dfgqvktb09y31vzpm1xx):
   
   > 1. I wonder if it's necessary to include support for type casting with this
   > SMT. We already have a Cast SMT (
   > https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/Cast.java)
   > that can parse multiple fields of a structured record value with differing
   > types. Would it be enough for your new SMT to only produce string values
   > for its structured data, and then allow users to perform casting logic
   > using the Cast SMT afterward?
   
   > 2. It seems like the "struct.field" property is similar; based on the
   > examples, it looks like when the SMT is configured with a value for that
   > property, it will first pull out a field from a structured record value
   > (for example, it would pull out the value
   > "https://kafka.apache.org/documentation/#connect" from a map of
   > {"url": "https://kafka.apache.org/documentation/#connect"}), then parse that field's
   > value, and replace the entire record value (or key) with the result of the
   > parsing stage. It seems like this could be accomplished using the
   > ExtractField SMT (
   > https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/ExtractField.java)
   > as a preliminary step before passing it to your new SMT. Is this correct?
   > And if so, could we simplify the interface for your SMT by removing the
   > "struct.field" property in favor of the existing ExtractField SMT?
   
   1. Cast function removed (use in combination with the Cast SMT instead)
   2. struct.field option removed (use in combination with the ExtractField SMT instead)
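
To illustrate the two changes above, here is a minimal sketch of how the three SMTs might be chained in a connector configuration, written as a Java property map. The `field` property of ExtractField and the `spec` property of Cast are existing Connect options; the aliases `Extract` and `Cast`, the field name `message`, and the short regex and mapping are hypothetical values chosen only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ChainedSmtConfigSketch {
    // Builds connector properties that apply three SMTs in order:
    // ExtractField -> ToStructByRegexTransform (this PR) -> Cast.
    static Map<String, String> chainedTransforms() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("transforms", "Extract,RegexTransform,Cast");

        // Step 1: pull the text out of a structured value.
        // "message" is a hypothetical field name.
        props.put("transforms.Extract.type",
                "org.apache.kafka.connect.transforms.ExtractField$Value");
        props.put("transforms.Extract.field", "message");

        // Step 2: the SMT proposed in this PR; it emits string values only.
        // The regex and mapping here are illustrative, not from the PR.
        props.put("transforms.RegexTransform.type",
                "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value");
        props.put("transforms.RegexTransform.regex", "^(\\S+) (\\d{3})$");
        props.put("transforms.RegexTransform.mapping", "Path,Response");

        // Step 3: convert selected string fields with the existing Cast SMT.
        props.put("transforms.Cast.type",
                "org.apache.kafka.connect.transforms.Cast$Value");
        props.put("transforms.Cast.spec", "Response:int32");
        return props;
    }

    public static void main(String[] args) {
        chainedTransforms().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The order in `transforms` matters: Connect applies the aliases left to right, so the extraction has to come before the regex step and the cast after it.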
   
   
   
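   New SMT
   
   Transforms plain text into a struct (or map).
   Each regex capture group is assigned, in order, to the key name at the same position in the mapping.
   Compatible with a single plain-text input and with plain text held in a struct field (extracted first with the ExtractField SMT, per item 2 above).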
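
The group-to-key idea described above can be sketched in plain Java as follows. This is not the PR's implementation: the class and method names are hypothetical, and both the use of `find()` and the empty-map result for non-matching input are assumptions made for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexToMapSketch {
    // Assigns capture group i (1-based) to the i-th key in the comma-separated mapping.
    // Returns an empty map when the text does not match (an assumption of this sketch).
    static Map<String, String> toMap(String regex, String mapping, String text) {
        String[] keys = mapping.split(",");
        Matcher m = Pattern.compile(regex).matcher(text);
        Map<String, String> result = new LinkedHashMap<>();
        if (!m.find()) {
            return result;
        }
        if (m.groupCount() != keys.length) {
            throw new IllegalArgumentException(
                    "mapping has " + keys.length + " keys but regex has "
                            + m.groupCount() + " groups");
        }
        for (int i = 0; i < keys.length; i++) {
            // All values stay strings; typed conversion is left to the Cast SMT.
            result.put(keys[i].trim(), m.group(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {name=retries, count=3}
        System.out.println(toMap("^(\\w+)=(\\d+)$", "name,count", "retries=3"));
    }
}
```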
   
   
   ### sample1
   ~~~
   "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
   ~~~
   
   The SMT Connect configuration below, with its regular expression, transforms this plain text into struct (or map) data:
   
   ~~~
   "transforms": "TimestampTopic, RegexTransform",
   "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
   "transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",
   "transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms,Referrer,UserAgent"
   ~~~
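
As a worked check of sample1, the sketch below applies the sample regex and mapping to the sample log line using only `java.util.regex`. It does not use the SMT itself; the class name is hypothetical, and it assumes the SMT pairs groups and keys by position as described in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessLogSample {
    // The sample1 input, without the outer quotes.
    static final String LINE =
            "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] "
            + "\"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 "
            + "\"http://local.test.com/\" "
            + "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\"";

    // The sample1 regex; Java string escaping happens to match the JSON escaping.
    static final String REGEX =
            "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "
            + "\"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" "
            + "(\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"";

    static final String MAPPING =
            "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,"
            + "Response,BytesSent,Ms,Referrer,UserAgent";

    // Pairs each capture group with the key at the same position in MAPPING.
    static Map<String, String> parse() {
        String[] keys = MAPPING.split(",");
        Matcher m = Pattern.compile(REGEX).matcher(LINE);
        Map<String, String> fields = new LinkedHashMap<>();
        if (m.find()) {
            for (int i = 0; i < keys.length; i++) {
                fields.put(keys[i], m.group(i + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        parse().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

With this input, `Request` comes out as `/api/v1/service_config`, `Response` as `200`, `BytesSent` as `-`, and `Ms` as `101989`, all as strings.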
   
   ### sample2
   
   ~~~
   "dev_kafka_pc001_1580372261372"
   ~~~
   
   ~~~
   "transforms": "RegexTransform",
   "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
   "transforms.RegexTransform.regex": "^(.{3,4})_(.*)_(pc|mw|ios|and)([0-9]{3})_([0-9]{13})",
   "transforms.RegexTransform.mapping": "env,serviceId,device,sequence,datetime"
   ~~~
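
The sample2 regex relies on backtracking: `(.{3,4})` and `(.*)` are both greedy, so the service ID may itself contain underscores. The sketch below checks this with plain `java.util.regex`; the class name and the second input string are hypothetical and are not taken from the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ServiceIdSample {
    static final Pattern REGEX =
            Pattern.compile("^(.{3,4})_(.*)_(pc|mw|ios|and)([0-9]{3})_([0-9]{13})");
    static final String[] KEYS = "env,serviceId,device,sequence,datetime".split(",");

    // Pairs each capture group with the key at the same position in KEYS.
    // Returns an empty map for non-matching input (an assumption of this sketch).
    static Map<String, String> parse(String text) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = REGEX.matcher(text);
        if (m.find()) {
            for (int i = 0; i < KEYS.length; i++) {
                out.put(KEYS[i], m.group(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints {env=dev, serviceId=kafka, device=pc, sequence=001, datetime=1580372261372}
        System.out.println(parse("dev_kafka_pc001_1580372261372"));
        // Hypothetical input whose service ID contains an underscore:
        // prints {env=prod, serviceId=my_service, device=ios, sequence=002, datetime=1580372261372}
        System.out.println(parse("prod_my_service_ios002_1580372261372"));
    }
}
```

Here `sequence` and `datetime` are still strings such as `001` and `1580372261372`; converting them to numeric types would be a job for the Cast SMT.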


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
