[ 
https://issues.apache.org/jira/browse/KAFKA-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

whsoul updated KAFKA-9436:
--------------------------
    Labels: needs-kip  (was: )

> New Kafka Connect SMT for plainText => Struct(or Map)
> -----------------------------------------------------
>
>                 Key: KAFKA-9436
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9436
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: whsoul
>            Priority: Major
>              Labels: needs-kip
>
> I'd like to use an SMT to parse plain-text rows, convert them to Struct (or 
> Map) data, and load them into document databases such as MongoDB, Elasticsearch, etc.
>  
> For example
>  
> 1. String parsing (with a TIMEMILLIS field)
> {code:java}
> {
>    "code" : "dev_kafka_pc001_1580372261372"
>    ,"recode1" : "a"
>    ,"recode2" : "b" 
> }{code}
> {code:java}
> "transforms": "RegexTransform",
> "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
> "transforms.RegexTransform.struct.field": "message",
> "transforms.RegexTransform.regex": "^(.{3,4})_(.*)_(pc|mw|ios|and)([0-9]{3})_([0-9]{13})",
> "transforms.RegexTransform.mapping": "env,serviceId,device,sequence,datetime:TIMEMILLIS"{code}
>  
>  
> 2. Plain-text Apache access log
> {code:java}
> "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS 
> /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\" 
> \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, 
> like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
> {code}
> An SMT config with a regular expression, like the one below, can transform 
> this plain-text line into Struct (or Map) data.
>  
> {code:java}
> "transforms": "RegexTransform",
> "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
> "transforms.RegexTransform.struct.field": "message",
> "transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",
> "transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
> {code}
>  
> I have a PR for this.
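
For illustration only, the regex-plus-mapping idea from the two examples can be sketched in plain Java. This is not the code from the PR and does not use the Kafka Connect API. The class name RegexToMapSketch, its helper methods, and the way the NUMBER and TIMEMILLIS suffixes are interpreted (as a long and as an epoch-millisecond instant) are all assumptions made for the sketch.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexToMapSketch {

    // Hypothetical helper: matches `regex` against `text` and names the i-th
    // capture group after the i-th entry of the comma-separated `mapping`.
    static Map<String, Object> parse(String regex, String mapping, String text) {
        String[] fields = mapping.split(",");
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("input does not match regex: " + text);
        }
        if (m.groupCount() != fields.length) {
            throw new IllegalArgumentException("mapping has " + fields.length
                    + " fields but regex has " + m.groupCount() + " groups");
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < fields.length; i++) {
            // Each entry is "name" or "name:TYPE"; untyped entries stay strings.
            String[] nameAndType = fields[i].trim().split(":", 2);
            String type = nameAndType.length == 2 ? nameAndType[1] : "STRING";
            out.put(nameAndType[0], convert(m.group(i + 1), type));
        }
        return out;
    }

    // Assumed meaning of the type suffixes; the issue does not define them.
    static Object convert(String raw, String type) {
        switch (type) {
            case "NUMBER":
                return Long.parseLong(raw);
            case "TIMEMILLIS":
                return Instant.ofEpochMilli(Long.parseLong(raw));
            default:
                return raw;
        }
    }

    public static void main(String[] args) {
        // Regex, mapping, and input taken from example 1 of the issue.
        Map<String, Object> row = parse(
                "^(.{3,4})_(.*)_(pc|mw|ios|and)([0-9]{3})_([0-9]{13})",
                "env,serviceId,device,sequence,datetime:TIMEMILLIS",
                "dev_kafka_pc001_1580372261372");
        System.out.println(row);
    }
}
```

A real SMT would presumably build a Connect Struct with a schema instead of a plain Map; the Map is used here only to keep the sketch free of dependencies.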



--
This message was sent by Atlassian Jira
(v8.3.4#803005)