[ https://issues.apache.org/jira/browse/KAFKA-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
whsoul updated KAFKA-9436:
--------------------------
    Labels: needs-kip  (was: )

> New Kafka Connect SMT for plainText => Struct(or Map)
> -----------------------------------------------------
>
>                 Key: KAFKA-9436
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9436
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: whsoul
>            Priority: Major
>              Labels: needs-kip
>
> I'd like to parse plain text rows and convert them to struct (or map) data with
> an SMT, so they can be loaded into document databases such as MongoDB,
> Elasticsearch, etc.
>
> For example:
>
> 1. String parse (with a timestamp in milliseconds)
> {code:java}
> {
>   "code" : "dev_kafka_pc001_1580372261372"
>   ,"recode1" : "a"
>   ,"recode2" : "b"
> }{code}
> {code:java}
> "transforms": "RegexTransform",
> "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
> "transforms.RegexTransform.struct.field": "message",
> "transforms.RegexTransform.regex": "^(.{3,4})_(.*)_(pc|mw|ios|and)([0-9]{3})_([0-9]{13})",
> "transforms.RegexTransform.mapping": "env,serviceId,device,sequence,datetime:TIMEMILLIS"{code}
>
>
> 2. Plain text Apache log
> {code:java}
> "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
> {code}
> The SMT connector config below uses a regular expression to transform this
> plain text into struct (or map) data. Each capture group is assigned, in order,
> to the field name at the same position in the mapping.
>
> {code:java}
> "transforms": "RegexTransform",
> "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
> "transforms.RegexTransform.struct.field": "message",
> "transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",
> "transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
> {code}
>
> I have a PR for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
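
For illustration only, the group-to-field mapping described in the issue could be sketched in plain Java as below. This is not the code from the PR: the class name `RegexToMapSketch` and the method `parse` are hypothetical, and treating both the `TIMEMILLIS` and `NUMBER` suffixes as a `long` conversion is an assumption about what the proposed SMT would do. It uses only `java.util.regex` and does not depend on the Kafka Connect API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexToMapSketch {

    /**
     * Matches text against regex and returns one entry per capture group,
     * keyed by the names in the comma-separated mapping ("name" or "name:TYPE").
     * Returns null when the text does not match or has too few groups.
     */
    public static Map<String, Object> parse(String text, String regex, String mapping) {
        String[] fields = mapping.split(",");
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find() || m.groupCount() < fields.length) {
            return null;
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < fields.length; i++) {
            String[] nameAndType = fields[i].trim().split(":", 2);
            String raw = m.group(i + 1); // capture groups are 1-indexed
            Object value = raw;
            // Assumption: TIMEMILLIS and NUMBER both mean "parse as long".
            if (nameAndType.length == 2
                    && (nameAndType[1].equals("TIMEMILLIS") || nameAndType[1].equals("NUMBER"))) {
                try {
                    value = Long.parseLong(raw);
                } catch (NumberFormatException e) {
                    value = raw; // e.g. "-" in an access log stays a string
                }
            }
            out.put(nameAndType[0], value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = parse(
                "dev_kafka_pc001_1580372261372",
                "^(.{3,4})_(.*)_(pc|mw|ios|and)([0-9]{3})_([0-9]{13})",
                "env,serviceId,device,sequence,datetime:TIMEMILLIS");
        System.out.println(row);
    }
}
```

With the first example from the issue, the five capture groups map to `env`, `serviceId`, `device`, `sequence`, and `datetime`. A real SMT would additionally need to build a Connect `Schema` and `Struct` (or a schemaless `Map`) and decide how to handle records that do not match.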