sergioferragut commented on issue #13174: URL: https://github.com/apache/druid/issues/13174#issuecomment-1270806973
Just tested this using the Kafka tutorial, with the KTTM nested data in place of the Wikipedia data. Steps:

Create the topic:

```
./bin/kafka-topics.sh --create --topic kttm_nested --bootstrap-server localhost:9092
```

Get the nested data from the KTTM nested example:

```
curl https://static.imply.io/example-data/kttm-nested-v2/kttm-nested-v2-2019-08-25.json.gz -o kttm-nested-data.json.gz
gunzip -c kttm-nested-data.json.gz > kttm-nested-data.json
```

Publish to the topic:

```
export KAFKA_OPTS="-Dfile.encoding=UTF-8"
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kttm_nested < kttm-nested-data.json
```

The "Load Data" UI does not automatically recognize the nested JSON columns in the parsing step. To work around this, in the "Configure Schema" step use "Add dimension", type the column name, and choose type "json".

The resulting ingestion spec:

```
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "type": "kafka",
      "consumerProperties": {
        "bootstrap.servers": "localhost:9092"
      },
      "topic": "kttm_nested",
      "inputFormat": {
        "type": "json"
      },
      "useEarliestOffset": true
    },
    "tuningConfig": {
      "type": "kafka"
    },
    "dataSchema": {
      "dataSource": "kttm_nested",
      "timestampSpec": {
        "column": "timestamp",
        "format": "iso"
      },
      "dimensionsSpec": {
        "dimensions": [
          "session",
          "number",
          "client_ip",
          "language",
          "adblock_list",
          "app_version",
          "path",
          "loaded_image",
          "referrer",
          "referrer_host",
          "server_ip",
          "screen",
          "window",
          {
            "type": "long",
            "name": "session_length"
          },
          "timezone",
          "timezone_offset",
          {
            "type": "json",
            "name": "event"
          },
          {
            "type": "json",
            "name": "agent"
          },
          {
            "type": "json",
            "name": "geo_ip"
          }
        ]
      },
      "granularitySpec": {
        "queryGranularity": "none",
        "rollup": false,
        "segmentGranularity": "hour"
      }
    }
  }
}
```

@techdocsmith, this example works, but it requires the Kafka setup steps to run, so I'm not sure it fits on the nested columns docs page as is. Perhaps we could adjust the Kafka tutorial to use this data source instead? Let me know how else I can help.
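Since the parsing step doesn't pick up the nested columns, here is a rough Python sketch for building the dimension list outside the console. It is not part of Druid or the web console, and the helper names `suggest_dimensions` and `first_record` are hypothetical. It reads one record from `kttm-nested-data.json` and types every top-level field that holds a JSON object as `"json"`. It assumes that nested columns are exactly those object-valued top-level fields and that you name long columns yourself (like `session_length` in the spec). Everything else is listed as a plain string dimension.

```python
import json


def suggest_dimensions(record, long_fields=()):
    """Suggest Druid dimensionsSpec entries from one sample record (hypothetical helper).

    Assumptions: nested columns are the top-level fields whose values are JSON
    objects; long columns are named explicitly; "timestamp" goes in timestampSpec.
    """
    dimensions = []
    for name, value in record.items():
        if name == "timestamp":
            continue  # handled by timestampSpec, not listed as a dimension
        if isinstance(value, dict):
            dimensions.append({"type": "json", "name": name})  # nested column
        elif name in long_fields:
            dimensions.append({"type": "long", "name": name})
        else:
            dimensions.append(name)  # plain string dimension
    return dimensions


def first_record(path):
    """Read and parse the first line of a newline-delimited JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.loads(f.readline())
```

For example, `json.dumps({"dimensions": suggest_dimensions(first_record("kttm-nested-data.json"), long_fields={"session_length"})}, indent=2)` gives a list you could paste into `dimensionsSpec`. Review it before use, since a single record may not contain every field.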
