sergioferragut commented on issue #13174:
URL: https://github.com/apache/druid/issues/13174#issuecomment-1270806973

   I just tested this by following the Kafka tutorial, replacing the Wikipedia 
data with the KTTM nested data.
   Steps:
   
   Create the topic:
   `./bin/kafka-topics.sh --create --topic kttm_nested --bootstrap-server localhost:9092`
   
   Download the nested data from the KTTM nested example:
   ```
   curl https://static.imply.io/example-data/kttm-nested-v2/kttm-nested-v2-2019-08-25.json.gz -o kttm-nested-data.json.gz
   gunzip -c kttm-nested-data.json.gz > kttm-nested-data.json
   ```
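
Before publishing, it may help to confirm how many records the file holds, since the console producer sends each input line as a separate message. A minimal sketch, assuming the uncompressed file is newline-delimited JSON; `count_records` is a hypothetical helper name, not part of Kafka or Druid:

```shell
# Hypothetical helper: print the number of lines in a file.
# The console producer sends each input line as one Kafka message,
# so this is the number of messages the publish step should produce.
count_records() {
  awk 'END { print NR }' "$1"
}

# Only runs if the file from the download step is present.
if [ -f kttm-nested-data.json ]; then
  count_records kttm-nested-data.json
fi
```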
   
   Publish to the topic:
   ```
   export KAFKA_OPTS="-Dfile.encoding=UTF-8"
   ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kttm_nested < kttm-nested-data.json
   ```
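
To check that the records reached the topic, one option is to read a few back with the console consumer that ships with Kafka. A sketch, assuming the same broker address as above; `peek_topic` and the `KAFKA_HOME` variable are hypothetical names used only for illustration:

```shell
# Hypothetical helper: read the first N messages of a topic and exit.
# Assumes KAFKA_HOME points at the Kafka install directory
# (defaults to the current directory, matching the ./bin/... commands above).
peek_topic() {
  "${KAFKA_HOME:-.}/bin/kafka-console-consumer.sh" \
    --bootstrap-server localhost:9092 \
    --topic "$1" \
    --from-beginning \
    --max-messages "${2:-1}"
}

# Usage (requires a running broker): peek_topic kttm_nested 3
```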
   
   The "Load Data" UI does not automatically detect the nested JSON columns 
in the parsing step. 
   In the "Configure Schema" step, you can add each one manually: use "Add 
dimension", type the column name, and choose type "json".
   
   The resulting ingestion spec:
   ```
   {
     "type": "kafka",
     "spec": {
       "ioConfig": {
         "type": "kafka",
         "consumerProperties": {
           "bootstrap.servers": "localhost:9092"
         },
         "topic": "kttm_nested",
         "inputFormat": {
           "type": "json"
         },
         "useEarliestOffset": true
       },
       "tuningConfig": {
         "type": "kafka"
       },
       "dataSchema": {
         "dataSource": "kttm_nested",
         "timestampSpec": {
           "column": "timestamp",
           "format": "iso"
         },
         "dimensionsSpec": {
           "dimensions": [
             "session",
             "number",
             "client_ip",
             "language",
             "adblock_list",
             "app_version",
             "path",
             "loaded_image",
             "referrer",
             "referrer_host",
             "server_ip",
             "screen",
             "window",
             {
               "type": "long",
               "name": "session_length"
             },
             "timezone",
             "timezone_offset",
             {
               "type": "json",
               "name": "event"
             },
             {
               "type": "json",
               "name": "agent"
             },
             {
               "type": "json",
               "name": "geo_ip"
             }
           ]
         },
         "granularitySpec": {
           "queryGranularity": "none",
           "rollup": false,
           "segmentGranularity": "hour"
         }
       }
     }
   }
   ```
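
If the spec above is saved to a file, it can also be submitted without the UI through the Overlord's supervisor endpoint. A sketch, assuming the Overlord is reachable at `localhost:8081` as in the quickstart setup and that the spec is saved as `kttm-nested-supervisor.json` (a hypothetical file name); `submit_supervisor` and `DRUID_OVERLORD` are likewise hypothetical names:

```shell
# Assumption: Overlord address from the Druid quickstart; override if yours differs.
DRUID_OVERLORD="${DRUID_OVERLORD:-http://localhost:8081}"

# Hypothetical helper: POST a supervisor spec file to the Overlord.
submit_supervisor() {
  curl -X POST \
    -H 'Content-Type: application/json' \
    -d @"$1" \
    "$DRUID_OVERLORD/druid/indexer/v1/supervisor"
}

# Only runs if the spec file exists (the file name is an assumption).
if [ -f kttm-nested-supervisor.json ]; then
  submit_supervisor kttm-nested-supervisor.json
fi
```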
   
   @techdocsmith, this example works, but it requires the Kafka setup steps 
to run, so I'm not sure it fits in the nested columns docs page as is. Perhaps 
adjust the Kafka tutorial to use this source instead? Let me know how else I 
can help.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

