glasser commented on issue #5789: Add stringLast and stringFirst aggregators extension
URL: https://github.com/apache/incubator-druid/pull/5789#issuecomment-472128679
 
 
   Hmm, I'm not sure if that's exactly it. I've been trying the standard quickstart Kafka ingestion example with this supervisor:
   
   ```json
   {
     "type": "kafka",
     "dataSchema": {
       "dataSource": "wikipedia",
       "parser": {
         "type": "string",
         "parseSpec": {
           "format": "json",
           "timestampSpec": {
             "column": "time",
             "format": "auto"
           },
           "dimensionsSpec": {
             "dimensions": [
               "cityName",
               "comment",
               "countryIsoCode",
               "countryName",
               "isAnonymous",
               "isMinor",
               "isNew",
               "isRobot",
               "isUnpatrolled",
               "metroCode",
               "namespace",
               "page",
               "regionIsoCode",
               "regionName",
               "user",
               { "name": "added", "type": "long" },
               { "name": "deleted", "type": "long" },
               { "name": "delta", "type": "long" }
             ]
           }
         }
       },
       "metricsSpec": [{
         "name": "channel",
         "fieldName": "channel",
         "type": "stringFirst",
         "maxStringBytes": 100
       }],
       "granularitySpec": {
         "type": "uniform",
         "segmentGranularity": "DAY",
         "queryGranularity": "NONE",
         "rollup": false
       }
     },
     "tuningConfig": {
       "type": "kafka",
       "reportParseExceptions": false,
       "maxRowsInMemory": 3000
     },
     "ioConfig": {
       "topic": "wikipedia",
       "replicas": 2,
       "taskDuration": "PT2M",
       "completionTimeout": "PT20M",
       "consumerProperties": {
         "bootstrap.servers": "localhost:9092"
       }
     }
   }
   ```
   
   Note the maxRowsInMemory: 3000, which is less than the number of rows in wikiticker-2015-09-12-sampled.json. (I tried setting it to just 1, but that leads to OOMs.) This job runs successfully.
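   As a sanity check on the ingested result, a native timeseries query could read back the stringFirst metric. This is a hypothetical sketch, assuming the datasource above has been ingested and that `firstChannel` and the interval are placeholder choices of mine:

   ```json
   {
     "queryType": "timeseries",
     "dataSource": "wikipedia",
     "granularity": "all",
     "intervals": ["2015-09-12/2015-09-13"],
     "aggregations": [
       {
         "type": "stringFirst",
         "name": "firstChannel",
         "fieldName": "channel",
         "maxStringBytes": 100
       }
     ]
   }
   ```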
   
   I should probably try with just an index task instead of Kafka to simplify things, though.
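   A minimal native index task along those lines might look like the following sketch. It assumes the quickstart sample file lives at quickstart/wikiticker-2015-09-12-sampled.json; the dataSchema block is elided here and would be copied verbatim from the supervisor spec above:

   ```json
   {
     "type": "index",
     "spec": {
       "dataSchema": { "dataSource": "wikipedia" },
       "ioConfig": {
         "type": "index",
         "firehose": {
           "type": "local",
           "baseDir": "quickstart",
           "filter": "wikiticker-2015-09-12-sampled.json"
         }
       },
       "tuningConfig": {
         "type": "index",
         "maxRowsInMemory": 3000
       }
     }
   }
   ```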
