glasser opened a new issue #7243: stringFirst/stringLast crashes at aggregation time
URL: https://github.com/apache/incubator-druid/issues/7243
 
 
   ### Affected Version
   
   0.13.0-incubating (and maybe older versions)
   
   ### Description
   
   See discussion on #5789 for background.
   
   Set up a fresh install of Druid 0.13.0-incubating following the [quickstart](http://druid.io/docs/latest/tutorials/index.html).
   
   Write this index spec, which defines a `stringFirst` metric and enables rollup:
   
   ```json
   {
     "type" : "index",
     "spec" : {
       "dataSchema" : {
         "dataSource" : "wikipedia",
         "parser" : {
           "type" : "string",
           "parseSpec" : {
             "format" : "json",
             "dimensionsSpec" : {},
             "timestampSpec": {
               "column": "time",
               "format": "iso"
             }
           }
         },
       "metricsSpec": [{
         "name": "channel",
         "fieldName": "channel",
         "type": "stringFirst",
         "maxStringBytes": 100
       }],
         "granularitySpec" : {
           "type" : "uniform",
           "segmentGranularity" : "day",
           "queryGranularity" : "hour",
           "intervals" : ["2015-09-12/2015-09-13"],
           "rollup" : true
         }
       },
       "ioConfig" : {
         "type" : "index",
         "firehose" : {
           "type" : "local",
           "baseDir" : "quickstart/tutorial/",
           "filter" : "wikiticker-2015-09-12-sampled.json.gz"
         },
         "appendToExisting" : false
       },
       "tuningConfig" : {
         "type" : "index",
         "targetPartitionSize" : 5000000,
         "maxRowsInMemory" : 1000,
         "forceExtendableShardSpecs" : true
       }
     }
   }
   ```
   
   and post it:
   ```
   $ bin/post-index-task --file stringfirst-index.json
   Beginning indexing data for wikipedia
   Waiting up to 119s for indexing service [http://localhost:8090/] to become available. [Got: <urlopen error [Errno 61] Connection refused> ]
   Task started: index_wikipedia_2019-03-12T19:07:26.507Z
   Task log:     http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2019-03-12T19:07:26.507Z/log
   Task status:  http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2019-03-12T19:07:26.507Z/status
   Task index_wikipedia_2019-03-12T19:07:26.507Z still running...
   Task index_wikipedia_2019-03-12T19:07:26.507Z still running...
   Task index_wikipedia_2019-03-12T19:07:26.507Z still running...
   Task index_wikipedia_2019-03-12T19:07:26.507Z still running...
   Task finished with status: FAILED
   ```
   
   The log reads:
   
   ```
   2019-03-12T19:07:45,892 WARN [appenderator_merge_0] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Failed to push merged index for segment[wikipedia_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2019-03-12T19:07:26.636Z].
   java.lang.ClassCastException: org.apache.druid.query.aggregation.SerializablePairLongString cannot be cast to java.lang.String
        at org.apache.druid.query.aggregation.first.StringFirstAggregateCombiner.reset(StringFirstAggregateCombiner.java:35) ~[druid-processing-0.13.0-incubating.jar:0.13.0-incubating]
        at org.apache.druid.segment.RowCombiningTimeAndDimsIterator.resetCombinedMetrics(RowCombiningTimeAndDimsIterator.java:249) ~[druid-processing-0.13.0-incubating.jar:0.13.0-incubating]
        at org.apache.druid.segment.RowCombiningTimeAndDimsIterator.combineToCurrentTimeAndDims(RowCombiningTimeAndDimsIterator.java:229) ~[druid-processing-0.13.0-incubating.jar:0.13.0-incubating]
        at org.apache.druid.segment.RowCombiningTimeAndDimsIterator.moveToNext(RowCombiningTimeAndDimsIterator.java:191) ~[druid-processing-0.13.0-incubating.jar:0.13.0-incubating]
        at org.apache.druid.segment.IndexMergerV9.mergeIndexesAndWriteColumns(IndexMergerV9.java:492) ~[druid-processing-0.13.0-incubating.jar:0.13.0-incubating]
        at org.apache.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:191) ~[druid-processing-0.13.0-incubating.jar:0.13.0-incubating]
        at org.apache.druid.segment.IndexMergerV9.merge(IndexMergerV9.java:914) ~[druid-processing-0.13.0-incubating.jar:0.13.0-incubating]
        at org.apache.druid.segment.IndexMergerV9.mergeQueryableIndex(IndexMergerV9.java:832) ~[druid-processing-0.13.0-incubating.jar:0.13.0-incubating]
        at org.apache.druid.segment.IndexMergerV9.mergeQueryableIndex(IndexMergerV9.java:810) ~[druid-processing-0.13.0-incubating.jar:0.13.0-incubating]
        at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.mergeAndPush(AppenderatorImpl.java:719) ~[druid-server-0.13.0-incubating.jar:0.13.0-incubating]
        at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$push$1(AppenderatorImpl.java:623) ~[druid-server-0.13.0-incubating.jar:0.13.0-incubating]
        at com.google.common.util.concurrent.Futures$1.apply(Futures.java:713) [guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:861) [guava-16.0.1.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
   ```
   
   @gianm suggested in #5789 that this AggregateCombiner code path had simply never been exercised before, and that it should always be operating on SerializablePairLongString values rather than Strings. I'm not enough of an expert on aggregation to know whether that's correct.
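   
   For whatever it's worth, here is my reading of the failing code: a simplified sketch of the method named in the stack trace (`StringFirstAggregateCombiner.reset`, line 35 in 0.13.0-incubating), alongside the kind of change @gianm's comment seems to imply. The class name `StringFirstAggregateCombinerSketch` is mine, the assumption that `SerializablePairLongString.rhs` carries the string half of the pair is mine, and this is untested, not a proposed patch:
   
   ```java
   import org.apache.druid.query.aggregation.SerializablePairLongString;
   import org.apache.druid.segment.ColumnValueSelector;
   
   public class StringFirstAggregateCombinerSketch
   {
     private String firstString;
     private boolean isReset = false;
   
     // What 0.13.0-incubating appears to do, per the stack trace: a plain
     // String cast. At merge time the selector holds the aggregator's
     // intermediate type, SerializablePairLongString, so this throws
     // ClassCastException.
     public void resetAsShipped(ColumnValueSelector<?> selector)
     {
       firstString = (String) selector.getObject();
       isReset = true;
     }
   
     // What @gianm's suggestion seems to imply (assumption, untested): the
     // combiner always sees pair values here, so unwrap the string payload.
     // Assumes SerializablePairLongString.rhs is the String half of the pair.
     public void resetOnPair(ColumnValueSelector<?> selector)
     {
       final SerializablePairLongString pair =
           (SerializablePairLongString) selector.getObject();
       firstString = pair == null ? null : pair.rhs;
       isReset = true;
     }
   }
   ```
   
   If the combiner is also expected to *produce* pair values for downstream merging, the real fix presumably touches more than `reset()`; the sketch only illustrates the cast that is throwing.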
