sf-mk opened a new issue #5982: "buckets" post aggregation of "approxHistogram" 
returns exception on Nan histograms
URL: https://github.com/apache/incubator-druid/issues/5982
 
 
   Following documentation in:
   
http://druid.io/docs/latest/development/extensions-core/approximate-histograms.html
   
   I make a request where approxHistogramFold is followed by a buckets post 
aggregation.  When one of the histograms that the approxHistogramFold 
aggregation returns consists of NaN values, the buckets post aggregation 
fails with an "Unknown exception" error of one of two classes: 
"java.lang.ArrayIndexOutOfBoundsException" or 
"com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input: 
expected close marker for ARRAY"
   
   This only happens when querying time ranges near the start of the data.  
I believe the NaN values returned by the aggregation stage are caused by 
attempting to aggregate a time bucket that is treated as having data but 
actually does not.
   
   Examples of requests and responses:
   
   Request1 (no post aggregation):
   ```
   {
     "queryType": "timeseries",
     "dataSource": "DS",
     "granularity": "minute",
     "aggregations": [
       {
         "fieldName": "histogram",
         "name": "h",
         "type": "approxHistogramFold"
       }
     ],
     "postAggregations": [],
     "intervals": [
       "2018-07-09T14:26:00.000Z/2018-07-09T15:25:00.00Z"
     ],
     "filter": {
       "type": "and",
       "fields": [
         {
           "dimension": "field1",
           "type": "selector",
           "value": "a"
         }
       ]
     }
   }
   
   ```
   
   Response1:
   ```
   [
     {
       "timestamp": "2018-07-09T15:24:00.000Z", 
       "result": {
         "h": {
           "breaks": [
             "Infinity", 
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN", 
             "-Infinity"
           ], 
           "counts": [
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN", 
             "NaN"
           ]
         }
       }
     }
   ]
   
   ```
   
   Request2:  Identical to Request1, but with:
   ```
     "postAggregations": [
       {
         "type": "buckets",
         "name": "r",
         "fieldName": "h",
         "bucketSize": "10000000"
       }
     ],
   ```
   
   Response2:
   ```
   {
     "error": "Unknown exception",
     "errorMessage": null,
     "errorClass": "java.lang.ArrayIndexOutOfBoundsException",
     "host": "druid-server:8100",
     "message": "Unknown exception"
   }
   ```
   OR (the exception seems to change if the query is made much later)
   ```
   {
     "errorClass": "java.lang.RuntimeException", 
     "host": null, 
     "errorMessage": "com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input: expected close marker for ARRAY (from [Source: java.io.SequenceInputStream@2e1f6a8; line: -1, column: -1])\n at [Source: java.io.SequenceInputStream@2e1f6a8; line: -1, column: 58]", 
     "error": "Unknown exception"
   }
   ```
   
   Interestingly, the equalBuckets post aggregator does not seem to suffer 
from this problem; it simply returns a histogram of NaNs similar to the 
original.
   My guess is that the toHistogram(bucketSize, offset) function does its 
calculations under the assumption that the histogram has sensible values in 
it.
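
To illustrate that guess, here is a minimal Java sketch of how fixed-width bucket arithmetic can go wrong when the bounds are not finite. It is not Druid's actual toHistogram code: the class NaNBucketSketch and the methods naiveBucketCount and hasUsableBounds are hypothetical names, and the bucket-count formula is only an assumption about what such a calculation might look like. The inputs are the first and last break values shown in Response1.

```java
class NaNBucketSketch {
    // Hypothetical guard: true only when both bounds are finite and ordered.
    static boolean hasUsableBounds(double min, double max) {
        return Double.isFinite(min) && Double.isFinite(max) && min <= max;
    }

    // Assumed (not Druid's) bucket count for fixed-width buckets.
    // It silently relies on min and max being sensible numbers.
    static int naiveBucketCount(double min, double max, double bucketSize, double offset) {
        double first = Math.floor((min - offset) / bucketSize);
        double last = Math.ceil((max - offset) / bucketSize);
        // Java's double-to-int cast maps NaN to 0 and infinities to the int limits.
        return (int) (last - first);
    }

    public static void main(String[] args) {
        // First and last breaks from Response1: Infinity and -Infinity.
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        int count = naiveBucketCount(min, max, 1.0e7, 0.0);
        System.out.println("naive bucket count: " + count);
        System.out.println("usable bounds: " + hasUsableBounds(min, max));
    }
}
```

With bounds like these the computed count is not a valid array size, so any later array allocation or indexing based on it would fail. A check along the lines of hasUsableBounds before bucketing would let the post aggregator return an empty or NaN result instead, which is what equalBuckets appears to do.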

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
