hotdust opened a new issue #9393: if group by Numeric dimension and String 
dimension, only return 100 results
URL: https://github.com/apache/druid/issues/9393
 
 
   Hey, I'm researching Druid. I have questions about SQL.
   
   # Version
   0.17.0
   
   # Deployment
   nano-quickstart
   
   
   # Data and Schema
   1. Data is "quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz".
   2. Data was loaded into Druid by following [Tutorial: Loading a 
file](https://druid.apache.org/docs/latest/tutorials/tutorial-batch.html), with 
rollup turned off. The ingestion spec is as follows:
   ```
   {
     "type": "index_parallel",
     "ioConfig": {
       "type": "index_parallel",
       "inputSource": {
         "type": "local",
         "filter": "wikiticker-2015-09-12-sampled.json.gz",
         "baseDir": "quickstart/tutorial"
       },
       "inputFormat": {
         "type": "json"
       }
     },
     "tuningConfig": {
       "type": "index_parallel",
       "partitionsSpec": {
         "type": "dynamic"
       }
     },
     "dataSchema": {
       "dataSource": "wikiticker",
       "granularitySpec": {
         "type": "uniform",
         "queryGranularity": "NONE",
         "rollup": false,
         "segmentGranularity": "DAY"
       },
       "timestampSpec": {
         "column": "time",
         "format": "iso"
       },
       "dimensionsSpec": {
         "dimensions": [
           {
             "type": "long",
             "name": "added"
           },
           "channel",
           "cityName",
           "comment",
           "countryIsoCode",
           "countryName",
           {
             "type": "long",
             "name": "deleted"
           },
           {
             "type": "long",
             "name": "delta"
           },
           "isAnonymous",
           "isMinor",
           "isNew",
           "isRobot",
           "isUnpatrolled",
           "namespace",
           "page",
           "regionIsoCode",
           "regionName",
           "user"
         ]
       }
     }
   }
   ```
   
   
   
   # Questions
   ## Question 1: 
   Why does the following SQL always return exactly 100 results, whether or not 
a LIMIT clause is added?
   (`delta` is Numeric and `user` is String. Both are dimensions.)
   
   SQL:
   > SELECT delta, user FROM "wikiticker" GROUP BY delta, user
   
   Results:
   
![image](https://user-images.githubusercontent.com/18543769/75105649-e7875580-5650-11ea-9192-0fb222971285.png)
   
   
   Native query:
   ```
   {
     "queryType": "groupBy",
     "dataSource": {
       "type": "table",
       "name": "wikiticker"
     },
     "intervals": {
       "type": "intervals",
       "intervals": [
         "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
       ]
     },
     "virtualColumns": [],
     "filter": null,
     "granularity": {
       "type": "all"
     },
     "dimensions": [
       {
         "type": "default",
         "dimension": "delta",
         "outputName": "d0",
         "outputType": "LONG"
       },
       {
         "type": "default",
         "dimension": "user",
         "outputName": "d1",
         "outputType": "STRING"
       }
     ],
     "aggregations": [],
     "postAggregations": [],
     "having": null,
     "limitSpec": {
       "type": "NoopLimitSpec"
     },
     "context": {
       "sqlQueryId": "4449f325-3262-4f81-811d-acbb0600d97c"
     },
     "descending": false
   }
   ```
   
   ## Question 2
   If the order of the GROUP BY dimensions is changed (delta, user -> user, delta), 
the SQL returns all results.
   It looks like the SQL returns all results when the String dimension comes 
first in both the `GROUP BY` and `SELECT` clauses.
   Why do different dimension orders produce different results?
   > SELECT user, delta FROM "wikiticker" GROUP BY user, delta
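
One way to compare the two orderings outside the web console is to post both statements to Druid's SQL HTTP endpoint and count the rows that come back. The following is only a minimal sketch: the endpoint address assumes the quickstart Router on `localhost:8888`, and the helper names `build_request`, `count_rows`, and `main` are made up for illustration.

```python
import json
import urllib.request

# Assumption: the quickstart Router listens here; adjust for your deployment.
SQL_ENDPOINT = "http://localhost:8888/druid/v2/sql"

# The two statements from the questions above, copied verbatim.
QUERIES = {
    "delta, user": 'SELECT delta, user FROM "wikiticker" GROUP BY delta, user',
    "user, delta": 'SELECT user, delta FROM "wikiticker" GROUP BY user, delta',
}


def build_request(sql, endpoint=SQL_ENDPOINT):
    """Build a POST request carrying the SQL statement as a JSON body."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def count_rows(sql, endpoint=SQL_ENDPOINT):
    """Run the statement and return how many result rows came back."""
    with urllib.request.urlopen(build_request(sql, endpoint)) as resp:
        # The default result format is a JSON array with one object per row.
        return len(json.load(resp))


def main():
    """Print the row count for each GROUP BY ordering."""
    for label, sql in QUERIES.items():
        print(f"GROUP BY {label}: {count_rows(sql)} rows")
```

Calling `main()` against a running cluster prints one row count per ordering. If both counts match here but differ in the console, the 100-row cap is more likely applied on the console side than by the query engine.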
