cjameyson opened a new issue #9951:
URL: https://github.com/apache/incubator-superset/issues/9951
`When applying a Filter to a count distinct metric (`COUNT(DISTINCT
mycolumn)`), the HAVING clause generated in the Custom SQL tab of the Filter
popover is invalid syntax for Impala.
The generated query is:
```
SELECT category AS category,
count(DISTINCT id) AS `COUNT_DISTINCT(id)`
FROM mytable
WHERE to_timestamp(cast(created_dts as string), 'yyyyMMddHH') >= '2020-05-22
00:00:00.000000'
AND to_timestamp(cast(created_dts as string), 'yyyyMMddHH') < '2020-05-29
00:00:00.000000'
GROUP BY category
HAVING ((COUNT_DISTINCT(id) > 0))
ORDER BY `COUNT_DISTINCT(id)` DESC
```
And the query fails on the HAVING clause because `COUNT_DISTINCT` is not
valid for Impala.
### Expected results
`HAVING COUNT(distinct id) > 0`
or
```HAVING `COUNT_DISTINCT(id)` > 0```
### Actual results
`HAVING ((COUNT_DISTINCT(id) > 0))`
Which causes this Impala exception:
```
impala error: IllegalStateException: Failed analysis after expr
substitution. CAUSED BY: AnalysisException: mydatabase.count_distinct() unknown
```
#### Screenshots
[Screenshot](https://imgur.com/a/vnzEhPa)
#### How to reproduce the bug
1. Create a new chart using the Table viz
2. Group by a column
3. Add a metric for COUNT DISTINCT on some other column
4. Click the Filter dropdown, add filter for the metric added in step 3
5. Note the invalid SQL autopopulated in the Custom SQL tab
6. Hit Run Query Button
### Environment
```
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
```
## Version Info
- superset version: `superset version`
```
$ superset version
Loaded your LOCAL configuration at
[/home/superset/ssconfig/superset_config.py]
logging was configured successfully
INFO:superset.utils.logging_configurator:logging was configured successfully
/home/superset/super/lib64/python3.6/site-packages/flask_caching/__init__.py:189:
UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively
disabled.
"Flask-Caching: CACHE_TYPE is set to null, "
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Superset None
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
```
Checking pip...
```
apache-superset (0.36.0)
INSTALLED: 0.36.0 (latest)
```
```
superset --version
Python 3.6.8
Flask 1.1.2
Werkzeug 1.0.1
```
- node.js version: `node -v`
(not installed)
- npm version: `npm -v`
(not installed)
- Impyla
```
>>> import impala
>>> impala.__version__
'v0.16.2'
```
### Checklist
Make sure these boxes are checked before submitting your issue - thank you!
- [x] I have checked the superset logs for python stacktraces and included
it here as text if there are any.
```
ERROR:superset.connectors.sqla.models:Query SELECT category AS category,
count(DISTINCT id) AS `COUNT_DISTINCT(id)`
FROM mytable
WHERE to_timestamp(cast(created_dts as string), 'yyyyMMddHH') >= '2020-05-22
00:00:00.000000'
AND to_timestamp(cast(created_dts as string), 'yyyyMMddHH') < '2020-05-29
00:00:00.000000'
GROUP BY category
HAVING ((COUNT_DISTINCT(id) > 0))
ORDER BY `COUNT_DISTINCT(id)` DESC
LIMIT 10000 on schema None failed
Traceback (most recent call last):
File
"/home/superset/super/lib64/python3.6/site-packages/superset/connectors/sqla/models.py",
line 1059, in query
df = self.database.get_df(sql, self.schema, mutator)
File
"/home/superset/super/lib64/python3.6/site-packages/superset/models/core.py",
line 362, in get_df
self.db_engine_spec.execute(cursor, sqls[-1])
File
"/home/superset/super/lib64/python3.6/site-packages/superset/db_engine_specs/base.py",
line 841, in execute
cursor.execute(query)
File
"/home/superset/super/lib64/python3.6/site-packages/impala/hiveserver2.py",
line 329, in execute
configuration=configuration)
File
"/home/superset/super/lib64/python3.6/site-packages/impala/hiveserver2.py",
line 375, in execute_async
self._execute_async(op)
File
"/home/superset/super/lib64/python3.6/site-packages/impala/hiveserver2.py",
line 394, in _execute_async
operation_fn()
File
"/home/superset/super/lib64/python3.6/site-packages/impala/hiveserver2.py",
line 372, in op
run_async=True)
File
"/home/superset/super/lib64/python3.6/site-packages/impala/hiveserver2.py",
line 1096, in execute
return self._operation('ExecuteStatement', req)
File
"/home/superset/super/lib64/python3.6/site-packages/impala/hiveserver2.py",
line 1026, in _operation
resp = self._rpc(kind, request)
File
"/home/superset/super/lib64/python3.6/site-packages/impala/hiveserver2.py",
line 994, in _rpc
err_if_rpc_not_ok(response)
File
"/home/superset/super/lib64/python3.6/site-packages/impala/hiveserver2.py",
line 748, in err_if_rpc_not_ok
raise HiveServer2Error(resp.status.errorMessage)
impala.error.HiveServer2Error: IllegalStateException: Failed analysis after
expr substitution.
CAUSED BY: AnalysisException: mfglogs_ficore.count_distinct() unknown
```
- [x] I have reproduced the issue with at least the latest released version
of superset.
- [x] I have checked the issue tracker for the same issue and I haven't
found one similar.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]