rumbin opened a new issue #17042:
URL: https://github.com/apache/superset/issues/17042


   Explore's pill indicator of the number of result rows is showing wrong 
numbers for Box Plots.
   Instead of the number of rows returned from the DB query it displays the 
number of _aggregated_ rows.
   
   This way, users have no clue if the row limit kicked in and the box plot is 
based on incomplete data.
   
   Furthermore, the row limit is...
   * not adjustable for this plot
   * not communicated to the user
   * applied without imposing any sort order, thus resulting in a potentially 
arbitrary sample of rows being evaluated by the box plot
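
To illustrate why the aggregated row count cannot reveal a truncated query, here is a toy model in plain Python. It is not Superset code: `ROW_LIMIT` is set to a small hypothetical value, and `fetch_rows` and `box_stats` are made-up stand-ins for the DB query and the box plot aggregation.

```python
from collections import defaultdict
from statistics import quantiles

ROW_LIMIT = 10  # hypothetical value, kept small for illustration


def fetch_rows(total_rows, limit):
    """Stand-in for the DB query: returns at most `limit` (series, value) rows."""
    rows = [(f"s{i % 3}", float(i)) for i in range(total_rows)]
    return rows[:limit]


def box_stats(rows):
    """Collapse the raw rows into one aggregated row per series."""
    groups = defaultdict(list)
    for series, value in rows:
        groups[series].append(value)
    stats = {}
    for series, values in groups.items():
        q1, median, q3 = quantiles(values, n=4)
        stats[series] = {"q1": q1, "median": median, "q3": q3, "count": len(values)}
    return stats


raw = fetch_rows(total_rows=25, limit=ROW_LIMIT)  # the table holds 25 rows
agg = box_stats(raw)

raw_count = len(raw)         # rows returned by the query
aggregated_count = len(agg)  # rows after aggregation, one per series

# Only the raw count can tell whether the limit was reached.
limit_hit_from_raw = raw_count >= ROW_LIMIT
limit_hit_from_aggregated = aggregated_count >= ROW_LIMIT

print(raw_count, aggregated_count, limit_hit_from_raw, limit_hit_from_aggregated)
```

In this sketch the query is cut off at the limit, but a check based on the aggregated count never notices, which matches the behaviour described in this report.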
   
   #### How to reproduce the bug
   
   1. Use a dataset that has more rows than the configured ROW_LIMIT
   2. Create a box plot on a numeric column, _distribute across_ a column with 
a high cardinality, e.g., the primary key of the dataset.
   3. Run
   4. Observe the number of rows displayed in the indicator pill
   
   ### Expected results
   
   Minimal:
   
   * The actual number of rows returned by the DB query is shown in the 
indicator.
   * If the ROW_LIMIT is hit, the indicator turns red, like it does with all 
the other visualizations.
   
   Ideal, additionally:
   * The row limit is configurable, maybe even disabled, like for the 
histogram, iirc
   * If a row limit is applied, some sensible ORDER BY should be applied in 
order to at least yield deterministic, if incomplete, results.
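
A minimal sketch of the ORDER BY point, using an in-memory SQLite database rather than the database from this report. The `measurements` table and the `limited_ids` helper are made up for the example. Without an ORDER BY, SQL does not guarantee which rows a LIMIT keeps; with one, the same rows come back regardless of how the data was stored.

```python
import sqlite3


def limited_ids(insert_order, limit, order_by):
    """Load rows in the given order, then return the ids a limited query keeps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (id INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?)",
        [(i, i * 1.5) for i in insert_order],
    )
    sql = "SELECT id FROM measurements"
    if order_by:
        sql += " ORDER BY id"
    sql += " LIMIT ?"
    ids = [row[0] for row in conn.execute(sql, (limit,))]
    conn.close()
    return ids


ascending = [1, 2, 3, 4, 5, 6]
shuffled = [4, 1, 6, 3, 5, 2]

# With ORDER BY, the physical insertion order no longer affects the result.
ordered_a = limited_ids(ascending, limit=3, order_by=True)
ordered_b = limited_ids(shuffled, limit=3, order_by=True)
print(ordered_a, ordered_b)
```

Which column to sort by is a separate design question; the primary key is used here only because it gives a total order.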
   
   ### Actual results
   
   1. Only the number of aggregated rows is shown, which equals the number of 
series displayed in the chart.
   2. This number is consistent with the _Data_ table below the chart, which 
also only shows one row per series (box).
   
   #### Screenshots/Screencasts
   
   The row count pill shows only 7 result rows:
   
   
![image](https://user-images.githubusercontent.com/1220356/136619652-ed29183a-4481-4398-a9a6-686e66bcc2b2.png)
   
   
   The _Data_ table lists these aggregated rows of the 7 distinct series:
   
   
![image](https://user-images.githubusercontent.com/1220356/136619826-a9a4fbcb-7202-478c-a67b-6f6e9a28d42f.png)
   
   
   The query applies the configured ROW_LIMIT of 100000:
   
   
![image](https://user-images.githubusercontent.com/1220356/136619868-8291ff57-e3dd-46a3-abfe-dae9612cd997.png)
   
   
   A Big Number chart that calculates the distinct count of the entities the 
box plot was distributed across proves that the cardinality is much higher than 
the ROW_LIMIT. The box plot was therefore based on incomplete data, without 
the row count pill turning red:
   
   
![image](https://user-images.githubusercontent.com/1220356/136620330-e6f9c0e2-5b7f-45ff-9001-5fcc654252d0.png)
   
   
   
   
   ### Environment
   
   - browser type and version: Chrome 93.0.4577.63
   - superset version: 1.3.1, installed via pip
   - python version: 3.8.11
   - node.js version: v4.6.1
   - feature flags active:
   ```
       "THUMBNAILS": True,
       "ALERT_REPORTS": True,
       "ALERTS_ATTACH_REPORTS": True,
       "SQLLAB_BACKEND_PERSISTENCE": True,
       "ENABLE_TEMPLATE_PROCESSING": True,
       "DASHBOARD_NATIVE_FILTERS": True,
       "DASHBOARD_CROSS_FILTERS": True,
       "DASHBOARD_NATIVE_FILTERS_SET": True,
       "ENABLE_EXPLORE_DRAG_AND_DROP": True,
       "DASHBOARD_CACHE": True     
   ```
   
   ### Checklist
   
   Make sure to follow these steps before submitting your issue - thank you!
   
   - [ x] I have checked the superset logs for python stacktraces and included 
it here as text if there are any.
   - [ x] I have reproduced the issue with at least the latest released version 
of superset.
   - [ x] I have checked the issue tracker for the same issue and I haven't 
found one similar.
   
   ### Additional context
   
   `pip freeze`
   
   ```
   aiohttp==3.7.4.post0
   alembic==1.7.3
   amqp==2.6.1
   apache-superset==1.3.1
   apispec==3.3.2
   asn1crypto==1.4.0
   async-timeout==3.0.1
   attrs==21.2.0
   azure-common==1.1.27
   azure-core==1.18.0
   azure-storage-blob==12.9.0
   Babel==2.9.1
   backoff==1.11.1
   billiard==3.6.4.0
   bleach==3.3.1
   boto3==1.18.51
   botocore==1.21.51
   Brotli==1.0.9
   cachelib==0.1.1
   cachetools==4.2.4
   celery==4.4.7
   certifi==2021.5.30
   cffi==1.14.6
   chardet==4.0.0
   charset-normalizer==2.0.6
   click==7.1.2
   cmdstanpy==0.9.68
   colorama==0.4.4
   convertdate==2.3.2
   cron-descriptor==1.2.24
   croniter==1.0.15
   cryptography==3.4.8
   cx-Oracle==8.2.1
   cycler==0.10.0
   Cython==0.29.24
   defusedxml==0.7.1
   deprecation==2.1.0
   dnspython==2.1.0
   elasticsearch==7.13.4
   elasticsearch-dbapi==0.2.6
   email-validator==1.1.3
   ephem==4.1
   et-xmlfile==1.1.0
   Flask==1.1.4
   Flask-AppBuilder==3.3.3
   Flask-Babel==1.0.0
   Flask-Caching==1.10.1
   Flask-Compress==1.10.1
   Flask-JWT-Extended==3.25.1
   Flask-Login==0.4.1
   Flask-Migrate==3.1.0
   Flask-OpenID==1.3.0
   Flask-SQLAlchemy==2.5.1
   flask-talisman==0.8.1
   Flask-WTF==0.14.3
   future==0.18.2
   geographiclib==1.52
   geopy==2.2.0
   gevent==21.8.0
   google-api-core==2.0.1
   google-auth==2.2.1
   google-cloud-bigquery==2.27.1
   google-cloud-core==2.0.0
   google-crc32c==1.2.0
   google-resumable-media==2.0.3
   googleapis-common-protos==1.53.0
   graphlib-backport==1.0.3
   greenlet==1.1.2
   grpcio==1.41.0
   gunicorn==20.0.4
   hdbcli==2.10.13
   holidays==0.10.3
   humanize==3.11.0
   idna==3.2
   importlib-resources==5.2.2
   isodate==0.6.0
   itsdangerous==1.1.0
   Jinja2==2.11.3
   jmespath==0.10.0
   jsonschema==3.2.0
   kiwisolver==1.3.2
   kombu==4.6.11
   korean-lunar-calendar==0.2.1
   LunarCalendar==0.0.9
   Mako==1.1.5
   Markdown==3.3.4
   MarkupSafe==2.0.1
   marshmallow==3.13.0
   marshmallow-enum==1.5.1
   marshmallow-sqlalchemy==0.23.1
   matplotlib==3.4.3
   msgpack==1.0.2
   msrest==0.6.21
   multidict==5.1.0
   numpy==1.21.2
   oauthlib==3.1.1
   openpyxl==3.0.9
   oscrypto==1.2.1
   packaging==21.0
   pandas==1.2.5
   parsedatetime==2.6
   pgsanity==0.2.9
   Pillow==8.3.2
   polyline==1.4.0
   prison==0.2.1
   prophet==1.0
   proto-plus==1.19.2
   protobuf==3.18.0
   psycopg2-binary==2.8.6
   pyarrow==4.0.1
   pyasn1==0.4.8
   pyasn1-modules==0.2.8
   pybigquery==0.10.2
   pycparser==2.20
   pycryptodomex==3.10.4
   pyhdb==0.3.4
   PyJWT==1.7.1
   PyMeeus==0.5.11
   pyOpenSSL==20.0.1
   pyparsing==2.4.7
   pyrsistent==0.18.0
   pystan==2.18.0.0
   python-dateutil==2.8.2
   python-dotenv==0.19.0
   python-geohash==0.8.5
   python-ldap==3.3.1
   python3-openid==3.2.0
   pytz==2021.1
   PyYAML==5.4.1
   redis==3.5.3
   requests==2.26.0
   requests-oauthlib==1.3.0
   rsa==4.7.2
   s3transfer==0.5.0
   selenium==3.141.0
   setuptools-git==1.2
   simplejson==3.17.5
   six==1.16.0
   slackclient==2.5.0
   snowflake-connector-python==2.6.2
   snowflake-sqlalchemy==1.2.4
   SQLAlchemy==1.3.24
   sqlalchemy-hana==0.5.0
   SQLAlchemy-Utils==0.36.8
   sqlparse==0.3.0
   tabulate==0.8.9
   tqdm==4.62.3
   typing-extensions==3.10.0.2
   ujson==4.2.0
   urllib3==1.26.7
   vine==1.3.0
   webencodings==0.5.1
   Werkzeug==1.0.1
   WTForms==2.3.3
   WTForms-JSON==0.3.3
   xlrd==2.0.1
   yarl==1.6.3
   zipp==3.6.0
   zope.event==4.5.0
   zope.interface==5.4.0
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


