[GitHub] minh5 opened a new issue #5308: KeyError when using Redshift

GitBox Thu, 28 Jun 2018 09:27:33 -0700

URL: https://github.com/apache/incubator-superset/issues/5308
 
 
   I'm trying to create line charts and time-series charts with aggregates. In this case, I am summing the `itemsales` column over a date, `day`. However, I keep getting this error:
   
   <img width="943" alt="screen shot 2018-06-28 at 12 07 04 pm" src="https://user-images.githubusercontent.com/7282400/42046559-237736c6-7acc-11e8-9e51-81b13749ea9c.png">
   <img width="951" alt="screen shot 2018-06-28 at 12 07 40 pm" src="https://user-images.githubusercontent.com/7282400/42046572-299d7ed4-7acc-11e8-877e-d178b4402591.png">
   
   **- [x] I have checked the superset logs for python stacktraces and included it here as text if any**
   
   ```
   2018-06-28 12:07:29,604:INFO:root:Database.get_sqla_engine(). Masked URL: redshift+psycopg2://user:[email protected]:5439/testdb
   2018-06-28 12:07:30,077:DEBUG:root:[stats_logger] (incr) loaded_from_source
   2018-06-28 12:07:30,077:ERROR:root:u'SUM(itemsales)'
   Traceback (most recent call last):
     File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/superset/views/core.py", line 1107, in generate_json
       payload = viz_obj.get_payload()
     File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/superset/viz.py", line 329, in get_payload
       payload['data'] = self.get_data(df)
     File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/superset/viz.py", line 580, in get_data
       values=values)
     File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/pandas/core/frame.py", line 4468, in pivot_table
       margins_name=margins_name)
     File "/Users/minhmai/envs/py2/lib/python2.7/site-packages/pandas/core/reshape/pivot.py", line 58, in pivot_table
       raise KeyError(i)
   KeyError: u'SUM(itemsales)'
   ```
   A bit of digging showed that the column names become lowercase when the query result is turned into a pandas DataFrame, but the metric name keeps its original casing (`SUM(itemsales)` vs. `sum(itemsales)`), as my logs above show. I set a trace, and it confirms exactly that:
   
   ```
   (Pdb) l
   585                          records=pt.to_dict(orient='index'),
   586                          columns=list(pt.columns),
   587                          is_group_by=len(fd.get('groupby')) > 0,
   588                      )
   589                  except:
   590  ->                  import pdb; pdb.post_mortem()
   591
   592
   593          class PivotTableViz(BaseViz):
   594
   595              """A pivot table view, define your rows, columns and metrics"""
   (Pdb) values
   [u'SUM(itemsales)']
   (Pdb) df.head()
                   __timestamp  sum(itemsales)
   0 2018-06-15 00:00:00+00:00             0.0
   1 2018-06-11 00:00:00+00:00             0.0
   2 2018-06-13 00:00:00+00:00             0.0
   3 2018-06-09 00:00:00+00:00             0.0
   4 2018-06-07 00:00:00+00:00             0.0
   (Pdb) self.metrics
   [u'SUM(itemsales)']
   (Pdb) df.columns
   Index([u'__timestamp', u'sum(itemsales)'], dtype='object')
   ```
   The error occurred in the `pivot_table` call at line 578:
   
   ```
               pt = df.pivot_table(
                   index=DTTM_ALIAS,
                   columns=columns,
                   values=values)
   ```
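The failure can be reproduced outside Superset with a minimal sketch that assumes only pandas. The toy frame stands in for the Redshift result: it uses the same column names as the `df.head()` output above, and its dates are taken from that output. `values` holds the metric label with its original casing. As the traceback shows, `pivot_table` checks every label in `values` against the DataFrame's columns and raises `KeyError` for any it cannot find, and that lookup is case-sensitive. The `columns` argument from the Superset call is left out here because no group-by is involved.

```python
import pandas as pd

# Stand-in for the query result: the aggregate column came back lowercased.
df = pd.DataFrame({
    "__timestamp": pd.to_datetime(["2018-06-15", "2018-06-11", "2018-06-13"]),
    "sum(itemsales)": [0.0, 0.0, 0.0],
})

# Metric label as it is passed to pivot_table, with the original casing.
values = ["SUM(itemsales)"]

# pivot_table looks each value label up in df's columns (case-sensitive).
error = None
try:
    df.pivot_table(index="__timestamp", values=values)
except KeyError as exc:
    error = exc
print("pivot_table raised KeyError:", error)

# Once the label matches the column name exactly, the same call succeeds.
pt = df.pivot_table(index="__timestamp", values=[v.lower() for v in values])
print(list(pt.columns))
```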
   
   **- [x] I have reproduced the issue with at least the latest released version of superset**
   **- [x] I have checked the issue tracker for the same issue and I haven't found one similar**
   
   
   ### Superset version
   superset==0.25.6
   
   ### Expected results
   I expect either the metric names to be lowercased as well, or the column names of the result DataFrame to keep the same form as the aggregate in the query.
   
   ### Actual results
   The DataFrame's column names are lowercased, while the metric names keep their original casing.
   
   ### Steps to reproduce
   This is on test data filled by a random number generator. I have seen this error every time I use the SUM aggregation. The database is on Redshift, and I have confirmed that I am using pandas==0.22.0.
   
   I can push a fix that either lowercases the metric names or renames the DataFrame columns to match the metric names, but I'm not sure which is the better approach.
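The second option (making the DataFrame columns match the metric names) could look roughly like the sketch below. `restore_metric_labels` is a hypothetical helper, not part of Superset. It assumes the only difference between a metric label and its result column is letter case, and it works on plain lists of names so it can be tried without a database. Inside `viz.py`, the resulting mapping could then be applied with pandas' `df.rename(columns=mapping)` before `pivot_table` is called.

```python
from typing import Dict, Iterable


def restore_metric_labels(columns: Iterable[str], metrics: Iterable[str]) -> Dict[str, str]:
    """Map result-set column names back to the metric labels they came from.

    Hypothetical helper: assumes a column such as 'sum(itemsales)' differs
    from its metric label 'SUM(itemsales)' only in letter case. Only columns
    whose name differs from the matching label end up in the mapping.
    """
    # Index metric labels by their lowercased form; if two labels differ only
    # in case, the first one wins.
    by_lower: Dict[str, str] = {}
    for metric in metrics:
        by_lower.setdefault(metric.lower(), metric)

    mapping: Dict[str, str] = {}
    for col in columns:
        metric = by_lower.get(col.lower())
        if metric is not None and metric != col:
            mapping[col] = metric
    return mapping


# Column names and metrics as they appear in the pdb session above.
columns = ["__timestamp", "sum(itemsales)"]
metrics = ["SUM(itemsales)"]
mapping = restore_metric_labels(columns, metrics)
print(mapping)  # {'sum(itemsales)': 'SUM(itemsales)'}
renamed = [mapping.get(c, c) for c in columns]
print(renamed)  # ['__timestamp', 'SUM(itemsales)']
```

Renaming the columns, rather than lowercasing the metrics, keeps the metric names exactly as the user defined them. The trade-off is that two metrics differing only in case would collide; this sketch simply keeps the first one.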



