john-bodley opened a new pull request #4905: [wip][missing values] Removing 
replacing missing values
URL: https://github.com/apache/incubator-superset/pull/4905
 
 
   Apologies for not having the full context of this code, but from a numerical 
standpoint replacing [missing 
values](https://pandas.pydata.org/pandas-docs/stable/missing_data.html) with 
zero (or other values) is rarely a good idea, as it leads to inaccuracies, 
which surely violates a core tenet of a data analysis tool. Note that Pandas 
(implicitly) and NumPy (explicitly) handle missing values correctly, e.g. 
[mean](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mean.html)
 and 
[nanmean](https://docs.scipy.org/doc/numpy/reference/generated/numpy.nanmean.html)
 respectively. 
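
   As a minimal sketch of the numerical point above (the sample values are made up for illustration and are not taken from Superset), compare a mean that skips missing values with one computed after zero-filling:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: three observations, one of them missing.
values = pd.Series([10.0, np.nan, 30.0])

# pandas skips NaN by default (skipna=True), so only 10 and 30 are averaged.
mean_skipping_nan = values.mean()

# Zero-filling turns the missing value into a real observation of 0,
# which drags the mean down.
mean_zero_filled = values.fillna(0).mean()

# NumPy needs the explicit NaN-aware variant; plain np.mean propagates NaN.
nanmean_numpy = np.nanmean(values.to_numpy())
plain_numpy = np.mean(values.to_numpy())

print(mean_skipping_nan, mean_zero_filled, nanmean_numpy, plain_numpy)
```

   Here the NaN-aware mean is 20.0, whereas the zero-filled mean is roughly 13.33, which is the kind of inaccuracy described above.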
   
   This PR is still WIP, as I've only remedied a couple of the visualization 
types and have yet to add a number of unit tests to ensure numerical correctness 
with missing values. I felt there was merit in sharing this now so that I can 
better understand the context of replacing missing values and the potential 
corner cases I need to be aware of.
   
   For context, here are a few examples where replacing missing values with `0` 
leads to incorrect results:
    
   **Time-series (current):**
   
   <img width="992" alt="screen shot 2018-04-28 at 5 14 28 pm" 
src="https://user-images.githubusercontent.com/4567245/39402261-4555abc0-4b0f-11e8-85cc-0cfeae44a2f0.png">
   
   **Time-series (proposed):**
   
   <img width="1011" alt="screen shot 2018-04-28 at 5 14 56 pm" 
src="https://user-images.githubusercontent.com/4567245/39402260-4530e0ce-4b0f-11e8-86b0-f72024f19749.png">
   
   **Box-plot (current):**
   
   <img width="1017" alt="screen shot 2018-04-28 at 5 09 27 pm" 
src="https://user-images.githubusercontent.com/4567245/39402263-4589f97a-4b0f-11e8-9989-5a81ba5c723d.png">
   
   **Box-plot (proposed):**
   
   <img width="1009" alt="screen shot 2018-04-28 at 5 09 54 pm" 
src="https://user-images.githubusercontent.com/4567245/39402262-456f78ac-4b0f-11e8-8b97-4df2f71e8982.png">
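
   The box-plot distortion can be reproduced in isolation. This is only a rough sketch with made-up values, assuming the quartiles are computed with pandas; it is not the code path Superset uses:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: five observations, two of them missing.
values = pd.Series([4.0, np.nan, 6.0, np.nan, 8.0])
quartiles = [0.25, 0.5, 0.75]

# Series.quantile ignores NaN, so the quartiles are taken over [4, 6, 8].
skipping_nan = values.quantile(quartiles)

# After zero-filling, the quartiles are taken over [0, 0, 4, 6, 8],
# which pulls the whole box towards zero.
zero_filled = values.fillna(0).quantile(quartiles)

print(skipping_nan.tolist())
print(zero_filled.tolist())
```

   With the missing values skipped, the quartiles are 5, 6 and 7; with zero-filling they become 0, 4 and 6.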
   
   Closes https://github.com/apache/incubator-superset/issues/3603   
   
   to: @jeffreythewang @mistercrunch @williaster @xrmx  

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
