[GitHub] [incubator-superset] betodealmeida opened a new issue #8225: Pandas casting int64 to float64, misrepresenting value

GitBox Sat, 14 Sep 2019 13:12:29 -0700

betodealmeida opened a new issue #8225: Pandas casting int64 to float64, 
misrepresenting value
URL: https://github.com/apache/incubator-superset/issues/8225
 
 
   I have the following data being returned by Presto (single column, 6 rows):
   
   ```
   [(None,), (1239162456494753670,), (None,), (None,), (None,), (None,)]
   ```
   
   Because of the missing data (`None`), Pandas infers the column type as
   `float64`, which changes the value to a different id:
   
   ```python
   >>> import pandas as pd
   >>> column_names = ['organization_lyft_id']
   >>> data = [(None,), (1239162456494753670,), (None,), (None,), (None,), (None,)]
   >>> df = pd.DataFrame(list(data), columns=column_names).infer_objects()  # SupersetDataFrame
   >>> print(df)
      organization_lyft_id
   0                   NaN
   1          1.239162e+18
   2                   NaN
   3                   NaN
   4                   NaN
   5                   NaN
   >>> print(df.dtypes)
   organization_lyft_id    float64
   dtype: object
   ```
   
   The number then shows up as `1239162456494753800` in SQL Lab.
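
The rounding can be reproduced without Pandas. This is a minimal sketch using only built-in Python types; the names `ORIGINAL_ID`, `as_float`, and `round_tripped` are illustrative and not part of Superset. A `float64` carries 53 significant bits, and this id needs more than that, so converting it to a float snaps it to the nearest representable value.

```python
# The id from the example above.
ORIGINAL_ID = 1239162456494753670

# float64 has a 53-bit significand; this id is 61 bits long,
# so representable values in this range are 256 apart.
print(ORIGINAL_ID.bit_length())     # 61

as_float = float(ORIGINAL_ID)
round_tripped = int(as_float)

print(round_tripped)                # 1239162456494753792
print(round_tripped - ORIGINAL_ID)  # 122
print(repr(as_float))               # 1.2391624564947538e+18
```

The shortest decimal string that round-trips to this float has 17 significant digits, which is consistent with the `1239162456494753800` displayed in SQL Lab.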
   
   Here's the Pandas documentation on this:
   
   > ... pandas primarily uses NaN to represent missing data. Because NaN is a 
float, this forces an array of integers with any missing values to become 
floating point. In some cases, this may not matter much. But if your integer 
column is, say, an identifier, casting to float can be problematic. **Some 
integers cannot even be represented as floating point numbers.** (emphasis mine)
   
   Note that if the missing data is filtered out, the column is inferred as
   `int64` and the value shows up correctly in SQL Lab.
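
As a quick check of that observation, the sketch below drops the `None` rows before building the frame. The `filtered` variable is illustrative only, and filtering is shown here to demonstrate the inference behavior, not as a proposed fix.

```python
import pandas as pd

column_names = ['organization_lyft_id']
data = [(None,), (1239162456494753670,), (None,), (None,), (None,), (None,)]

# Keep only rows whose single column is not missing.
filtered = [row for row in data if row[0] is not None]
df = pd.DataFrame(filtered, columns=column_names).infer_objects()

print(df.dtypes['organization_lyft_id'])  # int64
print(df['organization_lyft_id'][0])      # 1239162456494753670
```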
   
   The solution is to pass a `dtypes` argument when creating the Pandas data 
frame, built from the cursor description. I'm working on a fix for this.
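
One way this could look is sketched below. It is not the actual Superset fix. It assumes the cursor description follows DB-API (PEP 249), where each entry starts with `(name, type_code, ...)`, and that the driver reports `'bigint'` as the type code. The `TYPE_TO_DTYPE` mapping and the `build_frame` helper are hypothetical. The sketch relies on the nullable `Int64` dtype (pandas 0.24 or later) and builds each column directly from the Python values, so the integers never pass through `float64`.

```python
import pandas as pd

# Hypothetical mapping from database type codes to pandas dtypes.
# "Int64" (capital I) is the nullable integer dtype; it can hold missing values.
TYPE_TO_DTYPE = {"bigint": "Int64", "integer": "Int64"}


def build_frame(data, description):
    """Build a DataFrame column by column, typed from the cursor description."""
    columns = {}
    for position, (name, type_code, *_rest) in enumerate(description):
        values = [row[position] for row in data]
        # Unknown type codes map to None, which lets pandas infer the dtype.
        dtype = TYPE_TO_DTYPE.get(type_code)
        columns[name] = pd.Series(values, dtype=dtype)
    return pd.DataFrame(columns)


# Assumed shape of the description for the single-column result above.
description = [("organization_lyft_id", "bigint", None, None, None, None, None)]
data = [(None,), (1239162456494753670,), (None,), (None,), (None,), (None,)]

df = build_frame(data, description)
print(df.dtypes["organization_lyft_id"])   # Int64
print(int(df["organization_lyft_id"][1]))  # 1239162456494753670
```

Casting an already-built `float64` column to `Int64` afterwards would not help, because the precision is lost at construction time.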



