On Wednesday, 18 November 2015 at 17:15:38 UTC, Laeeth Isharc wrote:
What do you think about the use of NaN for missing floats? In theory I could imagine wanting to distinguish between an NaN in the source file and a missing value, but in my world I never felt the need for this. For integers and bools, that is different of course.

The Julia discussions mention another dataframe implementation, I believe for R, in which NaN was used. There was some mention of the virtues of their own choice and the problems with NaN. I think the NA marker there was one particular encoding (bit pattern) of NaN rather than NaN in general. Other implementations they mentioned reserve some value in each of the numeric data types to represent NA. In the Julia case, I believe they use a separate byte vector for each column to hold the NA status. They discussed some other possible enhancements, but I don't know what they implemented. For example, when the byte flags a cell as NA, the cell's value slot is otherwise unused and can hold additional info ... maybe the reason for the NA. There was also some discussion of having the associated cell hold repeat counts for the NA status, which I suppose meant a run length covering the following cells in the column vector. I'll try to find the discussions and post the link.
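
To make the byte-vector idea concrete, here is a rough sketch in Python of a float column that keeps its NA flags in a separate byte vector. This is only my illustration of the layout described above, not how the Julia package actually does it; the names MaskedColumn and na_count are made up, and using None to mark a missing input is just an assumption for the example.

```python
import math
from array import array


class MaskedColumn:
    """Float column with a parallel byte vector marking missing (NA) cells.

    Hypothetical sketch: one flag byte per cell, 1 = NA, 0 = present.
    """

    def __init__(self, values):
        values = list(values)
        # None marks a missing cell; its value slot gets a placeholder 0.0.
        self.data = array("d", (0.0 if v is None else float(v) for v in values))
        self.na = bytearray(1 if v is None else 0 for v in values)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        # The flag byte decides; the stored float is ignored for NA cells.
        return None if self.na[i] else self.data[i]

    def na_count(self):
        return sum(self.na)


# A NaN read from the source file stays distinct from a missing value.
col = MaskedColumn([1.5, None, float("nan"), 2.5])
print(col[1])              # the missing cell
print(math.isnan(col[2]))  # the genuine NaN from the source
print(col.na_count())
```

The point of the layout is that a NaN in the source and a missing value never collide, and the same mask works unchanged for integer and bool columns, where there is no NaN to borrow. The cost is one extra byte per cell.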

