All,

I’m working to integrate the historic usage of SAS missing values for IEEE doubles into our SAS Viya Parquet integration. SAS writes a NaN to represent floating-point doubles that are “missing,” i.e. NULL in more general data management terms.
Of course, SAS’ goal is to create .parquet files that are universally readable. Therefore, it appears that the SAS Parquet writer(s) will NOT be able to write the usual NaN to represent “missing,” because doing so will cause a floating-point exception for other readers.

Based on the Parquet doc at https://parquet.apache.org/documentation/latest/ and by examining code, I understand that Parquet NULL values are indicated by setting 0x000 at the definition level vector offset corresponding to each NULL column offset value. Conversely, it appears that the per-column, per-page definition level data is never written when required is not specified for the column schema.

Is my understanding and Parquet terminology correct here?

Thanks,
Brian
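
P.S. To make the question concrete, here is a minimal sketch of the NULL mechanism as I currently read it, for a flat (non-nested) OPTIONAL double column only. It is plain Python rather than any real Parquet writer API: the function names are hypothetical, and treating every NaN as "missing" is an assumption made purely for illustration. The idea is that a definition level of 0 marks a NULL slot, a level of 1 marks a present value, and only the present values are stored in the data page.

```python
import math


def encode_optional_doubles(values):
    """Split a flat OPTIONAL double column into (definition levels, non-null values).

    Hypothetical helper for illustration only, not a real Parquet API.
    Assumption: NaN (or None) in the input means "missing" and becomes NULL.
    """
    def_levels = []
    non_null = []
    for v in values:
        if v is None or math.isnan(v):
            def_levels.append(0)  # 0 = value is NULL at this position
        else:
            def_levels.append(1)  # 1 = max definition level, value is present
            non_null.append(v)    # only present values are physically stored
    return def_levels, non_null


def decode_optional_doubles(def_levels, non_null):
    """Rebuild the logical column, yielding None wherever the level is 0."""
    present = iter(non_null)
    return [next(present) if level == 1 else None for level in def_levels]


levels, stored = encode_optional_doubles([1.5, float("nan"), 3.0])
restored = decode_optional_doubles(levels, stored)
```

If this model is right, a reader never sees a NaN bit pattern for a missing value, since the NULL is carried entirely by the definition levels.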
