All,

I’m working to integrate SAS’ historic use of IEEE doubles to represent missing 
values into our SAS Viya Parquet integration. SAS writes a NaN to represent a 
floating-point double that is “missing,” i.e. NULL in more general data-management 
terms.

Of course, SAS’ goal is to create .parquet files that are universally readable. 
It therefore appears that the SAS Parquet writer(s) will NOT be able to write 
the usual NaN to represent “missing,” because doing so will cause a 
floating-point exception for other readers.
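As a minimal sketch (Python, assuming the writer receives plain Python floats and 
that every NaN, whatever its payload bits, should be treated as SAS “missing”), 
the workaround amounts to mapping each NaN to None before the values reach a 
Parquet writer, so the writer records a NULL rather than a NaN. The function name 
nan_to_null is hypothetical.

```python
import math
from typing import List, Optional


def nan_to_null(values: List[float]) -> List[Optional[float]]:
    """Replace every NaN with None so a writer can store it as a Parquet NULL.

    Assumption: any NaN, regardless of payload, means SAS "missing".
    math.isnan is used because NaN never compares equal to itself.
    """
    return [None if math.isnan(v) else v for v in values]


print(nan_to_null([1.0, float("nan"), 2.5]))  # [1.0, None, 2.5]
```

Infinities and ordinary values pass through unchanged; only NaNs become NULLs.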

Based on the Parquet documentation at https://parquet.apache.org/documentation/latest/ 
and on reading the code, I understand that a Parquet NULL value is indicated by a 
definition level of 0 (0x000) at the position in the definition-level vector 
that corresponds to each NULL value in the column.
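For a flat, top-level OPTIONAL column the maximum definition level is 1, so the 
logical picture can be sketched as below: a present value gets level 1, a NULL 
gets level 0, and only the present values are stored in the values section. This 
is an illustration only; it shows the logical levels, not the RLE/bit-packed 
encoding actually written to the page, and encode_optional_column is a 
hypothetical name.

```python
from typing import List, Optional, Tuple


def encode_optional_column(
    values: List[Optional[float]],
) -> Tuple[List[int], List[float]]:
    """Split a flat OPTIONAL column into definition levels and non-null values.

    Max definition level is 1: 1 = value present, 0 = NULL.
    NULLs get a definition level but no entry in the values list.
    """
    def_levels: List[int] = []
    non_null: List[float] = []
    for v in values:
        if v is None:
            def_levels.append(0)
        else:
            def_levels.append(1)
            non_null.append(v)
    return def_levels, non_null


print(encode_optional_column([1.0, None, 3.0]))  # ([1, 0, 1], [1.0, 3.0])
```

Note that the values list is shorter than the level list whenever NULLs are 
present, so a reader must walk the definition levels to place each value.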

Conversely, it appears that the per-column, per-page definition-level data is 
never written when the column is declared REQUIRED in the schema.
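For reference, the Parquet format derives a column’s maximum definition level 
from the repetition of each field on the path from the schema root to the leaf: 
every OPTIONAL or REPEATED field adds one, and a REQUIRED field adds nothing. When 
that maximum is 0 (a top-level REQUIRED column, for example), no definition levels 
are stored. A minimal sketch of that rule, with hypothetical function names and 
the path given as a list of repetition strings:

```python
from typing import List


def max_definition_level(path_repetitions: List[str]) -> int:
    """Count the OPTIONAL and REPEATED fields on the root-to-leaf path.

    Example: ["REQUIRED"] is a top-level required column.
    """
    return sum(1 for r in path_repetitions if r in ("OPTIONAL", "REPEATED"))


def writes_definition_levels(path_repetitions: List[str]) -> bool:
    # Definition levels are only stored when the maximum level is above 0.
    return max_definition_level(path_repetitions) > 0


print(writes_definition_levels(["REQUIRED"]))  # False
print(writes_definition_levels(["OPTIONAL"]))  # True
```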

Is my understanding and Parquet terminology correct here?

Thanks,

Brian
