viirya commented on PR #2570: URL: https://github.com/apache/arrow-rs/pull/2570#issuecomment-1226022584
I'm open to the feature flag naming. `sql_compliant` is used is because this is to follow SQL semantics regarding NaN handling. Maybe it sounds too broad as currently it only deals with NaN specially. > Have we considered just making this the default behaviour? I'm not sure that this would make sense for other usecases other than SQL. > Note there is no SQL standard for NaN, and different engines may have different behaviors. For instance, in PostgresSQL NaN is only considered equal to NaN in sort, but not other cases. Therefore, I think we should clearly document the behavior introduced here. I think that PostgresSQL also treats NaN equal to NaN, as Spark does. Quote from https://www.postgresql.org/docs/current/datatype-numeric.html: > In order to allow numeric values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values. Just did a test `SELECT double precision 'NaN' = double precision 'NaN';` in PostgresSQL, it returns true. I agree that we should document it clearly. I will update the document. > Similar question as @tustvold : should we aim to make all the compute kernels SQL compliant? in that case we should no longer need a flag like this. As I answered above for @tustvold's question, I think we need both behaviors of compute kernels. For non-SQL usecases, current behavior is correct. But for SQL semantics, NaN not equal to NaN will cause practical issue when processing data, so we need different behavior with it. That's said, I think that we cannot just change all compute kernels to follow SQL semantics. > Also, could we have some tests for this too? I have some tests for this already. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
