peterxcli commented on PR #20358: URL: https://github.com/apache/datafusion/pull/20358#issuecomment-3910433621
> > Warning that spark has `spark.sql.mapKeyDedupPolicy`
> >
> > ```
> > spark.sql.mapKeyDedupPolicy | EXCEPTION | The policy to deduplicate map keys in builtin function: CreateMap, MapFromArrays, MapFromEntries, StringToMap, MapConcat and TransformKeys. When EXCEPTION, the query fails if duplicated map keys are detected. When LAST_WIN, the map key that is inserted at last takes precedence. | 3.0.0
> > ```
> >
> > [spark.apache.org/docs/latest/configuration.html](https://spark.apache.org/docs/latest/configuration.html)
>
> this is a good point yeah, thanks for pointing out.

That's a big reason why I'm trying to reuse `map_from_keys_values_offsets_nulls` from `datafusion/spark/src/function/map/utils.rs`, which already handles the Spark dedup policy:

https://github.com/apache/datafusion/blob/ffcc7e3af8cfccb0c0705de2112d8277b28114fd/datafusion/spark/src/function/map/utils.rs#L114-L120

> For now, configurable functions are not supported by Datafusion.
> So more permissive `LAST_WIN` option is used in this implementation (instead of `EXCEPTION`)
>
> `EXCEPTION` behaviour can still be achieved externally in cost of performance:
> `when(array_length(array_distinct(keys)) == array_length(keys), constructed_map)`
> `.otherwise(raise_error("duplicate keys occurred during map construction"))`
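To make the difference between the two policies concrete, here is a minimal sketch in plain Python. It is not DataFusion or Spark code: `build_map` and `build_map_checked` are hypothetical names, and plain lists and dicts stand in for the key/value arrays. The second function mirrors the external workaround quoted above, comparing the distinct-key count with the key count before building the map with `LAST_WIN`.

```python
from typing import Any, Dict, Hashable, Sequence


def build_map(
    keys: Sequence[Hashable],
    values: Sequence[Any],
    policy: str = "LAST_WIN",
) -> Dict[Hashable, Any]:
    """Build a map from parallel key/value sequences (hypothetical helper).

    policy="LAST_WIN":  a later duplicate key overwrites the earlier value.
    policy="EXCEPTION": the first duplicate key raises an error.
    """
    if policy not in ("LAST_WIN", "EXCEPTION"):
        raise ValueError(f"unknown dedup policy: {policy!r}")
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")

    result: Dict[Hashable, Any] = {}
    for key, value in zip(keys, values):
        if policy == "EXCEPTION" and key in result:
            raise ValueError(f"duplicate map key: {key!r}")
        # Under LAST_WIN this assignment replaces any earlier value for the key.
        result[key] = value
    return result


def build_map_checked(
    keys: Sequence[Hashable],
    values: Sequence[Any],
) -> Dict[Hashable, Any]:
    """Emulate EXCEPTION externally on top of a LAST_WIN-only builder.

    This is the same idea as the quoted when/otherwise expression: compare
    the number of distinct keys with the number of keys, and fail before
    constructing the map if they differ.
    """
    if len(set(keys)) != len(keys):
        raise ValueError("duplicate keys occurred during map construction")
    return build_map(keys, values, policy="LAST_WIN")


if __name__ == "__main__":
    # LAST_WIN keeps the value that was inserted last for the key "a".
    print(build_map(["a", "b", "a"], [1, 2, 3]))
    # The external check rejects the same input.
    try:
        build_map_checked(["a", "b", "a"], [1, 2, 3])
    except ValueError as err:
        print("rejected:", err)
```

The external check walks the keys a second time before the map is built, which is the performance cost the quoted comment mentions.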
