lriggs opened a new pull request, #49986:
URL: https://github.com/apache/arrow/pull/49986
### Rationale for this change
Silently allowing duplicate functions that only differ by return type
provides a confusing function registry. Callers do not typically expect the
return type to be part of determining which function to call leading to
ambiguity.
It is possible to register different Gandiva functions with the same alias
and parameters but different return types, resulting in confusing function
overloads.
For example, DATE_EXTRACTION_TRUNCATION_FNS in
[cpp/src/gandiva/function_registry_datetime.cc] was invoked twice with the same
SQL alias lists — once for extract* (returns int64) and once for date_trunc_*
(returns the input date/timestamp type):
```
DATE_EXTRACTION_TRUNCATION_FNS(EXTRACT_SAFE_NULL_IF_NULL, extract)
DATE_EXTRACTION_TRUNCATION_FNS(TRUNCATE_SAFE_NULL_IF_NULL, date_trunc_)
```
As a result the registry contained four entries for day(...) where there
should have been two:
```
int64 day(timestamp) → extractDay_timestamp
int64 day(date) → extractDay_date64
timestamp day(timestamp) → date_trunc_Day_timestamp
date day(date) → date_trunc_Day_date64
```
The same problem existed for every calendar-unit alias: year, month,
quarter, week, weekofyear, yearweek, dayofmonth, hour, minute, second.
Resolution behavior depended on the caller's inferred return type, which is not
the SQL semantics anyone expects from day(timestamp_col).
FunctionRegistry::Add was silently allowing these registrations:
unordered_map::emplace keeps the first entry and discards subsequent ones with
no warning.
### What changes are included in this PR?
#### Diagnostics
Added two ARROW_LOG(ERROR) checks inside FunctionRegistry::Add:
**Duplicate-signature check** — when pc_registry_map_.emplace reports the
entry already existed (same name + params + return type), log the conflict
including both pc_names.
**Alias-collision check** — a new call_shape_map_ keyed on (lower(name),
param_type_ids) (return type excluded) detects the case where the same
name(args) could resolve to two functions with different return types. This is
the check that catches the day family.
These run at registry-construction time, so future regressions surface in
test output rather than being silently shadowed.
#### Macro split
**DATE_EXTRACTION_TRUNCATION_FNS** is split into two macros:
**DATE_EXTRACTION_FNS** — keeps the existing SQL alias lists ({"year"},
{"day", "dayofmonth"}, etc.).
**DATE_TRUNCATION_FNS** — every alias list is {}. The truncate functions are
still reachable through their date_trunc_* base names.
#### Strengthened registry test
Added a third loop to TestNoDuplicates that mirrors the runtime
alias-collision check: keys each registered signature on (lower(name),
param_types) only and fails if two signatures share that key but disagree on
return type. The two pre-existing loops still cover their original cases (same
pc_name + params + ret, and same full signature appearing twice).
#### Cleanup
Removed the standalone NativeFunction("add", ..., date64+int64 → timestamp,
"add_date64_int64"). DATE_ADD_FNS(add, {}) already provides the entry with the
correct → date64 return type matching the precompiled symbol.
Dropped the {"convert_fromutf8"} and {"convert_replaceutf8"} aliases; the
base names already match case-insensitively.
### Are these changes tested?
Registry construction is now silent — no duplicate or alias-collision logs.
TestFunctionRegistry — 6/6 pass, including the strengthened TestNoDuplicates.
ctest -R gandiva — 4/4 binaries pass (gandiva-internals-test,
gandiva-precompiled-test, gandiva-projector-test,
gandiva-projector-test-static).
### Are there any user-facing changes?
The Gandiva function list is different, with some aliases no longer
appearing. Functionality is still available.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]