llama90 commented on PR #40761:
URL: https://github.com/apache/arrow/pull/40761#issuecomment-2115139836
From your description and the mention by @pitrou, I understood the following:
> There is an is_nested function in C++, perhaps we can use it instead of
relying on hand-maintained constants on the Python side? Same for some of the
other predicates actually (I see that _INTERVAL_TYPES is only holding a single
kind of interval?)
- For the `_${TYPE}_TYPES`, only replace with C++ validation functions where
possible.
From a straightforward view, the cases that can be replaced with C++
functions are as follows:
- ✅ `_SIGNED_INTEGER_TYPES` = `constexpr bool is_signed_integer(Type::type
type_id)`
- ✅ `_UNSIGNED_INTEGER_TYPES` = `constexpr bool
is_unsigned_integer(Type::type type_id)`
- ✅ `_INTEGER_TYPES` = `constexpr bool is_integer(Type::type type_id)`
- ✅ `_FLOATING_TYPES` = `constexpr bool is_floating(Type::type type_id)`
- ✅ `_DECIMAL_TYPES` = `constexpr bool is_decimal(Type::type type_id)`
- ✅ `_DATE_TYPES` = `constexpr bool is_date(Type::type type_id)`
- ✅ `_TIME_TYPES` = `constexpr bool is_time(Type::type type_id)`
- `_INTERVAL_TYPES`
- `_TEMPORAL_TYPES`
- ✅ `_UNION_TYPES` = `constexpr bool is_union(Type::type type_id)`
- `_NESTED_TYPES`
In this case, do you intend to only replace the functions marked with ✅? Or
are you considering modifying the C++ functions to align with the Python
implementation as well?
- For `is_interval`, the C++ code also checks for `INTERVAL_MONTHS` and
`INTERVAL_DAY_TIME`.
- For `is_temporal`, the C++ code does not check for `DURATION`.
- For `is_nested`, the C++ code also checks for `RUN_END_ENCODED`.
<details><summary>is_interval, is_temporal, and is_nested</summary>
```cpp
/// \brief Check for an interval type
///
/// \param[in] type_id the type-id to check
/// \return whether type-id is an interval type one
constexpr bool is_interval(Type::type type_id) {
switch (type_id) {
case Type::INTERVAL_MONTHS:
case Type::INTERVAL_DAY_TIME:
case Type::INTERVAL_MONTH_DAY_NANO:
return true;
default:
break;
}
return false;
}
/// \brief Check for a temporal type
///
/// \param[in] type_id the type-id to check
/// \return whether type-id is a temporal type one
constexpr bool is_temporal(Type::type type_id) {
switch (type_id) {
case Type::DATE32:
case Type::DATE64:
case Type::TIME32:
case Type::TIME64:
case Type::TIMESTAMP:
return true;
default:
break;
}
return false;
}
/// \brief Check for a nested type
///
/// \param[in] type_id the type-id to check
/// \return whether type-id is a nested type one
constexpr bool is_nested(Type::type type_id) {
switch (type_id) {
case Type::LIST:
case Type::LARGE_LIST:
case Type::LIST_VIEW:
case Type::LARGE_LIST_VIEW:
case Type::FIXED_SIZE_LIST:
case Type::MAP:
case Type::STRUCT:
case Type::SPARSE_UNION:
case Type::DENSE_UNION:
case Type::RUN_END_ENCODED:
return true;
default:
break;
}
return false;
}
```
</details>
Moreover, while it is certainly easier to simply apply what you have
mentioned, from a long-term perspective, exporting C++ functions to replace
equivalent Python code raises questions about the rationale for maintaining
them separately. I would like to seek further opinions on this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]