jayzhan211 opened a new issue, #6746:
URL: https://github.com/apache/arrow-rs/issues/6746

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   **Describe the solution you'd like**
   
   In arrow-rs, to get the minute of a timestamp, we call `timestamp_s_to_datetime` to obtain a `NaiveDateTime` and then call `minute()` on it to get the final answer.
   
   For example, a timestamp such as `1599563412` is first converted to `2020-09-08T11:10:12` as a chrono `NaiveDateTime`, and `minute()` then returns `10`.
   
   However, looking at the corresponding function in chrono, even though computing the minute only needs `NaiveTime::from_num_seconds_from_midnight_opt`, `NaiveDate::from_num_days_from_ce_opt` is still called. Ideally we would skip that call.
   
   ```rust
       #[inline]
       #[must_use]
       pub const fn from_timestamp(secs: i64, nsecs: u32) -> Option<Self> {
           let days = secs.div_euclid(86_400) + UNIX_EPOCH_DAY;
           let secs = secs.rem_euclid(86_400);
           if days < i32::MIN as i64 || days > i32::MAX as i64 {
               return None;
           }
           let date = try_opt!(NaiveDate::from_num_days_from_ce_opt(days as i32));
           let time = try_opt!(NaiveTime::from_num_seconds_from_midnight_opt(secs as u32, nsecs));
           Some(date.and_time(time).and_utc())
       }
   ```
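
The two halves of the function above do not depend on each other: the day count feeds only the date, and the remainder feeds only the time. A minimal std-only sketch of that split follows. The names `split_timestamp` and `minute_of_day_secs` are hypothetical and are not part of chrono or arrow-rs, and the sketch assumes timezone-naive timestamps in whole seconds.

```rust
/// Number of seconds in one day.
const SECS_PER_DAY: i64 = 86_400;

/// Split a Unix timestamp in seconds into
/// (days since 1970-01-01, seconds since midnight).
/// Hypothetical helper for illustration, not a chrono API.
/// Euclidean division keeps the second component in 0..86_400
/// even for timestamps before the epoch.
fn split_timestamp(secs: i64) -> (i64, u32) {
    let days = secs.div_euclid(SECS_PER_DAY);
    let secs_of_day = secs.rem_euclid(SECS_PER_DAY) as u32;
    (days, secs_of_day)
}

/// Minute of the hour (0..=59), computed from seconds since midnight only.
/// No date construction is involved.
fn minute_of_day_secs(secs_of_day: u32) -> u32 {
    (secs_of_day / 60) % 60
}
```

For the example timestamp `1599563412`, `split_timestamp` gives 18513 days and 40212 seconds since midnight, and `minute_of_day_secs(40212)` gives `10` without touching the day count.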
   
   I propose that we work upstream in the chrono crate to find a way to get `date` and `time` separately from `secs: i64, nsecs: u32`. We could then call only the smaller function that matches what we need, to minimize the cost.
   
   **Describe alternatives you've considered**
   
   **Additional context**
   
   This is the flamegraph for ClickBench Q18 in DataFusion. `from_num_days_from_ce_opt` accounts for much of the time spent in the `date_part` function, but the query only cares about `minute`, so computing `from_num_days_from_ce_opt` is wasted work.
   <img width="1687" alt="Screenshot 2024-11-18 at 5 40 45 PM" src="https://github.com/user-attachments/assets/aa3f07d2-0bd8-4e34-b746-cc93e8e2107b">
   
   

