palday opened a new pull request, #482:
URL: https://github.com/apache/arrow-julia/pull/482
While looking at `@time_imports` output for a transitive reverse dependency, I saw
that TimeZones adds substantial load time to a package that doesn't need that part
of Arrow. So I took a stab at moving time zone support to an extension. I've
done this in a way that should still be compatible with Julia versions before 1.9.
There is one slightly hacky thing: the extension mechanism is based primarily on
the idea of adding methods to existing functions, not on defining additional
types. As such, it's a little bit weird (`setglobal!` in 1.9) to make types
defined in an extension available in the top-level module again.
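
To illustrate the `setglobal!` workaround, here is a minimal self-contained sketch. It is not the code in this PR: `Host`, `HostExt`, `Zoned`, and `publish!` are hypothetical names standing in for Arrow, the extension module, a type defined there, and the hook that publishes it. It assumes Julia 1.9 or later, where `setglobal!` is available.

```julia
# Hypothetical stand-in for the parent package (e.g. Arrow).
module Host
# Declare a non-const placeholder so the binding exists before the extension loads.
global Zoned = nothing
end

# Hypothetical stand-in for the extension module. In a real package this would
# live under ext/ and load only once the weak dependency is imported.
module HostExt
import ..Host

# A type that only the extension can define.
struct Zoned
    offset_hours::Int
end

# Publish the type under the parent's namespace (setglobal! needs Julia 1.9+).
publish!() = setglobal!(Host, :Zoned, Zoned)
end

# Simulate the extension being loaded, then use the type via the parent module.
HostExt.publish!()
z = Host.Zoned(2)
```

Because `Host.Zoned` is a non-const global in this sketch, code that reaches the type through the parent module is not type-stable at that lookup; that is part of why the approach feels hacky.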
This is almost assuredly a breaking change because deserialization behavior
will change when TimeZones isn't loaded, either directly or through some other
package. For serialization, I feel like it's less of an issue because if you're
writing `ZonedDateTime` to disk, then you've already got TimeZones loaded.
Still, I thought this was a nice experiment to show a potential speedup. I'm
opening this PR as the starting point for a discussion, not because I expect it
to be merged immediately.
# current main
<details>
```julia
julia> @time @time_imports using Arrow
1.7 ms LoggingExtras
2.7 ms DataAPI
1.3 ms DataValueInterfaces
1.1 ms IteratorInterfaceExtensions
1.1 ms TableTraits
39.3 ms Tables
22.9 ms SentinelArrays
12.5 ms PooledArrays
1.0 ms Lz4_jll
2.5 ms TranscodingStreams
3.2 ms CodecLz4
0.7 ms Zstd_jll
1.8 ms CEnum
3.7 ms CodecZstd
0.6 ms Scratch
0.4 ms PrecompileTools
8.8 ms RecipesBase
19.8 ms Parsers
5.6 ms InlineStrings
0.7 ms Compat
0.5 ms Compat → CompatLinearAlgebraExt
0.4 ms ExprTools
0.8 ms Mocking
346.8 ms TimeZones 83.55% compilation time (40% recompilation)
11.2 ms BitIntegers
2.7 ms ConcurrentUtilities
0.8 ms EnumX
3.3 ms ArrowTypes
22.5 ms Arrow
0.606927 seconds (1.62 M allocations: 100.383 MiB, 4.84% gc time, 56.60% compilation time: 39% of which was recompilation)
```
</details>
# as an extension
<details>
```julia
julia> @time @time_imports using Arrow
1.5 ms LoggingExtras
0.8 ms DataAPI
0.4 ms DataValueInterfaces
0.4 ms IteratorInterfaceExtensions
0.4 ms TableTraits
20.7 ms Tables
17.5 ms SentinelArrays
12.2 ms PooledArrays
0.8 ms Lz4_jll
2.0 ms TranscodingStreams
3.3 ms CodecLz4
1.0 ms Zstd_jll
3.0 ms CEnum
3.1 ms CodecZstd
12.2 ms BitIntegers
2.1 ms ConcurrentUtilities
0.6 ms EnumX
2.8 ms ArrowTypes
20.7 ms Arrow
0.130832 seconds (201.42 k allocations: 13.516 MiB, 6.01% compilation time)
julia> @time @time_imports using TimeZones
0.6 ms Scratch
0.4 ms PrecompileTools
9.2 ms RecipesBase
19.8 ms Parsers
5.2 ms InlineStrings
0.9 ms Compat
1.2 ms Compat → CompatLinearAlgebraExt
0.5 ms ExprTools
1.2 ms Mocking
341.1 ms TimeZones 82.49% compilation time (44% recompilation)
156.7 ms Arrow → ArrowTimeZonesExt
0.612256 seconds (1.42 M allocations: 87.153 MiB, 2.65% gc time, 54.20% compilation time: 43% of which was recompilation)
```
</details>