@rfourquet commented in #16999: > I don't have any example where this would be an issue, and honestly I > didn't think to much about this problem, I just thought it looked safer (so > I step back!). I myself have occasionally used x+0 to get a "deep copy" > of a BigInt x. But @yuyichao <https://github.com/yuyichao> said anyway it > was not the use case he wanted to address, but rather the finalise > problem... > ps: to sort an array, a shallow copy should suffice, no?
I investigated the memory corruption (which apparently has been around ever since deepcopy was implemented in Julia), because @chrisraukaukas had run into it with his DifferentialEquations.jl package. He was doing deep copies of vectors of vectors. What exactly is the finalize problem that is solved by #16999? There is none that I am aware of with my solution, and #16999 does not implement any general solution to the problem that deepcopy has, when it uses reflection to copy types that have memory managed outside of Julia. That is a problem that needs to be addressed. @yuyichao replied to my comments on GitHub: I'm concerned about the performance implications of PR #16999 <https://github.com/JuliaLang/julia/pull/16999>, which makes a copy any BigFloats or BigInts when doing deepcopy. So the suggestion is that if an operation following the defined schematics > is too expensive, we should just ignore what the function should do and do > nothing instead. That is not at all what I've been saying. I am saying that if in Julia we logically consider BigInts and BigFloats to be immutable (as we do all types that are <: Number), then we shouldn't have to make things much slower to cater to people who *might be* breaking that convention by playing with the internals of a Base julia type. Also, in general performance has been taken into consideration when deciding on the behavior of Julia. This isn't even a case where you get incorrect behavior by treating BigFloats/BigInts as they already are in many places, as being immutable values, instead of a reference to a mutable value. He also commented: The rationale for doing so is that somebody might bypass the BigInt and BigFloat types, and directly operate on their GMP/MPFR internals. No it's not. It would be useful then to explain just why #16999 is really a better solution than #16826 instead of just saying "it's not". About the performance implications, on my MacBook Pro, doing a deepcopy of 1 million BigFloats: Master: 3.660420 seconds (5.00 M allocations: 159.021 MB, 6.39% gc time) PR #16286 0.159178 seconds (3.00 M allocations: 68.650 MB, 7.23% gc time) PR #16999 1.670144 seconds (5.00 M allocations: 197.168 MB, 20.20% gc time) So, I ask, why would one want to make something over an order of magnitude slower, using almost 3x the memory, just to protect against the possibility that some code is abusing the internals of one of the Base types? We don't do that in other places for the string types, which have the exact same issue.