@rfourquet commented in #16999:

> I don't have any example where this would be an issue, and honestly I 
> didn't think to much about this problem, I just thought it looked safer (so 
> I step back!). I myself have occasionally used x+0 to get a "deep copy" 
> of a BigInt x. But @yuyichao <https://github.com/yuyichao> said anyway it 
> was not the use case he wanted to address, but rather the finalise 
> problem...
> ps: to sort an array, a shallow copy should suffice, no?


I investigated the memory corruption (which apparently has been around ever 
since deepcopy was implemented in Julia), because @chrisraukaukas had run 
into it with his DifferentialEquations.jl package.
He was doing deep copies of vectors of vectors.

What exactly is the finalize problem that is solved by #16999?   There is 
none that I am aware of with my solution, and #16999 does not implement any 
general solution to the problem that deepcopy has, when it uses reflection
to copy types that have memory managed outside of Julia.   That is a 
problem that needs to be addressed.

@yuyichao replied to my comments on GitHub:

I'm concerned about the performance implications of PR #16999 
<https://github.com/JuliaLang/julia/pull/16999>, which makes a copy any 
BigFloats or BigInts when doing deepcopy.

So the suggestion is that if an operation following the defined schematics 
> is too expensive, we should just ignore what the function should do and do 
> nothing instead.

 
That is not at all what I've been saying.  I am saying that if in Julia we 
logically consider BigInts and BigFloats to be immutable (as we do all 
types that are <: Number), then we shouldn't have to make things much 
slower to cater to people who
*might be* breaking that convention by playing with the internals of a Base 
julia type.
Also, in general performance has been taken into consideration when 
deciding on the behavior of Julia.  This isn't even a case where you get 
incorrect behavior by treating BigFloats/BigInts as they already are in 
many places,
as being immutable values, instead of a reference to a mutable value.

He also commented:

The rationale for doing so is that somebody might bypass the BigInt and 
BigFloat types, and directly operate on their GMP/MPFR internals.

No it's not. 

 
It would be useful then to explain just why #16999 is really a better 
solution than #16826 instead of just saying "it's not".

About the performance implications, on my MacBook Pro, doing a deepcopy of 
1 million BigFloats:

Master:

  3.660420 seconds (5.00 M allocations: 159.021 MB, 6.39% gc time)
PR #16286

  0.159178 seconds (3.00 M allocations: 68.650 MB, 7.23% gc time)
PR #16999

 1.670144 seconds (5.00 M allocations: 197.168 MB, 20.20% gc time)


So, I ask, why would one want to make something over an order of magnitude 
slower, using almost 3x the memory, just to protect against the possibility 
that some code is abusing the internals of one of the Base types?

We don't do that in other places for the string types, which have the exact 
same issue.




Reply via email to