Much has already been said on this topic. The Array(...) interface was meant to be low-level for users of scientific computing, to be used only when they know what they are doing: you get the raw uninitialized memory as fast as possible.
The user-facing interface was always the array constructors: zeros(), ones(), rand(), etc. Some of this is because of our past experience coming from a matlab/R-like world. As Julia has become more popular, we have realized that those not coming from matlab/R end up using all the possible constructors. While this has raised a variety of issues, I'd like to say that this will not get sorted out satisfactorily before the 0.4 release.

For a class that may be taught soon, the thing to do would be to use the zeros/ones/rand constructors to construct arrays, instead of Array(), which currently is more for package developers. I understand that Array() is a much better name, as Stefan points out, but zeros() is not too terrible: it at least clearly tells the user that they get zeroed-out arrays. While we have other "features" that can lead to unsafe code (ccall, @inbounds), none of these are things one is likely to run into while learning the language.

-viral

On Tuesday, November 25, 2014 1:00:10 AM UTC+5:30, Ronald L. Rivest wrote:
>
> Regarding initialization:
>
> -- I'm toying with the idea of recommending Julia for an introductory programming class (rather than Python).
>
> -- For this purpose, the language should not have hazards that catch the unwary.
>
> -- Not initializing storage is definitely a hazard. With uninitialized storage, a program may run fine one day, and fail mysteriously the next, depending on the contents of memory. This is about predictability, reliability, dependability, and correctness.
>
> -- I would favor a solution like
>        A = Array(Int64,n)             -- fills with zeros
>        A = Array(Int64,n,fill=1)      -- to fill with ones
>        A = Array(Int64,n,fill=None)   -- for an uninitialized array
>    so that the *default* is an initialized array, but the speed geeks can get what they want.
>
> Cheers,
> Ron
>
> On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>>
>> If we can make allocating zeroed arrays faster that's great, but unless we can close the performance gap all the way and eliminate the need to allocate uninitialized arrays altogether, this proposal is just a rename – Unchecked.Array plays the exact same role as the current Array constructor. It's unclear that this would even address the original concern since it still *allows* uninitialized allocation of arrays. This rename would just force people who have used Array correctly in code that cares about being as efficient as possible even for very large arrays to change their code and use Unchecked.Array instead.
>>
>> On Nov 24, 2014, at 1:36 PM, Jameson Nash wrote:
>>
>> I think that Rivest’s question may be a good reason to rethink the initialization of structs and offer the explicit guarantee that all unassigned elements will be initialized to 0 (and not just the jl_value_t pointers). I would argue that the current behavior resulted more from a desire to avoid clearing the array twice (if the user is about to call fill, zeros, ones, +, etc.) than an intentional, casual exposure of uninitialized memory.
>>
>> A random array of integers is also a security concern if an attacker can extract some other information (with some probability) about the state of the program. Julia is not hardened by design, so you can’t safely run an unknown code fragment, but you still might have an unintended memory exposure in a client-facing app.
>> While zero’ing memory doesn’t prevent the user from simply reusing a memory buffer in a security-unaware fashion (rather than consistently allocating a new one for each use), it’s not clear to me that the performance penalty would be all that noticeable for mapping Array(X) to zero(X), and only providing an internal constructor for grabbing uninitialized memory (perhaps Base.Unchecked.Array(X) from #8227).
>>
>> On Mon Nov 24 2014 at 12:57:22 PM Stefan Karpinski wrote:
>>
>> There are two rather different issues to consider:
>>>
>>> 1. Preventing problems due to inadvertent programmer errors.
>>> 2. Preventing malicious security attacks.
>>>
>>> When we initially made this choice, it wasn't clear if 1 would be a big issue but we decided to see how it played out. It hasn't been a problem in practice: once people grok that the Array(T, dims...) constructor gives uninitialized memory and that the standard usage pattern is to call it and then immediately initialize the memory, everything is ok. I can't recall a single situation where someone has had some terrible bug due to uninitialized int/float arrays.
>>>
>>> Regarding 2, Julia is not intended to be a hardened language for writing highly secure software. It allows all sorts of unsafe actions: pointer arithmetic, direct memory access, calling arbitrary C functions, etc. The future of really secure software seems to be small formally verified kernels written in statically typed languages that communicate with larger unverified systems over restricted channels. Julia might be appropriate for the larger unverified system but certainly not for the trusted kernel. Adding enough verification to Julia to write secure kernels is not inconceivable, but would be a major research effort.
>>> The implementation would have to check lots of things, including, of course, ensuring that all arrays are initialized.
>>>
>>> A couple of other points:
>>>
>>> Modern OSes protect against data leaking between processes by zeroing pages before a process first accesses them. Thus any data exposed by Array(T, dims...) comes from the same process and is not a security leak.
>>>
>>> An uninitialized array of, say, integers is not in itself a security concern – the issue is what you do with those integers. The classic security hole is to use a "random" value from uninitialized memory to access other memory by using it to index into an array or otherwise convert it to a pointer. In the presence of bounds checking, however, this isn't actually a big concern since you will still either get a bounds error or a valid array value – not a meaningful one, of course, but still just a value.
>>>
>>> Writing programs that are secure against malicious attacks is a hard, unsolved problem. So is doing efficient, productive high-level numerical programming. Trying to solve both problems at the same time seems like a recipe for failing at both.
>>>
>>> On Nov 24, 2014, at 11:43 AM, David Smith wrote:
>>>
>>> Some ideas:
>>>
>>> Is there a way to return an error for accesses before at least one assignment in bits types? I.e. when the object is created uninitialized it is marked "dirty" and only after assignment of some user values can it be "cleanly" accessed?
>>>
>>> Can Julia provide a thin memory management layer that grabs memory from the OS first, zeroes it, and then gives it to the user upon initial allocation? After gc+reallocation it doesn't need to be zeroed again, unless the next allocation is larger than anything previous, at which time Julia grabs more memory, sanitizes it, and hands it off.
>>>
>>> On Monday, November 24, 2014 2:48:05 AM UTC-6, Mauro wrote:
>>>>
>>>> Pointer types will initialise to undef and any operation on them fails:
>>>> julia> a = Array(ASCIIString, 5);
>>>>
>>>> julia> a[1]
>>>> ERROR: access to undefined reference
>>>>  in getindex at array.jl:246
>>>>
>>>> But you're right, for bits-types this is not an error and will just return whatever was there before. I think the reason this will stay that way is that Julia is a numerics-oriented language. Thus you may want to create a 10^9-element (8 GB) array of Float64 and then fill it with something, as opposed to first filling it with zeros and then filling it with something.
>>>> See:
>>>>
>>>> julia> @time b = Array(Float64, 10^9);
>>>> elapsed time: 0.029523638 seconds (8000000144 bytes allocated)
>>>>
>>>> julia> @time c = zeros(Float64, 10^9);
>>>> elapsed time: 0.835062841 seconds (8000000168 bytes allocated)
>>>>
>>>> You can argue that the time gain isn't worth the risk but I suspect that others may feel differently.
>>>>
>>>> On Mon, 2014-11-24 at 09:28, Ronald L. Rivest wrote:
>>>> > I am just learning Julia...
>>>> >
>>>> > I was quite shocked today to learn that Julia does *not* initialize allocated storage (e.g. to 0 or some default value).
>>>> > E.g. the code
>>>> >     A = Array(Int64,5)
>>>> >     println(A[1])
>>>> > has unpredictable behavior, may disclose information from other modules, etc.
>>>> >
>>>> > This is really quite unacceptable in a modern programming language; it is as bad as not checking array reads for out-of-bounds indices.
>>>> >
>>>> > Google for "uninitialized security" to find numerous instances of security violations and unreliability problems caused by the use of uninitialized variables, and numerous security advisories warning of problems caused by the (perhaps inadvertent) use of uninitialized variables.
>>>> >
>>>> > You can't design a programming language today under the naive assumption that code in that language won't be used in highly critical applications or won't be under adversarial attack.
>>>> >
>>>> > You can't reasonably ask all programmers to properly initialize their allocated storage manually any more than you can ask them to test all indices before accessing an array manually; these are things that a high-level language should do for you.
>>>> >
>>>> > The default non-initialization of allocated storage is a mis-feature that should absolutely be fixed.
>>>> >
>>>> > There is no efficiency argument here in favor of uninitialized storage that can outweigh the security and reliability disadvantages...
>>>> >
>>>> > Cheers,
>>>> > Ron Rivest
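
For reference, the allocation patterns debated in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: it is written in Julia 1.x syntax, where the uninitialized constructor is spelled `Vector{T}(undef, n)` rather than the `Array(T, n)` used in the messages above, and `squares` is a hypothetical helper name, not something from the thread.

```julia
# Sketch only: Julia 1.x syntax is assumed here. The thread itself uses the
# older Array(T, n) spelling for the uninitialized constructor.

# Hypothetical helper showing the "allocate, then immediately initialize"
# usage pattern: every element is written before anything reads it.
function squares(n::Int)
    a = Vector{Int64}(undef, n)   # contents are arbitrary at this point
    for i in 1:n
        a[i] = i^2
    end
    return a
end

z = zeros(Int64, 5)    # initialized: every element is 0
o = ones(Int64, 5)     # initialized: every element is 1
s = squares(5)         # uninitialized allocation, fully overwritten

# For non-bits element types the slots start out undefined, and reading one
# throws an error instead of returning leftover memory.
strs = Vector{String}(undef, 3)
first_slot_assigned = isassigned(strs, 1)   # false until something is stored
```

For a class, sticking to `zeros`, `ones`, and `fill` avoids the hazard entirely; the `undef` form only pays off when the array is large and is about to be overwritten anyway.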
