Regarding initialization:
-- I'm toying with the idea of recommending Julia for an introductory
programming class (rather than Python).
-- For this purpose, the language should not have hazards that catch the
unwary.
-- Not initializing storage is definitely a hazard. With uninitialized
storage, a program may run fine one day, and fail mysteriously the next,
depending on the contents of memory. This is about predictability,
reliability, dependability, and correctness.
-- I would favor a solution like
    A = Array(Int64,n) -- fills with zeros
    A = Array(Int64,n,fill=1) -- to fill with ones
    A = Array(Int64,n,fill=None) -- for an uninitialized array
so that the *default* is an initialized array, but the speed geeks
can get what they want.
Cheers,
Ron
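
For illustration, the proposal above can be sketched as an ordinary function. This is only a sketch under assumptions: `make_vector` is a hypothetical name, not part of Base, and it is written in current Julia (1.x) syntax, where uninitialized allocation is spelled `Vector{T}(undef, n)` rather than `Array(T, n)`, and `nothing` stands in for the `None` used above.

```julia
# Hypothetical helper, not part of Base: allocate a vector of n elements of type T.
# Zero-filled by default; fill=nothing opts into uninitialized memory.
function make_vector(::Type{T}, n::Integer; fill=zero(T)) where {T}
    if fill === nothing
        return Vector{T}(undef, n)          # contents are arbitrary until assigned
    end
    return Base.fill(convert(T, fill), n)   # qualified: the keyword shadows Base.fill
end

a = make_vector(Int64, 4)                   # zeros by default
b = make_vector(Int64, 4, fill=1)           # ones
c = make_vector(Int64, 4, fill=nothing)     # uninitialized: read only after writing
```

The safe behaviour is the default here, and the uninitialized path has to be requested explicitly at the call site.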
On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>
> If we can make allocating zeroed arrays faster that's great, but unless we
> can close the performance gap all the way and eliminate the need to
> allocate uninitialized arrays altogether, this proposal is just a rename:
> Unchecked.Array plays the exact same role as the current Array
> constructor. It's unclear
> that this would even address the original concern since it still *allows*
> uninitialized allocation of arrays. This rename would just force people who
> have used Array correctly in code that cares about being as efficient as
> possible even for very large arrays to change their code and use
> Unchecked.Array instead.
>
> On Nov 24, 2014, at 1:36 PM, Jameson Nash wrote:
>
> I think that Rivest’s question may be a good reason to rethink the
> initialization of structs and offer the explicit guarantee that all
> unassigned elements will be initialized to 0 (and not just the jl_value_t
> pointers). I would argue that the current behavior resulted more from a
> desire to avoid clearing the array twice (if the user is about to call
> fill, zeros, ones, +, etc.) than an intentional, casual exposure of
> uninitialized memory.
>
> A random array of integers is also a security concern if an attacker can
> extract some other information (with some probability) about the state of
> the program. Julia is not hardened by design, so you can’t safely run an
> unknown code fragment, but you still might have an unintended memory
> exposure in a client-facing app. While zeroing memory doesn’t prevent the
> user from simply reusing a memory buffer in a security-unaware fashion
> (rather than consistently allocating a new one for each use), it’s not
> clear to me that the performance penalty would be all that noticeable if we
> mapped Array(X) to zero(X) and provided only an internal constructor for
> grabbing uninitialized memory (perhaps Base.Unchecked.Array(X) from #8227).
>
> On Mon Nov 24 2014 at 12:57:22 PM Stefan Karpinski wrote:
>
>> There are two rather different issues to consider:
>>
>> 1. Preventing problems due to inadvertent programmer errors.
>> 2. Preventing malicious security attacks.
>>
>> When we initially made this choice, it wasn't clear if 1 would be a big
>> issue but we decided to see how it played out. It hasn't been a problem in
>> practice: once people grok that the Array(T, dims...) constructor gives
>> uninitialized memory and that the standard usage pattern is to call it and
>> then immediately initialize the memory, everything is ok. I can't recall
>> a single situation where someone has had some terrible bug due to
>> uninitialized int/float arrays.
>>
>> Regarding 2, Julia is not intended to be a hardened language for writing
>> highly secure software. It allows all sorts of unsafe actions: pointer
>> arithmetic, direct memory access, calling arbitrary C functions, etc. The
>> future of really secure software seems to be small formally verified
>> kernels written in statically typed languages that communicate with larger
>> unverified systems over restricted channels. Julia might be appropriate for
>> the larger unverified system but certainly not for the trusted kernel.
>> Adding enough verification to Julia to write secure kernels is not
>> inconceivable, but would be a major research effort. The implementation
>> would have to check lots of things, including, of course, ensuring that all
>> arrays are initialized.
>>
>> A couple of other points:
>>
>> Modern OSes protect against data leaking between processes by zeroing
>> pages before a process first accesses them. Thus any data exposed by
>> Array(T, dims...) comes from the same process and is not a security leak.
>>
>> An uninitialized array of, say, integers is not in itself a security
>> concern – the issue is what you do with those integers. The classic
>> security hole is to use a "random" value from uninitialized memory to
>> access other memory by using it to index into an array or otherwise convert
>> it to a pointer. In the presence of bounds checking, however, this isn't
>> actually a big concern since you will still either get a bounds error or a
>> valid array value – not a meaningful one, of course, but still just a value.
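
A minimal sketch of the bounds-checking point above, in current Julia (1.x) syntax. The names `table`, `junk`, and `idx` are illustrative only, and the value read from `junk` is unspecified, so which branch runs can differ between runs.

```julia
table = [10, 20, 30]

junk = Vector{Int}(undef, 1)   # uninitialized: the stored value is unspecified
idx = junk[1]                  # an arbitrary integer

# With bounds checking, indexing either yields a real element of `table`
# or raises BoundsError; it never reads memory outside the array.
result = try
    table[idx]
catch err
    err isa BoundsError ? :bounds_error : rethrow()
end

println(result)
```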
>>
>> Writing programs that are secure against malicious attacks is a hard,
>> unsolved problem. So is doing efficient, productive high-level numerical
>> programming. Trying to solve both problems at the same time seems like a
>> recipe for failing at both.
>>
>> On Nov 24, 2014, at 11:43 AM, David Smith wrote:
>>
>> Some ideas:
>>
>> Is there a way to return an error for accesses before at least one
>> assignment in bits types? I.e. when the object is created uninitialized it
>> is marked "dirty" and only after assignment of some user values can it be
>> "cleanly" accessed?
>>
>> Can Julia provide a thin memory management layer that grabs memory from
>> the OS first, zeroes it, and then gives it to the user upon initial
>> allocation? After gc+reallocation it doesn't need to be zeroed again,
>> unless the next allocation is larger than anything previous, at which time
>> Julia grabs more memory, sanitizes it, and hands it off.
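
The first idea above (marking elements "dirty" until they are assigned) can be sketched in user code. This is a hypothetical wrapper in current Julia (1.x) syntax, not an existing Base type: `TrackedVector` and its fields are names invented for the example, and the extra bit per element plus the check on every read are exactly the overhead such a scheme would add.

```julia
# Hypothetical wrapper, not part of Base: a vector that refuses to return
# elements that were never assigned.
struct TrackedVector{T}
    data::Vector{T}        # uninitialized storage
    assigned::BitVector    # assigned[i] is true once data[i] has been written
end

# Allocate n uninitialized elements, all marked as not yet assigned.
TrackedVector{T}(n::Integer) where {T} =
    TrackedVector{T}(Vector{T}(undef, n), falses(n))

Base.length(v::TrackedVector) = length(v.data)

function Base.getindex(v::TrackedVector, i::Integer)
    v.assigned[i] || error("element $i read before assignment")
    return v.data[i]
end

function Base.setindex!(v::TrackedVector, x, i::Integer)
    v.data[i] = x
    v.assigned[i] = true
    return v
end

v = TrackedVector{Int64}(3)
v[1] = 42
println(v[1])   # reads succeed after assignment; v[2] would raise an error
```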
>>
>> On Monday, November 24, 2014 2:48:05 AM UTC-6, Mauro wrote:
>>>
>>> Pointer types will initialise to undef and any operation on them fails:
>>> julia> a = Array(ASCIIString, 5);
>>>
>>> julia> a[1]
>>> ERROR: access to undefined reference
>>> in getindex at array.jl:246
>>>
>>> But you're right, for bits-types this is not an error and will just
>>> return whatever was there before. I think the reason this will stay
>>> that way is that Julia is a numerics-oriented language. Thus you may
>>> wanna create a 1GB array of Float64 and then fill it with something as
>>> opposed to first filling it with zeros and then filling it with something.
>>> See:
>>>
>>> julia> @time b = Array(Float64, 10^9);
>>> elapsed time: 0.029523638 seconds (8000000144 bytes allocated)
>>>
>>> julia> @time c = zeros(Float64, 10^9);
>>> elapsed time: 0.835062841 seconds (8000000168 bytes allocated)
>>>
>>> You can argue that the time gain isn't worth the risk but I suspect that
>>> others may feel different.
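
The comparison above can be rerun in current Julia (1.x) syntax roughly as follows. This is a sketch under assumptions: `n` is reduced from 10^9 to keep memory use modest, and no timings are shown because they depend on the machine and Julia version.

```julia
n = 10^6    # far smaller than the 10^9 above, to keep memory use modest

# Current spelling of the uninitialized allocation (Array(Float64, n) in the thread).
@time b = Vector{Float64}(undef, n)

# Zero-filled allocation of the same size.
@time c = zeros(Float64, n)

# Typical pattern: allocate uninitialized, then overwrite every element before reading.
for i in eachindex(b)
    b[i] = 2.0 * i
end
```

Running each `@time` line twice gives a fairer comparison, since the first call includes compilation time.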
>>>
>>> On Mon, 2014-11-24 at 09:28, Ronald L. Rivest wrote:
>>> > I am just learning Julia...
>>> >
>>> > I was quite shocked today to learn that Julia does *not*
>>> > initialize allocated storage (e.g. to 0 or some default value).
>>> > E.g. the code
>>> > A = Array(Int64,5)
>>> > println(A[1])
>>> > has unpredictable behavior, may disclose information from
>>> > other modules, etc.
>>> >
>>> > This is really quite unacceptable in a modern programming
>>> > language; it is as bad as not checking array reads for out-of-bounds
>>> > indices.
>>> >
>>> > Google for "uninitialized security" to find numerous instances
>>> > of security violations and unreliability problems caused by the
>>> > use of uninitialized variables, and numerous security advisories
>>> > warning of problems caused by the (perhaps inadvertent) use
>>> > of uninitialized variables.
>>> >
>>> > You can't design a programming language today under the naive
>>> > assumption that code in that language won't be used in highly
>>> > critical applications or won't be under adversarial attack.
>>> >
>>> > You can't reasonably ask all programmers to properly initialize
>>> > their allocated storage manually any more than you can ask them
>>> > to test all indices before accessing an array manually; these are
>>> > things that a high-level language should do for you.
>>> >
>>> > The default non-initialization of allocated storage is a
>>> > mis-feature that should absolutely be fixed.
>>> >
>>> > There is no efficiency argument here in favor of uninitialized storage
>>> > that can outweigh the security and reliability disadvantages...
>>> >
>>> > Cheers,
>>> > Ron Rivest