Ah, I see it’s been discussed and even documented. FWIW, documenting this
behavior in the pointer function would be useful for newbies like myself. I
agree with Stefan that the two argument pointer function should be deprecated
as its C-like behavior is inconsistent. If Julia pointer arithmetic is byte
based, that’s a reasonable convention that just needs to be understood, like
1-based indexing or FORTRAN array layout.
Sprinkling a few sizeof(T) in your code when you’re mucking about with pointers
anyway is a small price to pay. With C conventions, you’d do just as much
mucking about with convert(Ptr{UInt8},...).
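For readers skimming the thread, here is a minimal sketch of the byte-based convention, written in current Julia (1.x) syntax rather than the 0.3/0.4 syntax quoted below. The helper name `nth_element_ptr` and the sample array are made up for illustration; they are not part of Base.

```julia
# Hypothetical helper (not in Base): address of the i-th element (1-based),
# computed with byte-based pointer arithmetic.
nth_element_ptr(p::Ptr{T}, i::Integer) where {T} = p + (i - 1) * sizeof(T)

a = UInt64[10, 20, 30]

# GC.@preserve keeps `a` alive while raw pointers into it are in use.
one_byte, same_address, second = GC.@preserve a begin
    p = pointer(a)
    (UInt(p + 1) - UInt(p) == 1,               # `p + 1` advances one byte
     nth_element_ptr(p, 2) == pointer(a, 2),   # scaling by sizeof(T) matches pointer(a, 2)
     unsafe_load(nth_element_ptr(p, 2)))       # reads the second element
end
```

So the single `sizeof(T)` factor is all that separates the one-argument form plus arithmetic from the two-argument form.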
On March 25, 2015 at 11:05:00 AM, Milan Bouchet-Valat wrote:
On Wednesday, March 25, 2015 at 07:55 -0700, Matt Bauman wrote:
See https://github.com/JuliaLang/julia/issues/6219#issuecomment-38117402
This looks like a case where, as discussed for string indexing, writing
something like p + 5bytes could make sense. Then the default behavior could
follow the more natural C convention, yet you'd never have to write things like
p + size/sizeof(T) (to quote Jeff's remark on the issue).
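One way the `p + 5bytes` idea could look, as a rough sketch in current Julia (1.x) syntax. Everything here is hypothetical: `ByteOffset`, `bytes`, and the `ElemPtr` wrapper are invented for illustration and do not exist in Base. The wrapper is used so that Base's own `Ptr` methods stay untouched.

```julia
# Hypothetical unit type for explicit byte offsets; not part of Base.
struct ByteOffset
    n::Int
end

const bytes = ByteOffset(1)

# `5bytes` parses as `5 * bytes`.
Base.:*(k::Integer, b::ByteOffset) = ByteOffset(k * b.n)

# Hypothetical pointer wrapper whose default arithmetic counts elements.
struct ElemPtr{T}
    p::Ptr{T}
end

# Integer offsets count elements (the C convention)...
Base.:+(e::ElemPtr{T}, k::Integer) where {T} = ElemPtr{T}(e.p + k * sizeof(T))
# ...while ByteOffset offsets count bytes.
Base.:+(e::ElemPtr{T}, b::ByteOffset) where {T} = ElemPtr{T}(e.p + b.n)

a = UInt64[1, 2, 3]
by_element, by_bytes = GC.@preserve a begin
    e = ElemPtr(pointer(a))
    ((e + 1).p == pointer(a, 2),        # one element forward
     (e + 8bytes).p == pointer(a, 2))   # eight bytes forward, same address for UInt64
end
```

With such a unit, the element-wise default and the explicit byte offset can coexist without writing `size/sizeof(T)` by hand.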
Regards
On Wednesday, March 25, 2015 at 9:58:46 AM UTC-4, Sebastian Good wrote:
The benefit of the semantics of the two argument pointer function is that it
preserves intuitive pointer arithmetic. As a new (yet happy!) Julia programmer,
I certainly don’t know what the deprecation implications of changing pointer
arithmetic are (vast, sadly, I imagine), but their behavior certainly violated
my “principle of least astonishment” when I found they worked by bytes, not by
Ts. That is, instead of base/pointer.jl:64 (and friends) looking like
+(x::Ptr, y::Integer) = oftype(x, (UInt(x) + (y % UInt) % UInt))
I would expect them to look like
+{T}(x::Ptr{T}, y::Integer) = oftype(x, (UInt(x) + sizeof(T)*(y % UInt) % UInt))
to more closely follow the principle of pointer arithmetic long ago established
by C. The type specialization would make these just as fast. For this to work
with arrays safely, you’d have to guarantee that dense arrays had no padding
between elements. Since C requires this to be the case, it seems we’re on
safe ground?
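The padding question can be checked directly. The sketch below uses current Julia (1.x) syntax and a made-up type `Padded`, and assumes an isbits element type stored inline in a dense `Vector`. `sizeof(T)` already includes any padding inside the struct, so consecutive elements sit `sizeof(T)` bytes apart.

```julia
# Hypothetical struct with mixed field sizes; on typical 64-bit platforms
# the compiler inserts alignment padding after `x` and after `z`.
struct Padded
    x::Float32
    y::Int64
    z::Int8
end

v = [Padded(1, 2, 3), Padded(4, 5, 6)]

# Distance in bytes between the first and second element.
stride_bytes = GC.@preserve v begin
    UInt(pointer(v, 2)) - UInt(pointer(v, 1))
end
```

Here `stride_bytes == sizeof(Padded)`, which is what scaling by `sizeof(T)` in the proposed `+` method would rely on.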
On March 25, 2015 at 9:07:40 AM, Stefan Karpinski wrote:
Given the performance difference and the different behavior, I'm tempted to
just deprecate the two-argument form of pointer.
On Wed, Mar 25, 2015 at 12:53 PM, Sebastian Good wrote:
I guess what I find most confusing is that there would be a difference, since
adding 1 to a pointer only adds one byte, not one element size.
> p1 = pointer(zeros(UInt64));
Ptr{UInt64} @0x000000010b28c360
> p1 + 1
Ptr{UInt64} @0x000000010b28c361
I would have expected the latter to end in 68. The two-argument pointer
function gets this “right”.
> a=zeros(UInt64);
> pointer(a,1)
Ptr{UInt64} @0x000000010b9c72e0
> pointer(a,2)
Ptr{UInt64} @0x000000010b9c72e8
I can see arguments multiple ways, but when I’m given a strongly typed pointer
(Ptr{T}), I would expect it to participate in arithmetic in increments of
sizeof(T).
On March 25, 2015 at 6:36:37 AM, Stefan Karpinski wrote:
That does seem to be the issue. It's tricky to fix since you can't evaluate
sizeof(Ptr) unless the condition is true.
On Tue, Mar 24, 2015 at 7:13 PM, Stefan Karpinski wrote:
There's a branch in eltype, which is probably causing this difference.
On Tue, Mar 24, 2015 at 7:00 PM, Sebastian Good wrote:
Yep, that’s done it. The only difference I can see in the code I wrote before
and this code is that previously I had
convert(Ptr{T}, pointer(raw, byte_number))
whereas here we have
convert(Ptr{T}, pointer(raw) + byte_number - 1)
The former construction seems to emit a call to a Julia-intrinsic function,
while the latter executes the more expected simple machine loads. Is there a
subtle difference between the two calls to pointer?
Thanks all for your help!
On March 24, 2015 at 12:19:00 PM, Matt Bauman wrote:
(The key is to ensure that the method gets specialized for different types with
the parametric `::Type{T}` in the signature instead of `T::DataType`).
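To make the two signatures concrete, here is a small side-by-side sketch in current Julia (1.x) syntax. The function names `read_typed` and `read_untyped` and the sample buffer are invented for illustration. Both return the same value; the difference is only in whether the compiler is forced to generate a specialized method instance per element type.

```julia
# `::Type{T}` with a type parameter: the method is specialized for each T.
read_typed(::Type{T}, data::Vector{UInt8}, offset::Int) where {T} =
    GC.@preserve data unsafe_load(convert(Ptr{T}, pointer(data) + offset))

# `T::DataType`: the type is an ordinary runtime value, and the compiler
# may choose not to specialize on it.
read_untyped(T::DataType, data::Vector{UInt8}, offset::Int) =
    GC.@preserve data unsafe_load(convert(Ptr{T}, pointer(data) + offset))

# Eight bytes holding two Int32 values back to back.
buf = collect(reinterpret(UInt8, Int32[7, 42]))

typed_value = read_typed(Int32, buf, 4)
untyped_value = read_untyped(Int32, buf, 4)
```

Comparing `@code_native` for the two calls is a reasonable way to see whether specialization happened on a given Julia version.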
On Tuesday, March 24, 2015 at 12:10:59 PM UTC-4, Stefan Karpinski wrote:
This seems like it works fine to me (on both 0.3 and 0.4):
immutable Test
    x::Float32
    y::Int64
    z::Int8
end
julia> a = [Test(1,2,3)]
1-element Array{Test,1}:
Test(1.0f0,2,3)
julia> b = copy(reinterpret(UInt8, a))
24-element Array{UInt8,1}:
0x00
0x00
0x80
0x3f
0x03
0x00
0x00
0x00
0x02
0x00
0x00
0x00
0x00
0x00
0x00
0x00
0x03
0xe0
0x82
0x10
0x01
0x00
0x00
0x00
julia> prim_read{T}(::Type{T}, data::Array{Uint8,1}, offset::Int) =
           unsafe_load(convert(Ptr{T}, pointer(data) + offset))
prim_read (generic function with 1 method)
julia> prim_read(Test, b, 0)
Test(1.0f0,2,3)
julia> @code_native prim_read(Test, b, 0)
.section __TEXT,__text,regular,pure_instructions
Filename: none
Source line: 1
push RBP
mov RBP, RSP
Source line: 1
mov RCX, QWORD PTR [RSI + 8]
vmovss XMM0, DWORD PTR [RCX + RDX]
mov RAX, QWORD PTR [RCX + RDX + 8]
mov DL, BYTE PTR [RCX + RDX + 16]
pop RBP
ret
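For anyone reading this on a current Julia (1.x), the same idea can be sketched as follows. This is an adaptation, not the code from the message above: `immutable` is now `struct`, the type name `Record` is made up, and the bounds check and `GC.@preserve` are additions that the original one-liner did not have.

```julia
struct Record
    x::Float32
    y::Int64
    z::Int8
end

# Read a value of type T starting `offset` bytes into `data` (0-based offset).
function prim_read(::Type{T}, data::Vector{UInt8}, offset::Int) where {T}
    isbitstype(T) || throw(ArgumentError("T must be a bits type"))
    (0 <= offset && offset + sizeof(T) <= length(data)) ||
        throw(BoundsError(data, offset + 1))
    # Keep `data` rooted while the raw pointer is dereferenced.
    GC.@preserve data unsafe_load(convert(Ptr{T}, pointer(data) + offset))
end

# Build a byte buffer holding one Record by storing it through a raw pointer.
b = zeros(UInt8, sizeof(Record))
GC.@preserve b unsafe_store!(convert(Ptr{Record}, pointer(b)), Record(1, 2, 3))

r = prim_read(Record, b, 0)
```

The checks cost a couple of comparisons; dropping them gets back to the bare load shown in the assembly listing, at the price of undefined behavior on a bad offset.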
On Tue, Mar 24, 2015 at 5:04 PM, Simon Danisch wrote:
There is a high chance that I simply don't understand llvmcall well enough,
though ;)
On Monday, March 23, 2015 at 20:20:09 UTC+1, Sebastian Good wrote:
I'm trying to read some binary formatted data. In C, I would define an
appropriately padded struct and cast away. Is it possible to do something
similar in Julia, though for only one value at a time? Philosophically, I'd
like to approximate the following, for some simple bittypes T (Int32, Float32,
etc.)
template <typename T> T read(const char* data, size_t offset) { return *(const T*)(data + offset); }
The transliteration of this brain-dead approach results in the following, which
seems to allocate a boxed Pointer object on every invocation. The pointer
function comes with ample warnings about how it shouldn't be used, and I
imagine that it's not polite to the garbage collector.
prim_read{T}(::Type{T}, data::AbstractArray{Uint8, 1}, byte_number) =
    unsafe_load(convert(Ptr{T}, pointer(data, byte_number)))
I can reinterpret the whole array, but this will involve a division of the
offset to calculate the new offset relative to the reinterpreted array, and it
allocates an array object.
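The reinterpret approach described above looks roughly like this in current Julia (1.x) syntax, where `reinterpret` returns a lazy view over the same memory rather than a copy. The sample data and variable names are made up, and the sketch assumes the byte offset is a multiple of `sizeof(T)`.

```julia
# Twelve bytes holding three Int32 values back to back.
data = collect(reinterpret(UInt8, Int32[10, 20, 30]))

byte_offset = 8                     # 0-based offset into the byte array
words = reinterpret(Int32, data)    # view of the same bytes as Int32 values

# The division converts the byte offset to a 1-based element index.
value = words[byte_offset ÷ sizeof(Int32) + 1]
```

This stays bounds-checked and avoids raw pointers, but it cannot read a value at an offset that is not aligned to the element size.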
Is there a better way to simply read the machine word at a particular offset in
a byte array? I would think it should inline to a single assembly instruction
if done right.