Re: [dmd-internals] Regarding deprecation of volatile statements

Walter Bright Wed, 01 Aug 2012 10:27:40 -0700


On 7/31/2012 6:59 PM, Alex Rønne Petersen wrote:

On Wed, Aug 1, 2012 at 2:55 AM, Walter Bright <[email protected]> wrote:

On 7/31/2012 10:02 AM, Alex Rønne Petersen wrote:

On Wed, Jul 25, 2012 at 1:20 AM, Walter Bright <[email protected]> wrote:

On 7/24/2012 3:18 PM, Alex Rønne Petersen wrote:

On Wed, Jul 25, 2012 at 12:11 AM, Walter Bright <[email protected]> wrote:

On 7/24/2012 2:53 PM, Alex Rønne Petersen wrote:

But shared can't replace volatile in kernel space. shared means
atomics/memory fences, which is not what I want; that would just give
me unnecessary overhead. I want the proper, standard C semantics of
volatile,


C does not have Standard semantics for volatile. It's a giant mess.

Right, it leaves the exact definition of a volatile access to the
compiler.


Right, that's why it is incorrect to refer to it as "standard" behavior.
Behaviors I've seen include various combinations of:

1. disallowing enregistering
2. preventing folding multiple loads/stores together
3. preventing reordering across expressions with volatiles
4. inserting memory load/store fences
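Items 1 and 2 matter most in one classic case: polling a memory-mapped status register. The minimal C sketch below illustrates it. C is used rather than D because the thread keeps comparing against C's volatile. The register, the `STATUS_READY` bit, and the `wait_ready` name are hypothetical. Without `volatile`, a compiler could legally load `*status` once, keep it in a register, and test that stale copy on every iteration.

```c
#include <stdint.h>

/* Hypothetical READY bit of a hypothetical device status register. */
#define STATUS_READY 0x1u

/* Poll *status until READY is set or max_spins polls have been made.
   Because status points to volatile memory, every iteration performs a
   fresh load: the value may not be kept in a register (item 1), and the
   repeated loads may not be merged into one (item 2).
   Returns how many polls found the bit clear, or -1 if it never got set. */
int wait_ready(const volatile uint32_t *status, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (*status & STATUS_READY)
            return i;
    }
    return -1;
}
```

In a driver, `status` would point at the device's mapped address. For a quick test, the address of an ordinary `uint32_t` works.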

As Martin already said, 1 and 2 are exactly what I need,


Why do you need something not to be enregistered? It's usually loaded into a
register before use, anyway. Also, why would you need 2?

I think there may be a misunderstanding. By enregistering I thought
you meant moving something off the stack and into registers
completely.

That's what it means. But also, I have no idea what problem is addressed by disallowing register allocation.

  But if I think about it, even that seems unnecessary. 2
and 3 should be enough, as Sean said.

To reiterate, this is why I need to know what problem you are trying to address, rather than going at it from the solution point of view.


For 2, see below (same reason why order matters).

   maybe with
the added clarification that volatile operations cannot be reordered
with respect to each other as David pointed out is the LLVM (and
therefore GCC, as LLVM is GCC-compatible) behavior.


The only reason you'd need reordering prevention is if you had shared
variables.

No. It's very common to use memory-mapped I/O (be it in kernel space
or via files in user space) to create stateful communication.
Reordered or folded operations would completely mess up the protocol.
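Here is a minimal sketch of the kind of stateful memory-mapped protocol being described. It assumes a hypothetical device that latches its `data` register whenever a command is written to its `cmd` register. Because both members are volatile, the compiler must emit every store and keep them in program order. It therefore cannot drop the intermediate `data` writes or fold the repeated `cmd` writes. Whether the CPU or interconnect may reorder the stores is a separate, architecture-specific question that `volatile` does not address.

```c
#include <stdint.h>

/* Hypothetical register layout; the "write data, then write cmd"
   protocol is an assumption for illustration. */
struct dev_regs {
    volatile uint32_t data;
    volatile uint32_t cmd;
};

#define CMD_SEND 0x1u /* hypothetical command code */

/* Send n words. Every store below is a volatile access, so the compiler
   must perform all of them, in this order: data[0], cmd, data[1], cmd, ... */
void dev_send(struct dev_regs *regs, const uint32_t *buf, int n)
{
    for (int i = 0; i < n; i++) {
        regs->data = buf[i];
        regs->cmd = CMD_SEND;
    }
}
```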


Communication between what?

    But most relevant C compilers have a fairly sane definition
of this. For example, GCC:
http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html

not the atomicity that people seem to associate with it.


Exactly what semantics are you looking for?

GCC's volatile semantics, pretty much. I want to be able to interact
with volatile memory without the compiler thinking it can optimize or
reorder (or whatever) my memory accesses. In other words, tell the
compiler to back off and leave volatile code alone.


Unfortunately, this is rather vague. For example, how many read/write
operations are there in v++? Optimizing is a terminally nebulous concept.

How many reads/writes there are is actually irrelevant from my
perspective. The semantics that I'm after will simply guarantee that,
no matter how many, it'll stay at that number and in the defined order
of the v++ operation in the language.


At that number? At what number? And why do you need a defined order, unless
you're doing shared memory?

Of course memory-mapped I/O can be called "shared" memory, but it's
not shared in the traditional sense of concurrency, since memory
barriers wouldn't matter; this is all about constraining the compiler.
While memory barriers can do that, it would be inefficient.
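C11 fences can illustrate the difference between constraining only the compiler and issuing a real memory barrier. In the sketch below, the doorbell register and buffer are hypothetical. Making the doorbell `volatile` does not stop the compiler from moving the plain buffer stores past it, so a fence goes in between.

With GCC and Clang, `atomic_signal_fence(memory_order_seq_cst)` emits no instruction and acts as a compiler-only barrier. By contrast, `atomic_thread_fence(memory_order_seq_cst)` typically emits a CPU barrier (an `mfence` on x86-64), which is the overhead being objected to here. Note that the C standard formally describes `atomic_signal_fence` in terms of signal handlers. Treating it as a general compiler barrier therefore relies on common implementation behavior.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Fill an ordinary buffer, then ring a hypothetical doorbell register.
   The fence keeps the compiler from sinking the buffer stores below the
   doorbell store; with GCC/Clang it emits no machine instruction. */
void post_buffer(uint32_t *buf, int n, volatile uint32_t *doorbell)
{
    for (int i = 0; i < n; i++)
        buf[i] = (uint32_t)i;
    atomic_signal_fence(memory_order_seq_cst); /* compiler-only barrier */
    *doorbell = (uint32_t)n;
}

/* Same, but with a fence that may also emit a CPU barrier instruction
   (GCC and Clang emit mfence for this on x86-64). */
void post_buffer_hw(uint32_t *buf, int n, volatile uint32_t *doorbell)
{
    for (int i = 0; i < n; i++)
        buf[i] = (uint32_t)i;
    atomic_thread_fence(memory_order_seq_cst); /* compiler + CPU barrier */
    *doorbell = (uint32_t)n;
}
```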

Let's look at it this way. Suppose I have this code:

     class C { int i; }
     C c = ...;

     foo();
     c.i++;
     bar();
     c.i++;
     baz(c);

A clever compiler could trivially spot that c isn't being shared
between threads, assigned to a global, or passed to any function
before baz(c). So it's not an unreasonable optimization to rewrite this to:

     C c = ...;

     foo();
     bar();
     c.i += 2;
     baz(c);

However, this would be invalid if some part of c was mapped to some
device or file. Now, when I tack volatile on it like this:

     C c = ...;

     foo();
     volatile { c.i++; }
     bar();
     volatile { c.i++; }
     baz(c);

I'm telling the compiler that these two increments matter. Rewriting
them to a single addition of 2 is not okay. Rewriting them to two
additions of 1 (which is how most compiler IRs represent it anyway) is
perfectly fine if the compiler so desires. Further, by tacking
volatile on here, I'm telling the compiler that the order matters as
well, so the volatile statements may not be reordered with respect to
*each other* (but may be reordered with respect to other statements).
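For comparison, here is a minimal C version of the example above. C has no volatile statement. Volatility is part of an object's type, so the closest analog qualifies the field itself. Under that declaration, each `c->i++` is its own volatile read-modify-write. The compiler must perform both in program order and may not merge them into a single `+= 2`. Exactly how many bus cycles each one takes remains implementation-defined. The struct and function names are illustrative, and the `foo`/`bar`/`baz` calls are left as comments.

```c
/* C has no volatile statement; qualifying the field is the nearest analog. */
struct C { volatile int i; };

void bump_twice(struct C *c)
{
    /* foo(); */
    c->i++;   /* first volatile read-modify-write */
    /* bar(); */
    c->i++;   /* second one: must stay separate from, and after, the first */
    /* baz(c); */
}
```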

I suppose you have a point about numbers of reads and writes (which
emphasizes order being very important). So, to be precise, in an
operation like c.i++, there should be exactly one read and one write
from/to the memory location c.i. Whether it's done in a single
instruction, or whatever, is irrelevant, as long as the desired effect
on memory is achieved.

This is quite incorrect. i++ can be one read and one write, or two reads and one write. There's nothing about volatile or the C standard that says anything about read/write cycles. The C compiler you're using may happen to do what you want, but you wouldn't be relying on any sort of guarantee, portable or not.

  It's worth noting that excessive reads from
volatile memory *are* acceptable, however, since they do not alter any
state. Only excessive writes can be problematic.

The standard doesn't say anything about how many write cycles an operation may or may not do.
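If the goal is exactly one read and one write, one option is to spell them out in the source. The minimal sketch below does this for a hypothetical 32-bit memory-mapped counter (`mmio_increment` is an illustrative name). It makes the intent explicit, but it does not settle Walter's point. The C standard says that what constitutes an access to a volatile-qualified object is implementation-defined, so the number of bus cycles is still the compiler's call.

```c
#include <stdint.h>

/* Increment a hypothetical memory-mapped counter with one load and one
   store as written in the source; the addition happens on a local. */
uint32_t mmio_increment(volatile uint32_t *reg)
{
    uint32_t v = *reg;   /* the single read */
    v += 1;              /* no memory access here */
    *reg = v;            /* the single write */
    return v;
}
```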


D volatile isn't implemented, either.

It is in LDC and GDC.

It doesn't insert a compiler reordering fence? Martin Nowak seemed to
think that it does, and a lot of old druntime code assumed that it
did...


dmd, all on its own, does not reorder loads and stores across accesses to
globals or pointer dereferences. This behavior is inherited from dmc, and
was very successful. dmc-generated code simply did not suffer from all kinds
of heisenbugs common with other compilers because of that. I've always
considered reordering stores to globals as a bad thing to rely on, and not a
significant source of performance improvements, so I deliberately disabled
them.

However, I do understand that the D spec does allow a compiler to do this.

Right. What you just described is an undocumented implementation
detail of one particular D compiler that I simply cannot rely on.

Even though shared is not implemented at the low level, I suggest using it
anyway as it currently does work (with or without shared). You should
anyway, as the only way there'd be trouble is for multithreaded access to
that memory anyway.

... with DMD.

And even if we ignore the fact that this will only work with DMD,
shared will eventually imply either memory fences or atomic
operations, which means unnecessary pipeline slowdown. In a kernel.
Not acceptable.

As for exact control over read and write cycles, the only reliable way to do
that is with inline assembler.

Yes, that would perhaps work if I targeted only x86. But once a kernel
expands beyond one architecture, you want to keep the assembly down to
an absolute minimum because it makes maintenance and porting a
nightmare. I specifically want to target ARM once I'm done with the
x86 parts.


It's not a nightmare to write an asm function that takes a pointer as an
argument and returns what it points to. You're porting a 2 line function
between systems.
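The following sketches the kind of two-line asm function being suggested, written as GCC/Clang extended inline assembly for x86-64 (the name `asm_load32` is hypothetical). The load is hidden inside an `asm volatile` statement with a `"memory"` clobber. As a result, the optimizer cannot delete it or hoist it out of a loop, and it cannot move other memory accesses across it. On other targets, this sketch falls back to a plain volatile access. That fallback is an assumption made here, and it sidesteps the per-target porting work under debate.

```c
#include <stdint.h>

/* Return *p using one real load instruction on x86-64 (GCC/Clang syntax);
   other targets fall back to a volatile read (assumption for this sketch). */
static inline uint32_t asm_load32(const uint32_t *p)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint32_t v;
    /* "memory" clobber: also acts as a compiler barrier. */
    __asm__ __volatile__("movl (%1), %0" : "=r"(v) : "r"(p) : "memory");
    return v;
#else
    return *(const volatile uint32_t *)p;
#endif
}
```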

Not between systems. Between systems and compilers. It quickly turns
into quite a few functions, especially if you're going to handle
different sizes (1, 2, 4, 8 bytes, etc),


Just one if you use a template.

  heck, you're going to have to
handle anything that a pointer can point to. You can of course define
some primitives to do this, but that not only results in inefficient
code generation, it also leads to overly verbose and unmaintainable
code because you have to read out each member of a structure manually.


I think this is an exaggeration.
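On the point that one template covers every size: C has no templates, but C11 `_Generic` gets close, as in this minimal sketch. `vload` and the width-specific helpers are hypothetical names. The macro picks a helper at compile time from the pointer type, so callers write one name for 1-, 2-, 4-, and 8-byte accesses. In D, as suggested above, a single template would do the same.

```c
#include <stdint.h>

/* Width-specific loads; the volatile-qualified parameter makes each
   dereference a volatile access. */
static inline uint8_t  vload8(const volatile uint8_t *p)   { return *p; }
static inline uint16_t vload16(const volatile uint16_t *p) { return *p; }
static inline uint32_t vload32(const volatile uint32_t *p) { return *p; }
static inline uint64_t vload64(const volatile uint64_t *p) { return *p; }

/* One name for every width, selected at compile time by pointer type.
   Plain and volatile-qualified pointers are both accepted. */
#define vload(p) _Generic((p),                              \
        uint8_t *:  vload8,  volatile uint8_t *:  vload8,   \
        uint16_t *: vload16, volatile uint16_t *: vload16,  \
        uint32_t *: vload32, volatile uint32_t *: vload32,  \
        uint64_t *: vload64, volatile uint64_t *: vload64)(p)
```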


Can we please not use the inline assembler as an excuse to not
implement a feature that is rather essential for a systems language,
and especially in the year 2012? I understand that there are
difficulties in working out the exact semantics, but I'm sure we can
get there, and I think we should, instead of just hack-fixing a
problem like this with inline assembly as some sort of "avoid the
optimizing compiler completely" solution, which results in
unreasonable amounts of code to maintain and port across
configurations.


I don't think it is an unreasonable amount of code at all.

However, I can see it as a compiler builtin function, like bsr() and inp() are. Those are fairly straightforward, and are certainly a lot easier to understand than volatile semantics, which cause nothing but confusion. They are also as efficient as can be once implemented by the compiler (and you can use them with your own implementation in the meanwhile).
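The "use your own implementation in the meanwhile" route might look like the minimal sketch below. It defines two user-level functions standing in for hypothetical load/store intrinsics. `volatile_load32` and `volatile_store32` are made-up names, not an existing D or C API. A driver helper is then written only in terms of them. If a compiler later provided real intrinsics with these semantics, only the two stand-ins would need to change.

```c
#include <stdint.h>

/* Stand-ins for hypothetical load/store intrinsics, implemented by
   accessing the target through a volatile-qualified pointer. */
static inline uint32_t volatile_load32(const uint32_t *p)
{
    return *(const volatile uint32_t *)p;
}

static inline void volatile_store32(uint32_t *p, uint32_t v)
{
    *(volatile uint32_t *)p = v;
}

/* Driver-style helper built only on the two primitives: set the bits in
   mask in a hypothetical control register (one load, then one store). */
static inline void reg_set_bits(uint32_t *reg, uint32_t mask)
{
    volatile_store32(reg, volatile_load32(reg) | mask);
}
```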

I don't see why implementing volatile in D with the semantics we've
discussed here would be so problematic. Especially considering GCC and
LLVM already do the right thing, and it sounds like DMD's back end
will too (?).


I'd rather work on "what problem are you trying to solve" rather than
starting with a solution and then trying to infer the problem.

It's always been about safe memory-mapped files and I/O in the face of
optimizing compilers.

Well, you didn't say that until now :-). But now that I know what you're trying to do, I think that a couple of compiler intrinsics can do the job.

_______________________________________________
dmd-internals mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/dmd-internals
