On Wed, Aug 1, 2012 at 7:25 PM, Walter Bright <[email protected]> wrote:
>
> On 7/31/2012 6:59 PM, Alex Rønne Petersen wrote:
>>
>> On Wed, Aug 1, 2012 at 2:55 AM, Walter Bright <[email protected]> wrote:
>>>
>>> On 7/31/2012 10:02 AM, Alex Rønne Petersen wrote:
>>>>
>>>> On Wed, Jul 25, 2012 at 1:20 AM, Walter Bright <[email protected]> wrote:
>>>>>
>>>>> On 7/24/2012 3:18 PM, Alex Rønne Petersen wrote:
>>>>>>
>>>>>> On Wed, Jul 25, 2012 at 12:11 AM, Walter Bright <[email protected]> wrote:
>>>>>>>
>>>>>>> On 7/24/2012 2:53 PM, Alex Rønne Petersen wrote:
>>>>>>>>
>>>>>>>> But shared can't replace volatile in kernel space. shared means
>>>>>>>> atomics/memory fences which is not what I want - that would just give me
>>>>>>>> unnecessary overhead. I want the proper, standard C semantics of volatile,
>>>>>>>
>>>>>>> C does not have Standard semantics for volatile. It's a giant mess.
>>>>>>
>>>>>> Right, it leaves the exact definition of a volatile access to the compiler.
>>>>>
>>>>> Right, that's why it is incorrect to refer to it as "standard" behavior.
>>>>> Behaviors I've seen include various combinations of:
>>>>>
>>>>> 1. disallowing enregistering
>>>>> 2. preventing folding multiple loads/stores together
>>>>> 3. preventing reordering across expressions with volatiles
>>>>> 4. inserting memory load/store fences
>>>>
>>>> As Martin already said, 1 and 2 are exactly what I need,
>>>
>>> Why do you need something not to be enregistered? It's usually loaded into a
>>> register before use, anyway. Also, why would you need 2?
>>
>> I think there may be a misunderstanding. By enregistering I thought
>> you meant moving something off the stack and into registers completely.
>
> That's what it means. But also, I have no idea what problem is addressed by
> not disallowing register allocation.
>
>> But if I think about it, even that seems unnecessary. 2 and 3 should be
>> enough, as Sean said.
>
> To reiterate, this is why I need to know what problem you are trying to
> address, rather than going at it from the solution point of view.
>
>> For 2, see below (same reason why order matters).
>>
>>>> maybe with the added clarification that volatile operations cannot be
>>>> reordered with respect to each other as David pointed out is the LLVM
>>>> (and therefore GCC, as LLVM is GCC-compatible) behavior.
>>>
>>> The only reason you'd need reordering prevention is if you had shared
>>> variables.
>>
>> No. It's very common to use memory-mapped I/O (be it in kernel space
>> or via files in user space) to create stateful communication.
>> Reordered or folded operations would completely mess up the protocol.
>
> Communication between what?

Typically processes doing different things. One process might be receiving
data from the network, while another processes it. It depends entirely on
application design.

>>>>>> But most relevant C compilers have a fairly sane definition
>>>>>> of this. For example, GCC:
>>>>>> http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html
>>>>>>
>>>>>>>> not the atomicity that people seem to associate with it.
>>>>>>>
>>>>>>> Exactly what semantics are you looking for?
>>>>>>
>>>>>> GCC's volatile semantics, pretty much. I want to be able to interact
>>>>>> with volatile memory without the compiler thinking it can optimize or
>>>>>> reorder (or whatever) my memory accesses. In other words, tell the
>>>>>> compiler to back off and leave volatile code alone.
>>>>>
>>>>> Unfortunately, this is rather vague. For example, how many read/write
>>>>> operations are there in v++? Optimizing is a terminally nebulous concept.
>>>>
>>>> How many reads/writes there are is actually irrelevant from my
>>>> perspective. The semantics that I'm after will simply guarantee that,
>>>> no matter how many, it'll stay at that number and in the defined order
>>>> of the v++ operation in the language.
>>>
>>> At that number? At what number? And why do you need a defined order,
>>> unless you're doing shared memory?
>>
>> Of course memory-mapped I/O can be called "shared" memory, but it's
>> not shared in the traditional sense of concurrency, since memory
>> barriers wouldn't matter; this is all about constraining the compiler.
>> While memory barriers can do that, it would be inefficient.
>>
>> Let's look at it this way. Suppose I have this code:
>>
>> class C { int i; }
>> C c = ...;
>>
>> foo();
>> c.i++;
>> bar();
>> c.i++;
>> baz(c);
>>
>> A clever compiler could trivially spot that c isn't being shared
>> between threads, assigned to a global, nor passed to any function. So
>> it's not an unreasonable optimization to rewrite this to:
>>
>> C c = ...;
>>
>> foo();
>> bar();
>> c.i += 2;
>> baz(c);
>>
>> However, this would be invalid if some part of c was mapped to some
>> device or file. Now, when I tack volatile on it like this:
>>
>> C c = ...;
>>
>> foo();
>> volatile { c.i++; }
>> bar();
>> volatile { c.i++; }
>> baz(c);
>>
>> I'm telling the compiler that these two increments matter. Rewriting
>> them to a single addition of 2 is not okay. Rewriting them to two
>> additions of 1 (which is how most compiler IRs represent it anyway) is
>> perfectly fine if the compiler so desires. Further, by tacking
>> volatile on here, I'm telling the compiler that the order matters as
>> well, so the volatile statements may not be reordered with respect to
>> *each other* (but may be reordered with respect to other statements).
>>
>> I suppose you have a point about numbers of reads and writes (which
>> emphasizes order being very important). So, to be precise, in an
>> operation like c.i++, there should be exactly one read and one write
>> from/to the memory location c.i. Whether it's done in a single
>> instruction, or whatever, is irrelevant, as long as the desired effect
>> on memory is achieved.
>
> This is quite incorrect. i++ can be one read and one write, or two reads and
> one write. There's nothing about volatile or the C standard that says
> anything about read/write cycles. The C compiler you're using may happen to
> do what you want, but you wouldn't be relying on any sort of guarantee,
> portable or not.

That wasn't meant to be in the context of C, but just memory-mapped I/O in
general.

>> It's worth noting that excessive reads from
>> volatile memory *are* acceptable, however, since they do not alter any
>> state. Only excessive writes can be problematic.
>
> The standard doesn't say anything about how many write cycles an operation
> may or may not do.

Same here.

>>>>> D volatile isn't implemented, either.
>>>>
>>>> It is in LDC and GDC.
>>>>
>>>>>> It doesn't insert a compiler reordering fence? Martin Nowak seemed to
>>>>>> think that it does, and a lot of old druntime code assumed that it did...
>>>>>
>>>>> dmd, all on its own, does not reorder loads and stores across accesses to
>>>>> globals or pointer dereferences. This behavior is inherited from dmc, and
>>>>> was very successful. dmc generated code simply did not suffer from all
>>>>> kinds of heisenbugs common with other compilers because of that. I've
>>>>> always considered reordering stores to globals as a bad thing to rely on,
>>>>> and not a significant source of performance improvements, so deliberately
>>>>> disabled them.
>>>>>
>>>>> However, I do understand that the D spec does allow a compiler to do this.
>>>>
>>>> Right. What you just described is an undocumented implementation
>>>> detail of one particular D compiler that I simply cannot rely on.
>>>>
>>>>> Even though shared is not implemented at the low level, I suggest using it
>>>>> anyway as it currently does work (with or without shared). You should
>>>>> anyway, as the only way there'd be trouble is for multithreaded access to
>>>>> that memory anyway.
>>>>
>>>> ... with DMD.
>>>>
>>>> And even if we ignore the fact that this will only work with DMD,
>>>> shared will eventually imply either memory fences or atomic
>>>> operations, which means unnecessary pipeline slowdown. In a kernel.
>>>> Not acceptable.
>>>
>>>>> As for exact control over read and write cycles, the only reliable way
>>>>> to do that is with inline assembler.
>>>>
>>>> Yes, that would perhaps work if I targeted only x86. But once a kernel
>>>> expands beyond one architecture, you want to keep the assembly down to
>>>> an absolute minimum because it makes maintenance and porting a
>>>> nightmare. I specifically want to target ARM once I'm done with the
>>>> x86 parts.
>>>
>>> It's not a nightmare to write an asm function that takes a pointer as an
>>> argument and returns what it points to. You're porting a 2 line function
>>> between systems.
>>
>> Not between systems. Between systems and compilers. It quickly turns
>> into quite a few functions, especially if you're going to handle
>> different sizes (1, 2, 4, 8 bytes, etc),
>
> Just one if you use a template.

That's a good point, but that template still has to handle all kinds of weird
struct layouts correctly, including those with custom alignment specifiers.

>> heck, you're going to have to
>> handle anything that a pointer can point to. You can of course define
>> some primitives to do this, but that not only results in inefficient
>> code generation, it also leads to overly verbose and unmaintainable
>> code because you have to read out each member of a structure manually.
>
> I think this is an exaggeration.

I don't follow. How else would you do it?
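
To make that concrete, here is a rough sketch of what such a single templated
function could look like (illustrative only: the names, the DMD-style 32-bit
x86 inline asm, and the restriction to 4-byte scalars are all assumptions of
this sketch, and that restriction is exactly why larger types and struct
layouts remain awkward):

    // Illustrative only: one templated volatile load built on DMD-style
    // 32-bit x86 inline asm. The asm block is opaque to the optimizer, so
    // the read can be neither folded away nor hoisted out of a loop.
    T volatileLoad(T)(T* ptr)
        if (T.sizeof == 4)   // this sketch only handles 4-byte scalars
    {
        T result;
        asm
        {
            mov EAX, ptr;     // EAX = address of the memory-mapped location
            mov EAX, [EAX];   // exactly one read of that location
            mov result, EAX;  // hand the value back to D code
        }
        return result;
    }

    // The store side; anything larger than a machine word, or a struct with
    // odd alignment, would have to be broken up member by member.
    void volatileStore(T)(T* ptr, T value)
        if (T.sizeof == 4)
    {
        asm
        {
            mov ECX, value;   // the value to write
            mov EAX, ptr;     // address of the memory-mapped location
            mov [EAX], ECX;   // exactly one write, kept in program order
        }
    }

Each additional architecture or compiler would need its own bodies for these
two functions, which is the porting cost being weighed here.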

>> Can we please not use the inline assembler as an excuse to not
>> implement a feature that is rather essential for a systems language,
>> and especially in the year 2012? I understand that there are
>> difficulties in working out the exact semantics, but I'm sure we can
>> get there, and I think we should, instead of just hack-fixing a
>> problem like this with inline assembly as some sort of "avoid the
>> optimizing compiler completely" solution, which results in
>> unreasonable amounts of code to maintain and port across configurations.
>
> I don't think it is an unreasonable amount of code at all.

It really depends on how much of such code you're gonna have to write. Now, I
certainly can't say from experience since I'm far from having an even remotely
usable kernel, but I expect most device drivers to work through memory-mapped
I/O, and hardware vendors aren't known for following standards of any kind...

> However, I can see it as a compiler builtin function, like bsr() and inp()
> are. Those are fairly straightforward, and are certainly a lot easier to
> understand than volatile semantics, which cause nothing but confusion. They
> are also as efficient as can be once implemented by the compiler (and you
> can use them with your own implementation in the meanwhile).
>
>>>> I don't see why implementing volatile in D with the semantics we've
>>>> discussed here would be so problematic. Especially considering GCC and
>>>> LLVM already do the right thing, and it sounds like DMD's back end
>>>> will too (?).
>>>
>>> I'd rather work on "what problem are you trying to solve" rather than
>>> starting with a solution and then trying to infer the problem.
>>
>> It's always been about safe memory-mapped files and I/O in the face of
>> optimizing compilers.
>
> Well, you didn't say that until now :-). But now that I know what you're
> trying to do, I think that a couple compiler intrinsics can do the job.

Sorry, I guess this thread has been more than a little unclear/confusing.

I'm not opposed to intrinsics if they can be reasonably implemented by GDC and
LDC as well. Also, they should be templated so they work with arbitrary
pointer types. I would recommend names like "volatileLoad" and "volatileStore"
since that would be immediately familiar to C programmers and it would do what
they expect their C compiler to do (even if not standardized in C land).

Regards,
Alex

_______________________________________________
dmd-internals mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/dmd-internals
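
For illustration, a rough sketch of how such templated intrinsics might look
and be used (hypothetical signatures, placeholder bodies, and made-up device
addresses; nothing here is an implemented compiler API in the thread above):

    // Hypothetical signatures, templated so they work with arbitrary pointer
    // types as suggested above. The bodies are plain placeholders so the
    // sketch is complete; by themselves they give NO volatile guarantees --
    // a real compiler would have to recognize and special-case these calls.
    T volatileLoad(T)(T* ptr) { return *ptr; }
    void volatileStore(T)(T* ptr, T value) { *ptr = value; }

    // Example use against a made-up memory-mapped device: poll a status
    // register until the device is ready, then write a data register,
    // in that order.
    void putByte(ubyte b)
    {
        auto status = cast(ubyte*)0x1000_0000; // hypothetical register addresses
        auto data   = cast(ubyte*)0x1000_0004;

        while (!(volatileLoad(status) & 0x1)) {} // every iteration really reads
        volatileStore(data, b);                  // exactly one store, after the wait
    }

The point of making these compiler-recognized intrinsics is that calls like
the ones above would be guaranteed to survive optimization exactly as written,
without dragging in fences or atomics.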
