Re: invokespecial-super-init

John Rose Thu, 17 Sep 2015 10:58:53 -0700

On Aug 29, 2015, at 4:42 AM, Jochen Theodorou <blackd...@gmx.org> wrote:
> 
> hi John,
> 
> thanks for replying...
> 
> After having read that, I think part of the problem actually comes from this 
> new-invokespecial-super being split into two bytecodes. It means there can be a 
> lot of things in between, including different paths. This makes the Verifier's 
> job difficult.
> 
> The other part is that I need to react to runtime types. Currently this is 
> only possible by using a generic handle that will install the real target 
> later on. The problem is that the first call of the target is made from 
> inside the generic handle instead of from the call site. In terms of object 
> creation, this means I will have access to the object, and in the case of 
> super-init calls it would mean having access to a not fully initialized 
> object and potentially doing bad things with it.
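The "generic handle" described here is the usual relinking pattern. A minimal sketch, assuming a `MutableCallSite` whose initial target is a fallback method: the fallback picks a real target from the argument's runtime class, installs it behind a type guard, and then makes the first call to that target itself. `RelinkDemo`, `describe`, `fallback` and `call` are hypothetical names; `call` plays the role of an invokedynamic call site.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class RelinkDemo {
    // Hypothetical real targets, chosen by the runtime type of the argument.
    static String describe(Integer i) { return "int:" + i; }
    static String describe(String s) { return "str:" + s; }

    static final MethodType SITE_TYPE = MethodType.methodType(String.class, Object.class);
    static final MutableCallSite SITE = new MutableCallSite(SITE_TYPE);
    static int fallbackCalls = 0;

    static {
        try {
            MethodHandle fb = MethodHandles.lookup().findStatic(RelinkDemo.class, "fallback",
                MethodType.methodType(String.class, MutableCallSite.class, Object.class));
            // Initially the call site points at the generic handle.
            SITE.setTarget(MethodHandles.insertArguments(fb, 0, SITE));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // The generic handle: inspect the runtime type, install a guarded real target,
    // then make the first call to that target itself, from inside this method.
    static String fallback(MutableCallSite site, Object arg) throws Throwable {
        fallbackCalls++;
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle real = lookup.findStatic(RelinkDemo.class, "describe",
                MethodType.methodType(String.class, arg.getClass()))
            .asType(SITE_TYPE);
        // Guard: later calls with the same class skip the fallback.
        MethodHandle test = lookup.findVirtual(Class.class, "isInstance",
                MethodType.methodType(boolean.class, Object.class))
            .bindTo(arg.getClass());
        site.setTarget(MethodHandles.guardWithTest(test, real, site.getTarget()));
        return (String) real.invokeExact(arg);
    }

    // Stand-in for an invokedynamic instruction: call whatever the site targets now.
    static String call(Object arg) throws Throwable {
        return (String) SITE.dynamicInvoker().invokeExact(arg);
    }
}
```

Only the first call per argument class runs inside `fallback`; that is the point where an object produced by the real target is visible to generic code before the call site ever sees it.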


Yes.  The normal rule for instances of C (with super S) is that they are always 
created by a call to C.<init> and then S.<init>.  We could relax this by 
providing the call to S.<init>, and trust that the user will do something 
equivalent to the other parts of some appropriate C.<init>.

Such trusting is conceivable if the MH to S.<init> is *only* accessible to the 
private (full-power) lookup of C.  But it is still dangerous, since it allows a 
programmer to code C.<init> in a way that maintains delicate invariants, and 
then undermine those invariants accidentally by using the bare S.<init>.

This leads to a natural pattern:  If the user wants to generate dynamically 
selected calls to two or more overloadings of S.<init>, wrap 
all of the S.<init> in corresponding C.<init> overloadings.  Add extra 
arguments to the C.<init> to initialize each of the finals of C.  When groovyc 
generates a class C, have it generate those wrapper constructors.  Then there 
is no need to create them in MethodHandles.

Equivalently (as I implied before) we could have Lookup.findSuperConstructor 
return a MH that not only calls S.<init> but also (from additional arguments) 
initializes the finals of C.  (It would only work for full-power lookups.)

But it seems equally reasonable to put in the wrapper constructors C.<init> and 
just use Lookup.findConstructor in the usual way.
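A minimal sketch of the wrapper-constructor route, assuming a fixed superclass `S` with two constructors and a generated subclass `C` that wraps each of them and takes one extra argument for its blank final. All names are hypothetical. The overload is chosen from the runtime class of the argument, and `Lookup.findConstructor` supplies an ordinary allocate-and-initialize handle for the chosen wrapper, so no bare `S.<init>` handle is ever created.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class S {  // fixed API, not generated
    final String origin;
    S(int x) { origin = "S(int " + x + ")"; }
    S(String y) { origin = "S(String " + y + ")"; }
}

class C extends S {  // stands in for a compiler-generated class
    final char tag;  // blank final that every wrapper must initialize

    // One wrapper per S constructor; the extra argument initializes C's final.
    C(int x, char tag) { super(x); this.tag = tag; }
    C(String y, char tag) { super(y); this.tag = tag; }

    // Pick the wrapper from the runtime class of the first argument,
    // then call it through an ordinary findConstructor handle.
    static C create(Object first, char tag) throws Throwable {
        Class<?> p = (first instanceof Integer) ? int.class : first.getClass();
        MethodHandle ctor = MethodHandles.lookup().findConstructor(
            C.class, MethodType.methodType(void.class, p, char.class));
        // Adapt (int or String, char)C to one uniform (Object, char)C shape.
        MethodHandle uniform = ctor.asType(
            MethodType.methodType(C.class, Object.class, char.class));
        return (C) uniform.invokeExact(first, tag);
    }
}
```

In a real runtime the `findConstructor` result would be cached per argument shape rather than looked up on every call.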

> And that is even though I don't even need a handle that returns something. 
> But since there is no real connection between slot 0 of the constructor I am 
> in and the generic handle
> 
> But I wonder if there is really no way around that. Let me construct 
> something crazy here... What if we had a dummy object instead? Let 
> us call it GenericInstance for now. GenericInstance is internally connected 
> to the partially generated class, but has no fields or methods offering 
> access to it. The only way to create a GenericInstance would be by a factory 
> method from the indy API, like findSpecialConstructor or such. I would 
> define its signature so that it returns the GenericInstance. The handle itself 
> is supposed to realize a new-"transform arguments"-invokespecial kind of 
> sequence.

Yes, the GenericInstance, standing for an uninitialized C, would model the part 
of the lifetime of C which is inside C.<init>.  (This is a little like Peter 
Levart's concept of a verifiable "story".)  Call it UI<C>, with U = 
Uninitialized. The operations on the UI<C> would be similar to those on C, but 
they would try to avoid accidental publication of the C reference, until it was 
fully constructed (whatever that means).  This type-state is like the 
larval/adult distinction I blogged about once.

But, if you are willing to use wrapper constructors C.<init>, you don't need 
the extra types and states.  Is that enough for Groovy?

> The Verifier thus needs to acknowledge it to do that. And there needs to be 
> code that takes the result of the GenericInstance and then places the real 
> instance in variable slot 0.

This is the problem with both your and Peter's proposals:  They require verifier 
changes.  Those scare me, because I've worked with the verifier long enough to 
know how verifier complexity translates directly into challenges to safety and 
sanity.

> Since it is a two-fold mechanism, I cannot programmatically do anything with 
> the GenericInstance object except pass it through. Only the part unwrapping 
> it can access the real instance (and also check the class to be sure), and 
> that would be VM code.

If you sit down and write the rules for GI / UI<C>, if you want to accurately 
emulate everything that a C.<init> could do, I think you will find that there 
is nothing unique about UI, *except* the ability to initialize finals.

> I think this way, splitting the method or having a constructor equivalent is 
> not required... but I am not sure something like GenericInstance can be done. 
> In pure Java, probably not.

So, for pure Java, here is a POC for wrapper constructors:

import java.lang.invoke.MethodHandle;

class S { S(int x){…} S(String y){…} … }  // fixed API, not generated by Groovy
interface WC { } // marker type for wrapper constructors
class C extends S {  // generated by Groovy
  final char p, q;
  private static class Finals {
    final char p, q;
    Finals(char p, char q) { this.p = p; this.q = q; }
  }
  // assumed type (Boolean, C)Finals; bindTo needs a reference-typed leading parameter
  private static final MethodHandle MH = …;
  private C(WC ig, int x, MethodHandle finals) throws Throwable {
    super(x); Finals f = (Finals) finals.invokeExact(this); this.p = f.p; this.q = f.q; }
  private C(WC ig, String y, MethodHandle finals) throws Throwable {
    super(y); Finals f = (Finals) finals.invokeExact(this); this.p = f.p; this.q = f.q; }
  public C(int x, boolean z) throws Throwable { this(null, x, MH.bindTo(z)); }
  public C(String y, boolean z) throws Throwable { this(null, y, MH.bindTo(z)); }
  // public C(DynamicArgList dynamicArgs) { this(no can do!); super(this neither!); }
}

This pattern is approximately as general as random bytecodes inside 
constructors, is reasonably compact, and does not require new method handle 
types or verifier rules.

(Note that the MH "finals" is able to "see" the UI<C> under the type C.  It is 
supposed to treat it reasonably, just like constructor code is supposed to.  
Since the wrapper constructors are marked private, it is impossible for 
untrusted parties to inject malicious MH code.  The MH could be replaced by a 
private instance method, if there is no need to have a different MH at 
different construction sites.)
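The POC leaves the bodies of `S` and the definition of `MH` elided. Filling those gaps with hypothetical choices gives a version that compiles and runs: `S` just records which constructor ran, and the `finals` handle comes from a private static method `computeFinals(Boolean, C)` found with `Lookup.findStatic`. The `Boolean` parameter is an assumption forced by `MethodHandle.bindTo`, which only binds a leading reference-typed parameter.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class S {  // hypothetical stand-in for the fixed superclass
    final String origin;
    S(int x) { origin = "int"; }
    S(String y) { origin = "String"; }
}

interface WC { }  // marker type for wrapper constructors

class C extends S {
    final char p, q;

    private static final class Finals {
        final char p, q;
        Finals(char p, char q) { this.p = p; this.q = q; }
    }

    // Hypothetical finals logic; it can read the partially built C but must not publish it.
    private static Finals computeFinals(Boolean z, C self) {
        return z ? new Finals('y', self.origin.charAt(0)) : new Finals('n', '-');
    }

    // Type (Boolean, C)Finals; Boolean rather than boolean so bindTo can bind it.
    private static final MethodHandle MH;
    static {
        try {
            MH = MethodHandles.lookup().findStatic(C.class, "computeFinals",
                MethodType.methodType(Finals.class, Boolean.class, C.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private C(WC ig, int x, MethodHandle finals) throws Throwable {
        super(x);
        Finals f = (Finals) finals.invokeExact(this);  // finals has type (C)Finals
        this.p = f.p;
        this.q = f.q;
    }

    private C(WC ig, String y, MethodHandle finals) throws Throwable {
        super(y);
        Finals f = (Finals) finals.invokeExact(this);
        this.p = f.p;
        this.q = f.q;
    }

    public C(int x, boolean z) throws Throwable { this(null, x, MH.bindTo(z)); }
    public C(String y, boolean z) throws Throwable { this(null, y, MH.bindTo(z)); }
}
```

Because the wrapper constructors are private, only code inside `C` can choose which handle computes the finals.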

What do you think?  Is this close to the workarounds you already use?

— John

> bye jochen
> 
> 
> Am 29.08.2015 03:40, schrieb John Rose:
>> The invokespecial-super-init dance is the thing MH's can't quite do: the 
>> "super" call that every constructor (except Object.<init>) makes.
>> 
>> It is very hard to secure this pattern; just ask anybody who has worked on 
>> (de-)serialization security.
>> 
>> But, we can look at it from a more limited point of view which might improve 
>> your use case, Jochen.
>> 
>> A method handle is supposed to be a fully competent replacement for 
>> hardwired bytecodes, and it is, except for invokespecial-super from a 
>> constructor.  The reason this is hard is that there is no way to constrain 
>> such a method handle, once constructed, to operate inside a constructor.  
>> And superclasses have a right to expect that you are running their 
>> constructor as a unique, non-repeatable part of creating a subclass object.  
>> (By "have a right" I really mean "it would be wrong to do the unexpected" by 
>> which I also mean "attack surfaces are likely to open up if we do this.")
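This restriction is visible in the public API. A minimal sketch with hypothetical classes `Base` and `Sub`: even a full-power lookup inside `Sub`, naming `Sub` as the special caller, cannot get an invokespecial handle for `Base.<init>`, because `Lookup.findSpecial` rejects the name `<init>` with `NoSuchMethodException` (its documentation points to `findConstructor` instead).

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class Base {
    Base(int x) { }
}

class Sub extends Base {
    Sub() { super(0); }

    // Try to obtain an "invokespecial Base.<init>(int)" handle from inside Sub.
    static String trySuperInitHandle() {
        try {
            MethodHandles.lookup().findSpecial(Base.class, "<init>",
                MethodType.methodType(void.class, int.class), Sub.class);
            return "obtained";
        } catch (NoSuchMethodException e) {
            return "refused: " + e.getMessage();
        } catch (IllegalAccessException e) {
            return "access denied: " + e.getMessage();
        }
    }
}
```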
>> 
>> So, is there a way to package up a method handle so that it can only be used 
>> as a unique, non-repeatable part of creating a subclass object?  Yes, there 
>> is:  Wire in an unconditional "new instance" operation, and immediately run 
>> the "invokespecial super" on the new thing.
>> 
>> Now the problem reduces to:  Your class (just like its super) has a right to 
>> expect that constructor code will be run on every newly-created instance 
>> (after the super constructor), before the new object is made available to 
>> other code.  Can we package up the previous new-invokespecial-super method 
>> handle so it can only be used in this way?  Well, no, since every 
>> constructor *also* has a hardwired call to invokespecial; we are back to the 
>> pre-existing new-invokespecial type of MH.
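`Lookup.findConstructor` is that pre-existing new-invokespecial handle: every invocation allocates a fresh object and runs the chosen `<init>` on it, so it cannot be used to re-run a constructor on an existing instance. A minimal sketch with a hypothetical `Point` class:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class Point {
    static int constructed = 0;  // counts how many times <init> has run
    final int x;

    Point(int x) { this.x = x; constructed++; }

    // "new Point" + "invokespecial Point.<init>(int)" packaged as one (int)Point handle
    static final MethodHandle NEW_POINT;
    static {
        try {
            NEW_POINT = MethodHandles.lookup().findConstructor(
                Point.class, MethodType.methodType(void.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static Point make(int x) throws Throwable {
        return (Point) NEW_POINT.invokeExact(x);
    }
}
```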
>> 
>> There are several possible ways out, but the problem is delicate.  The 
>> purpose of constructors is to statically mark code that must be executed 
>> before any (normally published) reference to an object is reachable by 
>> non-class code.  If there were a way to statically mark code as 
>> "post-super-init" ("<postsuperinit>"?), we could make an agreement with a 
>> class that such a method would serve as the equivalent of a constructor, but 
>> it would be the caller's responsibility to allocate the new instance *and* 
>> call the super init.  Allowing bytecode to call this stuff would require a 
>> bunch of new verifier rules, in a place where the verifier is already hard 
>> to understand.  Perhaps a method handle could be allowed to operate where 
>> normal bytecode cannot, but you see the problem:  Method handles are 
>> designed to give a dynamic alternative to things you can already do in 
>> bytecode.
>> 
>> The "post-super-init" convention can be a private convention within a class, 
>> in the special case of Groovy, since Groovy is responsible for generating 
>> the whole class, and can trust itself to invoke all necessary initialization 
>> code on each new instance.  So if you had a new-invokespecial-super MH in a 
>> private context within a Groovy-generated class, you could use it to create 
>> a "mostly blank" instance, and then fill it in before sharing it with 
>> anybody else.  Such an invokespecial-super MH could be adequately protected 
>> from other users by requiring that "Lookup.findSpecialConstructor" can only 
>> work on full-powered lookups, regardless of the accessibility of the super 
>> constructor.
>> 
>> There are two further problems with this, though.  First, constructors have 
>> a unique ability and obligation to initialize blank final variables (the 
>> non-static ones).  So the Lookup.findSpecialConstructor MH has to take an 
>> argument, not just for its super-constructor, but also for *each* final 
>> variable in the *current* class.  (Note that Lookup.findSetter will *not* 
>> allow finals to be set, because it cannot prove that the caller is somehow 
>> "inside" a constructor, and, even if inside it, is trustably acting on 
>> behalf of it.)  There are other ways to go, but you can see this problem 
>> too:  The new-invokespecial operator has to take responsibility for working 
>> with the caller to fill in the blank finals.
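The `findSetter` restriction is easy to observe. A minimal sketch, assuming a hypothetical `Holder` with one blank final and one ordinary field: a full-power lookup inside `Holder` gets a setter for the ordinary field, but `Lookup.findSetter` refuses the final one with `IllegalAccessException`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

class Holder {
    final int fixed;  // blank final: only a constructor may assign it
    int mutable;

    Holder() { fixed = 1; }

    // Ask a full-power lookup (inside Holder) for a setter of the named int field.
    static String probe(String field) {
        try {
            MethodHandle setter = MethodHandles.lookup().findSetter(Holder.class, field, int.class);
            return "setter: " + setter.type();
        } catch (IllegalAccessException e) {
            return "refused";
        } catch (NoSuchFieldException e) {
            return "no such field";
        }
    }
}
```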
>> 
>> The second further problem is even more delicate.  The JVM enforces rules of 
>> calling <init> even (sometimes) against the wishes of people who generate 
>> class files.  We don't fully understand the practical effects of relaxing 
>> these rules.  Proofs of assertions (such as type correctness and security) 
>> require strong premises, and the rigid rules about <init> help provide such 
>> premises.  An example of a proof-failure would be somebody looking at a 
>> class, ensuring that all instances are secure based on the execution of 
>> <init> methods, but then failing to notice that the class *also* runs some 
>> instances through an alternate path, using new-invokespecial-super, which 
>> invalidates the proof by failing to run some crucial setup code.
>> 
>> With all that said, there is still wiggle room.  For example, one *possible* 
>> solution that might help Groovy, while being restrictive enough to avoid the 
>> problems above, would be to split <init> methods and sew them together again 
>> with method handles.
>> 
>> Suppose there were a reliable way to "split" an <init> method into two 
>> parts:  Everything up to the invokespecial-super-<init> call, and everything 
>> afterwards.  (Perhaps it must be preceded *only* by load-from-local 
>> opcodes.)  Call such <init> methods "splittable".  Not all will be 
>> splittable.  Then we could consider allowing a class to replace one of its 
>> splittable constructors by a new hybrid consisting of a differently-selected 
>> super-constructor, followed by the tail of the splittable constructor.  
>> (Note that this neatly handles blank finals.)  It would not be valid for any 
>> party other than the sub-class itself to perform such a split, but it might, 
>> arguably, be reasonable for a class to do such a thing.
>> 
>> There are always many defects with such schemes.  In this case, there is no 
>> robust way to detect that a splittable constructor has in fact been split.  
>> (I keep wanting to invent new bytecodes or verifier rules here!)  Any rule 
>> for splittability is going to be a little hacky, hence hard to understand 
>> and use correctly.  Specific constructors might be "coupled" strongly to 
>> matching super-constructors, in such a way that a mix-and-match will cause 
>> surprises, even to the author of the subclass.  (Having stuff happen by 
>> invisible magic gets old, as soon as you realize you have to vouch for the 
>> behavior of code which you can really only see in source form.)  Finally (as 
>> noted above) MHs are quite robustly understandable from the principle that 
>> they are "just another way" to do what bytecodes have already done; 
>> violating this principle pushes uncertainties into equivalence proofs about 
>> MHs and bytecodes.
>> 
>> In the end, I think Groovy may be better off using its ugly <init> bytecode 
>> sequence, where every subclass constructor calls (via a switch) every 
>> superclass constructor.
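For comparison, a source-level sketch of the "big switch" approach, assuming a fixed superclass `S` with three constructors. Java source cannot place a `switch` ahead of `super(...)`, so this version keeps one constructor per superclass overload and moves the selection into a static factory; the bytecode Groovy emits instead branches inside `<init>` itself, each branch ending in its own invokespecial. `select` and `newC` are hypothetical names.

```java
class S {  // fixed superclass with several constructors
    final String origin;
    S(int x) { origin = "S(int)"; }
    S(String y) { origin = "S(String)"; }
    S(int x, int y) { origin = "S(int,int)"; }
}

class C extends S {
    // One C constructor per S overload; each super call is statically bound.
    C(int x) { super(x); }
    C(String y) { super(y); }
    C(int x, int y) { super(x, y); }

    // Hypothetical selector: maps runtime argument types to a case number.
    static int select(Object[] args) {
        if (args.length == 1 && args[0] instanceof Integer) return 0;
        if (args.length == 1 && args[0] instanceof String) return 1;
        if (args.length == 2 && args[0] instanceof Integer && args[1] instanceof Integer) return 2;
        return -1;
    }

    // The "big switch": one branch per constructor, arguments unpacked and cast per case.
    static C newC(Object... args) {
        switch (select(args)) {
            case 0: return new C((Integer) args[0]);
            case 1: return new C((String) args[0]);
            case 2: return new C((Integer) args[0], (Integer) args[1]);
            default: throw new IllegalArgumentException("no matching constructor");
        }
    }
}
```

Every new superclass overload means another case in every such switch, which is the bloat complained about further down the thread.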
>> 
>> I hope this helps, although it's kind of disappointing.  We ran into the same 
>> dangerous dance, in the Valhalla bytecode interpreter, and had to fake it 
>> from random bits of the MH runtime.
>> 
>> — John
>> 
>> On Feb 26, 2015, at 2:27 AM, Jochen Theodorou <blackd...@gmx.org> wrote:
>>> 
>>> Am 26.02.2015 01:02, schrieb Charles Oliver Nutter:
>>>> After talking with folks at the Jfokus VM Summit, it seems like
>>>> there's a number of nice-to-have and a few need-to-have features we'd
>>>> like to see get into java.lang.invoke. Vladimir suggested I start a
>>>> thread on these features.
>>> 
>>> my biggest request: allow the call of a super constructor (like 
>>> super(foo,bar)) using MethodHandles and have it understood by the JVM like a 
>>> normal super constructor call... same for this(...)
>>> 
>>> Because what we currently do is annoying and a major pita, plus it bloats 
>>> the bytecode we have to produce. And let's not even talk about speed, or 
>>> about that small verifier change that made our hack unusable in several Java 
>>> update versions for 7 and 8.
>>> 
>>> This has been denied in the past because of security reasons... And given 
>>> that we need dynamic argument types to determine the constructor to be 
>>> called, and since we have to make a call from the runtime in the 
>>> uncached case, I fully understand why this is not done... just... it would 
>>> be nice to have a solution that does not require us doing basically a big 
>>> switch table with several invokespecial calls
>>> 
>>> bye Jochen
>>> 
>>> --
>>> Jochen "blackdrag" Theodorou - Groovy Project Tech Lead
>>> blog: http://blackdragsview.blogspot.com/
>>> german groovy discussion newsgroup: de.comp.lang.misc
>>> For Groovy programming sources visit http://groovy-lang.org
>>> 
>>> _______________________________________________
>>> mlvm-dev mailing list
>>> mlvm-dev@openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>> 
> 
> 
> -- 
> Jochen "blackdrag" Theodorou
> blog: http://blackdragsview.blogspot.com/
> 
