Re: Why do custom types need to be reference counted objects for dynamic dispatch to work.

Varriount Wed, 04 Jan 2017 06:15:03 +0100

Yes, objects need to be reference counted for methods to work. This is because 
only a reference, which always has the same size, can point to memory regions 
of varying length.


Take the below code: 
    
    
    type
      Animal = ref object of RootObj
        name: string
      
      Dog = ref object of Animal
        breed: string
    
    method makeNoise(this: Animal) {.base.} =
      echo "Hi, I'm ", this.name
    
    method makeNoise(this: Dog) =
      echo "*Bark!* [said ", this.name, "]"
    

These type definitions translate roughly to the following structures: 
    
    
    # TypeInfo is an object containing type information
    # makeTypeInfo creates a TypeInfo object holding a type's information
    
    type
      AnimalObjBase = object
        typeInfo: ptr TypeInfo
      
      AnimalBase = ptr AnimalObjBase
      
      AnimalObj = object
        typeInfo: ptr TypeInfo
        name: pointer
      
      Animal = ptr AnimalObj
      
      DogObj = object
        typeInfo: ptr TypeInfo
        name: pointer
        breed: pointer
      
      Dog = ptr DogObj
    
    
    var
      animalTypeInfo: TypeInfo = makeTypeInfo(AnimalObj)
      dogTypeInfo: TypeInfo = makeTypeInfo(DogObj)
    
    
    proc makeNoise_Animal(this: Animal) =
      echo "Hi, I'm ", this.name
    
    proc makeNoise_Dog(this: Dog) =
      echo "*Bark!* [said ", this.name, "]"
    
    # The dispatcher compares the address stored in typeInfo
    # against the address of each known TypeInfo
    proc makeNoise(this: AnimalBase) =
      if this.typeInfo == addr animalTypeInfo:
        makeNoise_Animal(cast[Animal](this))
      elif this.typeInfo == addr dogTypeInfo:
        makeNoise_Dog(cast[Dog](this))
    

(Note that this isn't exactly valid code, nor is it precisely how methods are 
implemented)
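
For anyone who wants something that actually compiles, here is a minimal sketch of the same idea done by hand. It is an illustration only, not how the compiler does it: `TypeInfo` is a hypothetical one-field stand-in, the `name` and `breed` fields are plain strings rather than raw pointers, and the procs (`noiseAnimal`, `noiseDog`, `noise`, all names made up here) return strings instead of echoing so the result can be checked.

```nim
type
  TypeInfo = object            # hypothetical stand-in for a type descriptor
    name: string

  AnimalObjBase = object
    typeInfo: ptr TypeInfo

  AnimalObj = object
    typeInfo: ptr TypeInfo
    name: string

  DogObj = object
    typeInfo: ptr TypeInfo
    name: string
    breed: string

var
  animalTypeInfo = TypeInfo(name: "Animal")
  dogTypeInfo = TypeInfo(name: "Dog")

proc noiseAnimal(this: ptr AnimalObj): string =
  "Hi, I'm " & this.name

proc noiseDog(this: ptr DogObj): string =
  "*Bark!* [said " & this.name & "]"

# Hand-written dispatcher: inspect the tag, then reinterpret the pointer.
proc noise(this: ptr AnimalObjBase): string =
  if this.typeInfo == addr animalTypeInfo:
    noiseAnimal(cast[ptr AnimalObj](this))
  elif this.typeInfo == addr dogTypeInfo:
    noiseDog(cast[ptr DogObj](this))
  else:
    "<unknown type>"

var
  animal = AnimalObj(typeInfo: addr animalTypeInfo, name: "Unknown")
  dog = DogObj(typeInfo: addr dogTypeInfo, name: "Spot", breed: "Poodle")

let
  animalNoise = noise(cast[ptr AnimalObjBase](addr animal))
  dogNoise = noise(cast[ptr AnimalObjBase](addr dog))
```

The casts are only safe here because all three object types start with the same `typeInfo` field, and because the dispatcher receives a pointer to the full object rather than a copy of it.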

Note that 'AnimalObjBase', 'AnimalObj', and 'DogObj' all share common fields at 
the same offsets: 'typeInfo' for all three, and 'name' for the latter two. This 
means that, given a region of memory holding data from one of these three 
types, we will always be able to access the 'typeInfo' field, and given a 
region of memory holding data from AnimalObj or DogObj, we can access the 
'name' field (this field-sharing is the basis for subtyping).
    
    
    +---------------+   +---------------+   +---------------+
    | AnimalObjBase |   | AnimalObj     |   | DogObj        |
    +---------------+   +---------------+   +---------------+
    | typeInfo      |   | typeInfo      |   | typeInfo      |
    +---------------+   +---------------+   +---------------+
                        | name          |   | name          |
                        +---------------+   +---------------+
                                            | breed         |
                                            +---------------+
    

The typeInfo field is used to mark these regions of memory. As long as every 
AnimalObj's 'typeInfo' member points to 'animalTypeInfo' and every DogObj's 
'typeInfo' member points to 'dogTypeInfo', we can reinterpret (cast) these 
regions of memory to their appropriate types, and pass them into their 
corresponding procedures/methods.
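
In ordinary Nim code the same marker is what the `of` operator and checked type conversions consult. A small sketch, assuming the `Animal`/`Dog` types from the start of this post and a made-up helper named `describe`; note that `of` is a subtype test, whereas the dispatcher sketched above compares for exact equality:

```nim
type
  Animal = ref object of RootObj
    name: string
  Dog = ref object of Animal
    breed: string

# `describe` is a hypothetical helper, not part of any library.
proc describe(a: Animal): string =
  if a of Dog:
    let d = Dog(a)             # checked down-conversion, safe after the `of` test
    d.name & " is a " & d.breed
  else:
    a.name & " is some animal"

let
  dogText = describe(Dog(name: "Spot", breed: "Poodle"))
  animalText = describe(Animal(name: "Unknown"))
```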

Now let's look at how objects are stored in memory. In contrast to references, 
which are pointers that always point to heap-allocated memory, object data may 
be located either in the heap _or_ the stack. It's this latter case that 
reveals why methods won't work on object types.

Say we create Animal and Dog variables in a main procedure, then pass those 
variables into a procedure which calls the 'makeNoise' method: 
    
    
    method makeNoise(this: AnimalBase)
    
    proc makeLotsOfNoise(someAnimal: Animal) =
      makeNoise(someAnimal)
      makeNoise(someAnimal)
      makeNoise(someAnimal)
    
    proc main =
      var animal = Animal(name: "Unknown")
      var dog = Dog(name: "Spot", breed: "Poodle")
      
      makeLotsOfNoise(animal)
      makeLotsOfNoise(dog)
    
    main()
    

When 'main' is called, after the variables are created, the stack holds two 
references that point to regions of heap memory: 
    
    
    main():
      animal: 8 byte pointer -> 16 byte heap memory region
      dog:    8 byte pointer -> 24 byte heap memory region
    

And when makeLotsOfNoise is called, the stack layout looks something like this: 
    
    
    main():
      animal: 8 byte pointer -> 16 byte heap memory region
      dog:    8 byte pointer -> 24 byte heap memory region
      makeLotsOfNoise(someAnimal = animal):
        someAnimal: 8 byte pointer -> 16 byte heap memory region
        makeNoise(this = someAnimal):
          this: 8 byte pointer -> 16 byte heap memory region
          ...
      makeLotsOfNoise(someAnimal = dog):
        someAnimal: 8 byte pointer -> 24 byte heap memory region
        makeNoise(this = someAnimal):
          this: 8 byte pointer -> 24 byte heap memory region
          ...
    

Make note of the size of the parameter passed into 'makeLotsOfNoise': it's 
always an 8 byte pointer. This is a constraint of how procedure calls work, as 
the size of the parameters usually needs to be known ahead of time. 
Furthermore, the semantics of procedure calls must allow for the possibility 
(even if optimization decides otherwise) that parameter data is copied from 
the previous procedure frame to the current procedure frame.
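
The fixed size of a reference can be checked with `sizeof`. A rough sketch, declaring the object types and their `ref` aliases separately so both can be measured; the exact object sizes depend on platform and memory-management mode, so only the relationships are asserted:

```nim
type
  AnimalObj = object of RootObj
    name: string
  DogObj = object of AnimalObj
    breed: string
  Animal = ref AnimalObj
  Dog = ref DogObj

# Every reference is pointer-sized, whatever it points to.
doAssert sizeof(Animal) == sizeof(pointer)
doAssert sizeof(Dog) == sizeof(pointer)

# The objects themselves grow with each added field.
doAssert sizeof(DogObj) > sizeof(AnimalObj)
```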

Now observe what happens if we were allowed to use objects instead. Our code 
becomes: 
    
    
    method makeNoise(this: AnimalObjBase)
    
    proc makeLotsOfNoise(someAnimal: AnimalObj) =
      makeNoise(someAnimal)
      makeNoise(someAnimal)
      makeNoise(someAnimal)
    
    proc main =
      var animal = AnimalObj(name: "Unknown")
      var dog = DogObj(name: "Spot", breed: "Poodle")
      
      makeLotsOfNoise(animal)
      makeLotsOfNoise(dog)
    
    main()
    

And our stack looks like this: 
    
    
    main():
      animal: 16 byte stack memory region
      dog:    24 byte stack memory region
      makeLotsOfNoise(someAnimal = animal):
        someAnimal: 16 byte memory region
        makeNoise(this = someAnimal):
          this: 8 byte memory region
          ...
      makeLotsOfNoise(someAnimal = dog):
        someAnimal: 16 byte memory region
        makeNoise(this = someAnimal):
          this: 8 byte memory region
          ...
    

Notice that, because parameter data is copied from frame to frame, the region 
containing the 'Dog' data was truncated from 24 bytes to 16, and then to 8! 
This would obviously lead to problems - what happens when makeNoise dispatches 
to the Animal and Dog methods, and the name/breed fields are accessed? We would 
get garbage, as the program tries to read from the wrong areas of the stack.
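
The truncation can be simulated by hand. This is only a sketch under assumptions: `BaseObj` and `DogLike` are hypothetical stand-ins with integer fields (so a raw byte copy is harmless), and `copyMem` plays the role of the frame-to-frame parameter copy into a slot sized for the base type.

```nim
type
  BaseObj = object      # hypothetical stand-in for AnimalObjBase
    tag: int
  DogLike = object      # hypothetical stand-in for DogObj
    tag: int
    age: int
    weight: int

var dog = DogLike(tag: 2, age: 5, weight: 30)

# A by-value parameter slot typed as the base object has room for `tag` only.
var slot: BaseObj
copyMem(addr slot, addr dog, sizeof(BaseObj))

let
  slotSize = sizeof(slot)
  dogSize = sizeof(dog)
  survivingTag = slot.tag
```

After the copy the tag still says "dog", but `age` and `weight` never made it into the slot. Any code that trusts the tag and reads past the end of `slot` would be reading whatever happens to sit next to it.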

While there are workarounds for this (the one that comes to my mind is passing 
a pointer to the stack data*, instead of copying it around), they all come with 
additional costs/caveats, or make parameter passing semantics even more complex 
than they already are.

Disclaimers:
    

  * *This is actually already done, except if certain pragmas are used (which 
the semantics still have to accommodate)
  * Yes, I know about alignments and how the stack would actually be laid out. 
The above stack diagrams are meant to illustrate the point, not the reality.
  * All the above implementation details are subject to change. For all I know 
type information could be passed as a hidden parameter in the future (or maybe 
it already is).
