On Thu, Jan 05, 2012 at 11:33:12AM -0800, Logan Bell wrote:
> In regard to the allocation function and the need to create an empty object
> has had me digging a bit more in the pickaxe book. The allocator is only
> needed "if the object you’re implementing doesn’t use any data other than
> Ruby instance variables, then you don’t need to write an allocation
> function—Ruby’s default allocator will work just fine. " If I understand
> that correctly, since our (Clownfish::CFC::Hierarchy) object does need data
> then we need to allocate the space up front in the allocator function.
>
> Further it goes on to outline reasons why this is necessary ( marshaling as
> you pointed out being one of them ):
>
> "One of the reasons for this multistep object creation protocol is that it
> lets the interpreter handle situations where objects have to be created by
> “back-door means.” One example is when objects are being deserialized from
> their marshaled form. Here, the interpreter needs to create an empty object
> (by calling the allocator), but it cannot call the initializer (because it
> has no knowledge of the parameters to use). Another common situation is
> when objects are duplicated or cloned."
>
> It might be worth doing some code diving on the ruby end to see for sure,
> but I can see value in having constructors that accept no arguments.

Clownfish actually provides a direct analogue to Ruby's Class#allocate:
VTable_Make_Obj().

    /** Create an empty object of the type defined by the VTable: allocate,
     * assign its vtable and give it an initial refcount of 1.  The caller is
     * responsible for initialization.
     */
    Obj*
    Make_Obj(VTable *self);

For an example of how VTable_Make_Obj() is used during deserialization, here's
Freezer_thaw() from core/Lucy/Util/Freezer.c:

    Obj*
    Freezer_thaw(InStream *instream) {
        CharBuf *class_name
            = CB_Deserialize((CharBuf*)VTable_Make_Obj(CHARBUF), instream);
        VTable *vtable = VTable_singleton(class_name, NULL);
        Obj *blank = VTable_Make_Obj(vtable);
        DECREF(class_name);
        return Obj_Deserialize(blank, instream);
    }

Freezer_thaw() obtains the class name, uses it to look up the right VTable
singleton, then invokes VTable_Make_Obj() to create the blank object. The
newborn blank object doesn't start off with much, but at least it has a VTable
-- so we can invoke the Deserialize() object method on it and flesh it out.
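
To make the two stages concrete outside of Lucy, here is a minimal,
self-contained sketch of the same pattern.  None of these names exist in
Clownfish: ToyVTable, toy_make_obj(), toy_vtable_fetch() and toy_thaw() are
hypothetical stand-ins, the registry is a fixed array, and the "stream" is
assumed to be a NUL-terminated class name followed by a text payload.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for Obj and VTable.  Every name here is hypothetical. */
typedef struct ToyObj ToyObj;

typedef struct ToyVTable {
    const char *class_name;
    size_t      obj_size;     /* bytes needed for one instance */
    ToyObj*   (*deserialize)(ToyObj *blank, const char *payload);
} ToyVTable;

struct ToyObj {
    const ToyVTable *vtable;
    unsigned         refcount;
};

/* Stage one: allocate zeroed memory, assign the vtable, set refcount to 1.
 * The caller is responsible for initialization. */
ToyObj*
toy_make_obj(const ToyVTable *vtable) {
    ToyObj *blank = (ToyObj*)calloc(1, vtable->obj_size);
    assert(blank != NULL);
    blank->vtable   = vtable;
    blank->refcount = 1;
    return blank;
}

/* A concrete class whose state lives in C struct members. */
typedef struct ToyPoint {
    ToyObj base;
    int    x;
    int    y;
} ToyPoint;

/* Stage two: flesh out a blank ToyPoint from a payload such as "3 4". */
static ToyObj*
toy_point_deserialize(ToyObj *blank, const char *payload) {
    ToyPoint *self = (ToyPoint*)blank;
    char *end;
    self->x = (int)strtol(payload, &end, 10);
    self->y = (int)strtol(end, NULL, 10);
    return blank;
}

static const ToyVTable TOY_POINT = {
    "ToyPoint", sizeof(ToyPoint), toy_point_deserialize
};

/* Fixed registry standing in for a class-name-to-vtable lookup. */
static const ToyVTable *const toy_registry[] = { &TOY_POINT };

const ToyVTable*
toy_vtable_fetch(const char *class_name) {
    size_t count = sizeof(toy_registry) / sizeof(toy_registry[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(toy_registry[i]->class_name, class_name) == 0) {
            return toy_registry[i];
        }
    }
    return NULL;
}

/* Assumed stream layout: NUL-terminated class name, then the payload. */
ToyObj*
toy_thaw(const char *stream) {
    const ToyVTable *vtable = toy_vtable_fetch(stream);
    if (vtable == NULL) { return NULL; }
    const char *payload = stream + strlen(stream) + 1;
    ToyObj *blank = toy_make_obj(vtable);
    return vtable->deserialize(blank, payload);
}
```

The point of the sketch is that toy_thaw() never calls a constructor: it has
no constructor arguments to pass, only a class name and a payload, so it needs
a blank object that already knows its class.
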
We also use VTable_Make_Obj() for every Lucy object that we create from
Perl-space. Our Foo_new() C functions have a limitation: they do not take a
class name as an argument, so they cannot support dynamic subclassing. For
instance, here is Normalizer_new():

    Normalizer*
    Normalizer_new(const CharBuf *form, bool_t case_fold, bool_t strip_accents)
    {
        Normalizer *self = (Normalizer*)VTable_Make_Obj(NORMALIZER);
        return Normalizer_init(self, form, case_fold, strip_accents);
    }

Because the VTable is hard-coded to NORMALIZER, objects created via
Normalizer_new() will *always* have a class of "Lucy::Analysis::Normalizer".
But what if you create a Perl subclass of Lucy::Analysis::Normalizer called
"MyNormalizer"?

    package MyNormalizer;
    use base qw( Lucy::Analysis::Normalizer );
    my $normalizer = MyNormalizer->new;

Here's how Normalizer_new() would need to change in order to support such
subclassing:

    Normalizer*
    Normalizer_new(CharBuf *class_name, const CharBuf *form,
                   bool_t case_fold, bool_t strip_accents) {
        VTable *vtable = VTable_singleton(class_name, NULL);
        Normalizer *self = (Normalizer*)VTable_Make_Obj(vtable);
        return Normalizer_init(self, form, case_fold, strip_accents);
    }

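
The difference between the two versions of Normalizer_new() can be shown with
a small standalone sketch.  Everything in it is hypothetical (DemoVTable,
demo_vtable_fetch(), demo_normalizer_new_as() and friends are not Lucy or
Clownfish APIs), and the lookup only consults a fixed table, which is a
simplification of whatever VTable_singleton() really does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature vtable: just a name, a parent and a size. */
typedef struct DemoVTable {
    const char              *class_name;
    const struct DemoVTable *parent;
    size_t                   obj_size;
} DemoVTable;

typedef struct DemoNormalizer {
    const DemoVTable *vtable;
    int               case_fold;
    int               strip_accents;
} DemoNormalizer;

/* A subclass may need more room than its parent. */
typedef struct DemoMyNormalizer {
    DemoNormalizer base;
    int            extra;
} DemoMyNormalizer;

static const DemoVTable DEMO_NORMALIZER
    = { "Normalizer", NULL, sizeof(DemoNormalizer) };
static const DemoVTable DEMO_MYNORMALIZER
    = { "MyNormalizer", &DEMO_NORMALIZER, sizeof(DemoMyNormalizer) };

static const DemoVTable *const demo_registry[]
    = { &DEMO_NORMALIZER, &DEMO_MYNORMALIZER };

/* Look up a vtable by class name; returns NULL for unknown classes. */
const DemoVTable*
demo_vtable_fetch(const char *class_name) {
    size_t count = sizeof(demo_registry) / sizeof(demo_registry[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(demo_registry[i]->class_name, class_name) == 0) {
            return demo_registry[i];
        }
    }
    return NULL;
}

/* Allocate a zeroed object of the size the vtable asks for. */
static DemoNormalizer*
demo_make_blank(const DemoVTable *vtable) {
    DemoNormalizer *self = (DemoNormalizer*)calloc(1, vtable->obj_size);
    assert(self != NULL);
    self->vtable = vtable;
    return self;
}

/* The initializer is shared by both constructors. */
DemoNormalizer*
demo_normalizer_init(DemoNormalizer *self, int case_fold, int strip_accents) {
    self->case_fold     = case_fold;
    self->strip_accents = strip_accents;
    return self;
}

/* Hard-coded vtable: the result is always a plain "Normalizer". */
DemoNormalizer*
demo_normalizer_new(int case_fold, int strip_accents) {
    DemoNormalizer *self = demo_make_blank(&DEMO_NORMALIZER);
    return demo_normalizer_init(self, case_fold, strip_accents);
}

/* Class-aware variant: the caller's class name picks the vtable. */
DemoNormalizer*
demo_normalizer_new_as(const char *class_name, int case_fold,
                       int strip_accents) {
    const DemoVTable *vtable = demo_vtable_fetch(class_name);
    if (vtable == NULL) { return NULL; }
    DemoNormalizer *self = demo_make_blank(vtable);
    return demo_normalizer_init(self, case_fold, strip_accents);
}
```

Because allocation is driven by the vtable rather than by sizeof(DemoNormalizer),
the class-aware variant hands the shared initializer an object that is already
the right size and already reports the subclass name.
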
The actual code which *does* support subclassing for Normalizer is spread
across three functions, two of which I've included below my sig for reference:

  * XSBind_new_blank_obj() from perl/xs/XSBind.c, which wraps
    VTable_Make_Obj().
  * XS_Lucy_Analysis_Normalizer_new() from Lucy.xs, which is auto-generated.
  * Normalizer_init(), from core/Lucy/Analysis/Normalizer.c.

In order to support dynamic subclassing in the Ruby bindings for Lucy, we will
need to provide similar functionality.

However, I question whether we need to provide that kind of functionality for
Clownfish::CFC, which is itself written using a much cruder object model:

  * No support for subclassing.
  * No support for serialization.
  * No support for Ruby's #clone or #dup methods.

I don't yet understand why Ruby *needs* an allocator function if we aren't
going to use those bells and whistles. How many C libraries out there provide
two-stage constructors? It doesn't make sense that Ruby would impose such an
esoteric requirement, limiting the kinds of C libraries you could write Ruby
bindings for.
Something like this ought to work:

    // Clownfish::CFC::Hierarchy#new
    static VALUE
    S_CFCHierarchy_new(VALUE klass, VALUE source_rb, VALUE dest_rb) {
        const char *source = StringValuePtr(source_rb);
        const char *dest   = StringValuePtr(dest_rb);
        CFCHierarchy *self = CFCHierarchy_new(source, dest);
        return Data_Wrap_Struct(klass, NULL, NULL, self);
    }

    // Bootstrap Clownfish::CFC::Hierarchy.
    static void
    S_Init_CFCHierarchy() {
        cHierarchy = rb_define_class_under(mCFC, "Hierarchy", rb_cObject);
        rb_define_method(cHierarchy, "build", S_CFCHierarchy_build, 0);
        rb_define_singleton_method(cHierarchy, "new", S_CFCHierarchy_new, 2);
    }

    // Bootstrap Clownfish::CFC and all of its components.
    void
    Init_CFC() {
        mClownfish = rb_define_module("Clownfish");
        mCFC       = rb_define_module_under(mClownfish, "CFC");
        S_Init_CFCHierarchy();
    }

I don't know whether that's an idiomatic approach for writing a Ruby extension,
but if it works, it spares us from adding a bunch of CFCFoo_allocate()
functions and from providing two-stage constructors for every
Clownfish::CFC component.
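
One caveat about the sketch above: the third argument to Data_Wrap_Struct()
is the free function, so passing NULL means Ruby would never release the
wrapped CFCHierarchy.  The idea of pairing a one-stage constructor with a
destructor callback can be shown without any Ruby headers.  The code below is
a toy model only: HostHandle, host_wrap(), host_collect() and Widget are
hypothetical names, and host_collect() merely imitates what a garbage
collector would do when the wrapper becomes unreachable.

```c
#include <assert.h>
#include <stdlib.h>

/* Callback the host invokes when it discards a wrapper. */
typedef void (*host_free_fn)(void *ptr);

/* Toy stand-in for a host-language object wrapping a C pointer. */
typedef struct HostHandle {
    void         *ptr;
    host_free_fn  free_fn;   /* NULL means ptr is never released */
} HostHandle;

/* Wrap an already-constructed C object; no separate allocator stage. */
HostHandle*
host_wrap(void *ptr, host_free_fn free_fn) {
    HostHandle *handle = (HostHandle*)malloc(sizeof(HostHandle));
    assert(handle != NULL);
    handle->ptr     = ptr;
    handle->free_fn = free_fn;
    return handle;
}

/* Imitates garbage collection of the wrapper. */
void
host_collect(HostHandle *handle) {
    if (handle->free_fn != NULL) {
        handle->free_fn(handle->ptr);
    }
    free(handle);
}

/* A C library that offers only a one-stage constructor. */
typedef struct Widget {
    int id;
} Widget;

static int live_widgets = 0;

Widget*
widget_new(int id) {
    Widget *self = (Widget*)malloc(sizeof(Widget));
    assert(self != NULL);
    self->id = id;
    live_widgets++;
    return self;
}

/* Destructor with the signature the host expects. */
void
widget_destroy(void *ptr) {
    free(ptr);
    live_widgets--;
}
```

With a destructor registered, the wrapped object goes away together with its
wrapper; with NULL in that slot the wrapper is freed but the Widget leaks.
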

In any case, exploring this topic for the CFC bindings helps us to understand
the issues we will confront when auto-generating Ruby wrapper code via the
as-yet-to-be-written Clownfish::CFC::Binding::Ruby. :)

Marvin Humphrey

    cfish_Obj*
    XSBind_new_blank_obj(SV *either_sv) {
        cfish_VTable *vtable;

        // Get a VTable.
        if (sv_isobject(either_sv)
            && sv_derived_from(either_sv, "Lucy::Object::Obj")
           ) {
            // Use the supplied object's VTable.
            IV iv_ptr = SvIV(SvRV(either_sv));
            cfish_Obj *self = INT2PTR(cfish_Obj*, iv_ptr);
            vtable = self->vtable;
        }
        else {
            // Use the supplied class name string to find a VTable.
            STRLEN len;
            char *ptr = SvPVutf8(either_sv, len);
            cfish_ZombieCharBuf *klass = CFISH_ZCB_WRAP_STR(ptr, len);
            vtable = cfish_VTable_singleton((cfish_CharBuf*)klass, NULL);
        }

        // Use the VTable to allocate a new blank object of the right size.
        return Cfish_VTable_Make_Obj(vtable);
    }

    XS(XS_Lucy_Analysis_Normalizer_new) {
        dXSARGS;
        CHY_UNUSED_VAR(cv);
        if (items < 1) { CFISH_THROW(CFISH_ERR, "Usage: %s(class_name, ...)",
                                     GvNAME(CvGV(cv))); }
        SP -= items;

        const lucy_CharBuf* normalization_form = NULL;
        chy_bool_t case_fold = true;
        chy_bool_t strip_accents = false;
        chy_bool_t args_ok = XSBind_allot_params(
            &(ST(0)), 1, items, "Lucy::Analysis::Normalizer::new_PARAMS",
            ALLOT_OBJ(&normalization_form, "normalization_form", 18, false,
                      LUCY_CHARBUF, alloca(cfish_ZCB_size())),
            ALLOT_BOOL(&case_fold, "case_fold", 9, false),
            ALLOT_BOOL(&strip_accents, "strip_accents", 13, false),
            NULL);
        if (!args_ok) {
            CFISH_RETHROW(CFISH_INCREF(cfish_Err_get_error()));
        }

        lucy_Normalizer* self = (lucy_Normalizer*)XSBind_new_blank_obj(ST(0));
        lucy_Normalizer* retval = lucy_Normalizer_init(self, normalization_form,
                                                       case_fold, strip_accents);
        if (retval) {
            ST(0) = (SV*)Cfish_Obj_To_Host((cfish_Obj*)retval);
            Cfish_Obj_Dec_RefCount((cfish_Obj*)retval);
        }
        else {
            ST(0) = newSV(0);
        }
        sv_2mortal(ST(0));
        XSRETURN(1);
    }