On Tue, May 7, 2013 at 6:50 AM, Nick Wellnhofer <[email protected]> wrote:
> So in many cases a "zombie" string has to be copied anyway defeating the
> whole optimization (even costing a bit).
Any speedup is going to be dependent on a number of factors.
* Whether the subroutine being invoked requires that the string be copied.
* The cost ratio between host-subroutine-call overhead and duplicating the
  content into a heap-allocated Clownfish String.
* Whether the host string encoding matches the Clownfish String native
  encoding. (Perl scalars without the SVf_UTF8 flag set must be checked for
  whether they are ASCII and possibly converted from Latin-1 to UTF-8.)
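To make the encoding point concrete, here is a minimal standalone sketch of the ASCII check and the Latin-1 to UTF-8 conversion. It is plain C with no Perl API involved, and the names is_ascii and latin1_to_utf8 are hypothetical helpers invented for illustration, not Clownfish or Perl functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if every byte is below 0x80, so the
 * buffer is valid UTF-8 as-is and can be wrapped without conversion. */
int
is_ascii(const char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char)buf[i] >= 0x80) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical helper: converts Latin-1 bytes to UTF-8 in a freshly
 * malloc'd, NUL-terminated buffer.  The caller must free the result.
 * The UTF-8 length (excluding the NUL) is stored in *out_len. */
char*
latin1_to_utf8(const char *src, size_t len, size_t *out_len) {
    assert(src != NULL && out_len != NULL);
    /* Each Latin-1 byte expands to at most two UTF-8 bytes. */
    char *dst = (char*)malloc(len * 2 + 1);
    if (dst == NULL) {
        return NULL;
    }
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c < 0x80) {
            dst[j++] = (char)c;                   /* ASCII: copy through */
        }
        else {
            dst[j++] = (char)(0xC0 | (c >> 6));   /* lead byte */
            dst[j++] = (char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    dst[j] = '\0';
    *out_len = j;
    return dst;
}
```

When the check fails, the converted bytes live in a new buffer, so a wrapper can no longer simply point at the host string's original content. That is the case where the encoding mismatch costs a copy.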
> But, as you say, it would be nice if this could be measured.
This stack-allocated-string mechanism sure does suck up a lot of developer
oxygen. :P It's primarily justified as an optimization, and it only affects
the speed of code which crosses the host/clownfish border. Host code is
always going to be slow; optimizing pure Clownfish-flavored C is more
important, and stack-allocated string wrappers have no effect on that.
I coded up a Perl benchmark script using Inline::C which approximates the
optimization under best-case conditions. (See below my sig.)
Here are results on one system:
                   Rate heap_latin1 heap_utf8 stack_latin1 stack_utf8
heap_latin1   4591346/s          --       -6%         -67%       -69%
heap_utf8     4906438/s          7%        --         -65%       -66%
stack_latin1 13849117/s        202%      182%           --        -5%
stack_utf8   14588883/s        218%      197%           5%         --
That's not nothing, but it's not a 10x speedup either. Perl's subroutine call
overhead is pretty substantial. Other languages may be better, but I bet
Python and Ruby are comparably sluggish.
In the abstract, I like stack-allocated host string wrappers because the
approach is apropos to the Clownfish mission. But if they require sacrifices
to the API (CAPTURE is inferior to INCREF), a gnarly implementation,
and they don't speed things up all that much anyway, it may be time to give up
on them and focus on other things.
>> I think that exceptions may longjmp out of Clownfish code past the host
>> wrapper, though. Right now if that happens we can leak memory -- a newly
>> allocated Clownfish object needed for calling into Clownfish code may not
>> get decremented. If we delay copying we might start getting memory read
>> errors, though.
>
> Right. The lazy copying won't work with longjmps.
If we start translating string SVs to Clownfish Strings the way that we
translate AVs to VArrays and HVs to Hashes, we'll need to mortalize them so
they don't leak on exceptions.
From XSBind_maybe_sv_to_cfish_obj:
    if (retval) {
        // Mortalize the converted object -- which is somewhat
        // dangerous, but is the only way to avoid requiring that the
        // caller take responsibility for a refcount.
        SV *mortal = (SV*)Cfish_Obj_To_Host(retval);
        CFISH_DECREF(retval);
        sv_2mortal(mortal);
    }
That will make things slower still, but it's probably worth it for the sake of
correctness.
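The leak described above can be reproduced in a few lines of plain C. This is only a sketch under stated assumptions: Obj, Obj_new, Obj_decref, callee, wrapper, and call_and_count_leaks are all hypothetical stand-ins invented for illustration, with a bare setjmp/longjmp pair playing the role of the host's exception mechanism.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a Clownfish object. */
typedef struct Obj { int refcount; } Obj;

static int     live_objects = 0;  /* objects allocated and not yet freed */
static jmp_buf env;               /* stands in for the host's exception frame */

Obj*
Obj_new(void) {
    Obj *obj = (Obj*)malloc(sizeof(Obj));
    assert(obj != NULL);
    obj->refcount = 1;
    live_objects++;
    return obj;
}

void
Obj_decref(Obj *obj) {
    if (--obj->refcount == 0) {
        free(obj);
        live_objects--;
    }
}

/* Stand-in for Clownfish code which may throw an exception. */
void
callee(Obj *arg, int should_throw) {
    (void)arg;
    if (should_throw) {
        longjmp(env, 1);  /* jumps past wrapper() entirely */
    }
}

/* Stand-in for a host wrapper which converts an argument to a new object. */
void
wrapper(int should_throw) {
    Obj *temp = Obj_new();
    callee(temp, should_throw);
    Obj_decref(temp);     /* never reached if callee() longjmps */
}

/* Returns the number of objects leaked by a single wrapped call. */
int
call_and_count_leaks(int should_throw) {
    volatile int before = live_objects;
    if (setjmp(env) == 0) {
        wrapper(should_throw);
    }
    return live_objects - before;
}
```

In this sketch call_and_count_leaks(0) returns 0 and call_and_count_leaks(1) returns 1. Mortalizing hands the pending decrement to the host, so cleanup no longer depends on control returning to the wrapper.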
Marvin Humphrey
use strict;
use warnings;
use Benchmark qw( cmpthese );
use Inline C => <<'END_C';
typedef struct String {
    union { int32_t count; void *host_obj; } ref;
    void *vtable;
    const char *content;
    size_t size;
    struct String *origin;
} String;

String*
String_init(String *self, const char *content, size_t size, int zombie) {
    self->vtable = NULL;
    self->ref.count = 1;
    if (zombie) {
        self->content = content;
        self->size    = size;
        self->origin  = NULL;
    }
    else {
        self->content = (char*)malloc(size);
        memcpy((char*)self->content, content, size);
        self->origin = self;
    }
    return self;
}

void
String_destroy(String *self) {
    if (self->origin == self) {
        free((char*)self->content);
        free(self);
    }
    else if (self->origin == NULL) {
        croak("Can't destroy a zombie string!");
    }
}

void
do_nothing_with_string(String *string) {
    (void)string;
}

void
stack_wrap(SV *sv) {
    STRLEN len;
    char *content = SvPVutf8(sv, len);
    void *allocation = alloca(sizeof(String));
    String *string = String_init(allocation, content, len, TRUE);
    do_nothing_with_string(string);
}

void
heap_dupe(SV *sv) {
    STRLEN len;
    char *content = SvPVutf8(sv, len);
    void *allocation = malloc(sizeof(String));
    String *string = String_init(allocation, content, len, FALSE);
    do_nothing_with_string(string);
    String_destroy(string);
}
END_C
cmpthese(-1, {
    heap_latin1  => sub { heap_dupe("Mot\xF6rhead") },
    heap_utf8    => sub { heap_dupe("Smile: \x{263a}") },
    stack_latin1 => sub { stack_wrap("Mot\xF6rhead") },
    stack_utf8   => sub { stack_wrap("Smile: \x{263a}") },
});