Re: data-crunching in guile

Andy Wingo Thu, 25 Jun 2009 14:08:50 -0700

Howdy,

I didn't mean to push those commits I just pushed -- my girlfriend was
looking for the web browser, but was on a full-screen Emacs window with
magit, and somehow clicking around pushed some commits! Perhaps this is
Marius' way of contributing to Guile these days ;-)

But I think they might be fine. I'll look and revert if necessary.

On Thu 25 Jun 2009 10:04, l...@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wi...@pobox.com> writes:
>
>> What I'm getting at is that I think we should have VM ops for working on
>> vectors -- both generic vectors, and specific ops for bytevectors
>
> Why not, but...
>
> It looks to me like a failure to optimize the general case, which we
> work around by specializing the instruction set.

Yeah, I wonder about that sometimes. In this particular case, I think
we're OK -- and the reason is, it's about data, not about procedures.

We have a number of primitive data structures in Guile, and we need to
provide fast access to those structures. We do favors to no one if we
force the machine to call out to functions to access our basic data
structures.

That is even more the case with bytevectors, because it's the data
structure of the hardware. Code is fast when it is inlined, when it is
linear and close to hardware operations -- and that is particularly the
case for bytevector ops.
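
To make the difference concrete, here is a rough sketch of a toy stack VM. It is not Guile's actual VM: the opcode names, the PRIMITIVES table and the run function are all made up for illustration. It does the same vector access two ways, once through a generic CALL that leaves the dispatch loop, and once through a dedicated VECTOR_REF op that does the access in place.

```python
# Toy stack VM: hypothetical opcodes, not Guile's real instruction set.
PUSH, CALL, VECTOR_REF, HALT = range(4)

def vector_ref(vec, idx):
    """Out-of-line primitive, standing in for a subr."""
    return vec[idx]

# Hypothetical table of primitives reachable through the generic CALL op.
PRIMITIVES = {"vector-ref": vector_ref}

def run(code):
    """Run a list of (opcode, operand) pairs.

    Returns (result, number of out-of-line procedure calls made).
    """
    stack = []
    calls = 0
    for op, arg in code:
        if op == PUSH:
            stack.append(arg)
        elif op == CALL:
            # Generic path: look up the procedure, pop its args, call out.
            name, nargs = arg
            args = stack[-nargs:]
            del stack[-nargs:]
            stack.append(PRIMITIVES[name](*args))
            calls += 1
        elif op == VECTOR_REF:
            # Dedicated op: the access happens here, inside the dispatch loop.
            idx = stack.pop()
            vec = stack.pop()
            stack.append(vec[idx])
        elif op == HALT:
            break
    return stack.pop(), calls

vec = [10, 20, 30]
generic = [(PUSH, vec), (PUSH, 1), (CALL, ("vector-ref", 2)), (HALT, None)]
inlined = [(PUSH, vec), (PUSH, 1), (VECTOR_REF, None), (HALT, None)]
```

Both programs compute the same element; the only difference is whether the VM has to leave its loop to get it. In a real VM the saving would come from skipping the procedure lookup, frame setup and return, none of which this toy measures.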

But, regarding the overhead of a mv-call instead of a call -- I think
that overhead is mostly in space and not in time -- or, that is to say,
it is in time to the degree that its space pushes instructions out of
the data cache. (But see below, regarding "dark arts".)

So given that we really support a more functional style of
programming... 

> Maybe we could have a more generic approach to the problem.  For
> instance, there could be a way to annotate primitive procedures with
> information useful to the compiler, such as "returns-single-value",
> "return-value-unspecified", etc., which would allow the generated code
> to take shortcuts.  But maybe this is just unrealistic.  What do you
> think?

I don't know about this. It's probably not worth it. Even if you're
calling out to a procedure bound in `(guile)', such shortcuts would do
strange things if the procedure is rebound. Depending on implementation, this
could increase the size of the subrs. It's certainly more complicated.

On the other hand, in the grander scheme, "the unspecified value" is as
stupid an idea as I've ever heard. I recognize its necessity given where
we're at now, but one would think that in principle such functions could
just return `void'...

I guess my point is: this is the right thing to do for vectors. They are
a fundamental data type, and they deserve their own instructions just as
car and cdr have theirs, because such ops need to be inlined to go fast.
Regarding the general
subr case... yes, I guess you're right regarding marking the number of
values a subr should return (0, 1, N, any), but as long as we're talking
about the Right Thing here, we probably are talking about a better FFI
than we have now.
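
As a purely hypothetical sketch of what marking subrs with a value count might look like to a compiler: the VALUE_COUNTS table, its entries and the choose_call_op function below are assumptions for illustration, not anything that exists in Guile. The idea is that, where the compiler would otherwise have to emit mv-call, a subr known to return exactly one value could take a plain call, and a rebound procedure loses its annotation.

```python
# Hypothetical annotation table; the categories follow the (0, 1, N, any)
# split mentioned above, and the entries are illustrative only.
VALUE_COUNTS = {
    "vector-ref": "1",
    "vector-set!": "0",   # result is unspecified
    "floor/": "N",        # returns two values
}

def choose_call_op(name, rebound=False):
    """Pick a call instruction for a primitive from its declared value count.

    A rebound or unannotated procedure is treated as "any", since nothing
    is known about what it returns.
    """
    count = "any" if rebound else VALUE_COUNTS.get(name, "any")
    # Only a known single-value return can take the plain call path.
    return "call" if count == "1" else "mv-call"
```

With this table, choose_call_op("vector-ref") picks the plain call, while the same name with rebound=True falls back to mv-call, which is the rebinding problem mentioned earlier.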

That's my opinion as it stands now anyway, subject to modification of
course :)

>> I think we have the space in the VM, and if we group all
>> of the vector instructions at the end, we shouldn't affect the
>> instruction cache too much.

I don't have Neil's mail open here, but my thought was this: getting a
fast VM is a dark art of feeling and instinct. My feeling is that a VM
is fast if it fits in the CPU's cache: the instruction cache and the
data cache. The data cache means that smaller code is better, hence my
resistance to word-sized instructions. The instruction cache means that
the VM itself should be small -- but if the code for vector ops is all
"at the end" of the VM, then only code that uses vector ops pays for the
increased "cache footprint" of the VM.

But like I say, all this is instinct, which may well be wrong.

Regards,

Andy
-- 
http://wingolog.org/
