Hello:

Forgive the geeky regular expression :)  I have been playing around with
Inline to deal with a problem that is beyond what pure Perl can handle.  We
have files that are extremely large, so parsing them in Perl with
Parse::Yapp or the like is just not going to cut it.

So, we have this nice parser written in C.  The main problem is finding a
method to coerce the data into something Perl can read.  XS was a little
daunting.  Inline was out there, but I hadn't tried it.

As a quick solution, I wrote something that barfed out a data structure a la
Data::Dumper.  The user could then "require" the result, walk along this
data structure, and write programs using the data.  The problem is that this
doesn't scale worth a ____ (insert expletive).  Perl creeps along quite
slowly and consumes a hell of a lot of memory.  The human-readable ASCII
representation of what's actually sitting in C-land can be up to 4 or 5
times the size of the original data.
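
For concreteness, the throwaway version amounted to something like this
(%tree stands in for whatever the dumper actually produced; the file name
and keys are made up):

# Generator side: serialize the parsed tree as Perl source.
use Data::Dumper;
open(my $fh, '>', 'parsed.pl') or die "can't write parsed.pl: $!";
print $fh Data::Dumper->Dump([\%tree], ['*tree']);   # %tree: the parsed data
print $fh "1;\n";                                    # so require() returns true
close($fh);

# User side: pull the whole structure back into memory and walk it.
require 'parsed.pl';
print $tree{foo}{ports}, "\n";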

Pissed and determined, I decided to revisit Inline and XS to solve this
problem once and for all.  I believe I have.  It's possible someone already
thought of this, but I figured I'd pose the question here since you folks
are seasoned Perl hackers.

The key of course is using Inline with the tie facility.  tie allows you to
override the access methods for a particular variable.  Creating dual
copies of the data structures from C into Perl seemed like a silly thing to
do.  I would only shift my problem from one space to another, and worse yet
I'd have two representations for the same data (bleah).  The snag, as many
have pointed out, is that tie doesn't have an easy way to handle nested
structures.  The result of a fetch would inevitably be a reference, and
that reference wouldn't itself be tied.
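
To make the snag concrete, here is a minimal (purely illustrative) tied
hash whose FETCH hands back a plain hashref; anything you reach through
that hashref bypasses the tie completely:

package Naive;

sub TIEHASH { my ($class, $ptr) = @_; return bless \$ptr, $class; }

sub FETCH
{
    my ($self, $key) = @_;
    # Pretend this built a plain Perl hashref from C data.
    return { inner => 42 };
}

package main;

tie my %h, "Naive", 0;        # 0 stands in for a C pointer
my $ref = $h{anything};       # FETCH runs here...
print $ref->{inner}, "\n";    # ...but this access is no longer tied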

So, my idea (as implemented) just ensures that any object returned from a
FETCH method is appropriately tied to a class (probably another one) which
in turn has its own methods.  Note the use of a %Modules hash to ensure we
don't keep rerunning tie on the same object over and over again; I thought
this might avoid excessive data creation.

Example:

This is the FETCH method for my Virgelo::Parser.  I know that any object I
get back from a lookup is a Module.  Therefore, before I return it, I make
sure to tie it to the Module class.  Note this snippet may not necessarily
compile since it's a work in progress.  It's being refined as I type and
was edited after I found mistakes in my original code.

package Virgelo::Parser;

use strict;
use warnings;

# Cache of already-tied Module hashrefs, keyed by module name, so we
# don't re-tie the same C object on every FETCH.
my %Modules = ();

sub FETCH
{
    my $self = shift;
    my $key  = shift;
    my $obj;

    printf("Request fetch of key %s of hash %d\n", $key, $$self);

    if (exists $Modules{$key}) {
        return $Modules{$key};
    }
    else {
        $obj = Virgelo::lookup($$self, $key);    # lookup is a C function call here

        if ($obj) {
            my %mod;

            tie %mod, "Virgelo::Module", $obj;   # $obj is actually a
                                                 # pointer from C-land

            $Modules{$key} = \%mod;

            return \%mod;
        }
    }
    return undef;
}
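
For completeness, Virgelo::Module follows the same pattern.  Here is a
minimal sketch of what I have in mind (Virgelo::module_lookup is just a
placeholder name for whatever C call actually resolves a module attribute):

package Virgelo::Module;

sub TIEHASH
{
    my ($class, $ptr) = @_;     # $ptr is the C pointer handed to tie above
    return bless \$ptr, $class;
}

sub FETCH
{
    my ($self, $key) = @_;

    my $obj = Virgelo::module_lookup($$self, $key);   # placeholder C call
    return undef unless $obj;

    # Leaf values (like the ports count below) would be returned as plain
    # scalars; nested objects get tied again so the walk stays in C-land.
    my %attr;
    tie %attr, "Virgelo::Module", $obj;
    return \%attr;
}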

So, in this scenario, one could presumably do:

my $hr = Virgelo::Parser::Parse("/tmp/no_ports.v");

printf("Ports are %d\n", $hr->{"foo"}{"ports"});

At each intermediate step, you ensure that each object is tied to something
that handles all your C calls, which is exactly what I want.  I get all my
data in C-land without the nasty overhead of a duplicate Perl copy, and all
the objects look like first-class citizens.
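
For anyone who hasn't tried Inline yet, the C side really is painless.
Here is a toy, self-contained stand-in for the lookup call (the function
body is made up; the real lookup also takes the parser pointer and returns
a pointer into the C parse tree rather than an int):

use Inline C => <<'END_C';

#include <string.h>

/* Toy stand-in for the real parser library.  The real function takes
   the parser pointer as well and returns an opaque pointer, not an int. */
int toy_lookup(char *module)
{
    if (strcmp(module, "foo") == 0)
        return 8;
    return -1;
}

END_C

print "ports for foo: ", toy_lookup("foo"), "\n";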

I'm interested in hearing what you guys think of this.  I find it a very
powerful way to let problems that were formerly intractable in Perl still
be handled (at least partially) in this space.

Thanks,

-Clint
