On 16/09/2014, at 3:24 PM, srean wrote:

> 
> I think there are enough extra copies and other overheads that can be removed 
> to beat Python at it. Note the C++ code itself is at least twice as fast as
> the Python code, so passing the buck to C++ won't help.

I haven't seen the C++ version of it.

> One needn't allocate that list upfront although that turned out to be faster.
> There's got to be a way to make Felix yield competitive.

Of course there is, but it doesn't involve copying strings about.
Since Felix is a pass-by-value language, that copying seems inevitable,
even with some inlining, if one is using higher level functional
operations like split.

RE2's StringPiece would help if the base char array can be made
to persist. It's basically a struct { length, char const* } thing.
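
To make the idea concrete, here is a minimal C++ sketch of such a piece
and of a split that copies no characters. The names Piece and split_pieces
are invented for illustration; this is not RE2's actual class, only the
general shape of it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical non-owning view: a pointer into someone else's buffer plus a length.
struct Piece {
    const char* data;
    std::size_t length;

    // Make an owning copy only when the caller really needs one.
    std::string to_string() const { return std::string(data, length); }
};

// Split `base` on `sep` without copying any characters.
// The returned pieces are valid only while `base` is alive and unmodified.
std::vector<Piece> split_pieces(const std::string& base, char sep) {
    std::vector<Piece> out;
    std::size_t start = 0;
    for (std::size_t i = 0; i <= base.size(); ++i) {
        if (i == base.size() || base[i] == sep) {
            assert(start <= i);
            out.push_back(Piece{base.data() + start, i - start});
            start = i + 1;
        }
    }
    return out;
}
```

Every piece points into the original buffer, which is exactly why the
base char array has to persist for as long as any piece is in use.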

But that raises another deficiency in Felix. When you have a Felix native
structure containing pointers, you can nest it in another Felix native
structure. The compiler traverses the tree when building the array
of pointer offsets for the top level type.

There's no way to do this for any C data type *except* a type that
is already a pointer, which you can label like this:

        _gc_pointer type fred = "fred*";

Now, there is a way to model a *complete* C data type without
knowing the offsets:

        type fat = "fat" scanner "fred_scanner";

The scanner is a C function that finds all the offsets.
This is how Judy Arrays are integrated into Felix,
with a custom scanner.

However, the scanner has to be applied to a *pointer* to an object on
the heap. You cannot actually put one of these objects
into another Felix object because the compiler doesn't
know how to find the offsets. The run time system does,
via the scanner function, but that's no use.

The compiler generates a single flat offset array for every data type
unless there's a custom scanner. In fact, the compiler calls
a "standard custom scanner" and passes it the offset array.
I think the function is called "scan_by_offsets" :)
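
A rough C++ sketch of offset-array scanning, for readers who have not
seen it. Everything here (FlatShape, scan_flat, Inner, Outer) is a
hypothetical stand-in, not the Felix run time's real data structures or
its real scanner; it only illustrates how nested offsets can be
flattened ahead of time so the scanner needs a single loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical shape record: the byte offset of every pointer in one type.
struct FlatShape {
    std::size_t size;                          // sizeof the described type
    std::vector<std::size_t> pointer_offsets;  // already flattened ahead of time
};

// Two example types; Inner is nested by value inside Outer.
struct Inner { int tag; void* ref; };
struct Outer { void* head; Inner inner; };

// What a compiler could emit for Outer: nested offsets are summed up front,
// so the scanner never needs to know that Inner exists.
FlatShape outer_shape() {
    FlatShape s;
    s.size = sizeof(Outer);
    s.pointer_offsets.push_back(offsetof(Outer, head));
    s.pointer_offsets.push_back(offsetof(Outer, inner) + offsetof(Inner, ref));
    return s;
}

// Generic scanner: walk the offset array and report each non-null pointer.
void scan_flat(const void* obj, const FlatShape& shape, std::vector<void*>& found) {
    const char* base = static_cast<const char*>(obj);
    for (std::size_t off : shape.pointer_offsets) {
        assert(off + sizeof(void*) <= shape.size);
        void* p = nullptr;
        std::memcpy(&p, base + off, sizeof p);  // read the pointer stored at that offset
        if (p != nullptr) found.push_back(p);
    }
}
```

The flattening step only works when the layout of every nested type is
known at compile time, which is the information an opaque C type with a
custom scanner does not provide.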

So now the point is for a StringPiece, if implemented
in C++ (as the RE2 one is), the fact I know where the
contained char* is doesn't help. I can make a custom scanner
for it, but then all the StringPieces have to be whole objects.
Technically: either "on the machine stack" or "whole objects
on the heap". Conservative scan takes care of the stack case.

There are some ways to fix this: one is to represent a data structure
type at run time not with a single flat RTTI object but recursively.
In other words, a "struct" with three fields would be represented
by an array of three pointers to the field types. At run time a recursive
descent can find all the offsets. This is obviously better because it
makes run time type construction a breeze. However the downside
is that the scanning for offsets would be slower.
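
The recursive representation could look something like the C++ sketch
below. TypeNode, collect_offsets and the example descriptors are
assumptions made up for this illustration, and the field offsets (0 and
8) are hard-coded rather than computed from a real layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical recursive type descriptor: either a pointer leaf,
// a pointer-free scalar (no fields), or a struct with typed fields.
struct TypeNode {
    struct Field {
        std::size_t offset;     // byte offset of the field inside its parent
        const TypeNode* type;   // descriptor of the field's own type
    };
    bool is_pointer;            // true for a leaf the collector must follow
    std::vector<Field> fields;  // empty for leaves
};

// Recursive descent: accumulate the absolute offset of every pointer leaf.
void collect_offsets(const TypeNode& t, std::size_t base, std::vector<std::size_t>& out) {
    if (t.is_pointer) {
        out.push_back(base);
        return;
    }
    for (const TypeNode::Field& f : t.fields) {
        assert(f.type != nullptr);
        collect_offsets(*f.type, base + f.offset, out);
    }
}

// Illustrative descriptors with made-up offsets:
// inner = { scalar at 0, pointer at 8 }, outer = { pointer at 0, inner at 8 }.
const TypeNode kPointer{true, {}};
const TypeNode kScalar{false, {}};
const TypeNode kInner{false, {{0, &kScalar}, {8, &kPointer}}};
const TypeNode kOuter{false, {{0, &kPointer}, {8, &kInner}}};
```

Building a new type at run time is then just a matter of allocating one
more node that points at existing ones, at the price of a tree walk on
every scan instead of a single loop over a flat array.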

The bottom line is that if I want string pieces .. I have to implement
them in Felix.

--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org