On Sep 10, 2009, at 9:54 AM, Sanne Korzec wrote:
> Basically, I am trying to transform the following python
> datastructure into
> cython.
>
> Source_index = int
> Target_index = int
> Phrase_count = float
> Phrase_prob = float
>
> Phrase_table = {}
> Subdict = {}
> Subdict[sindex] = [count, prob]
> Phrase_table[tindex] = Subdict
>
> So that a call to:
>
> Phrase_table[tindex][sindex] gives the count and prob.
You could probably get a twofold speedup by just implementing this as
a hashtable (int, int) -> (float, float) rather than nested
hashtables (unless that doesn't work for your algorithm).
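
A minimal sketch of that flat layout, in plain Python rather than
Cython so it runs as-is. The helper names (add_phrase, lookup) are
made up for illustration and do not come from the attached .pyx file.

```python
# Flat phrase table: one dict keyed by (tindex, sindex) tuples
# instead of a dict of dicts. All names here are illustrative.
phrase_table = {}

def add_phrase(table, tindex, sindex, count, prob):
    """Store (count, prob) for one (target, source) index pair."""
    table[(tindex, sindex)] = (count, prob)

def lookup(table, tindex, sindex):
    """Return (count, prob), or None if the pair is absent."""
    return table.get((tindex, sindex))

add_phrase(phrase_table, 3, 7, 2.0, 0.25)
count, prob = lookup(phrase_table, 3, 7)
```

Each access is then one hash lookup instead of two. The cost is that
you can no longer iterate over all source indices for a given target
index without scanning the whole table, which is the case where this
layout may not work for your algorithm.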
> I described this in my previous mail as follows; sorry for the confusion.
>
> Key : int ---> value : hashtable { key: int ---> value: list(float,
> float) }
>
> My main two concerns with using this hashtable are:
>
> -Can I refer to the "subdict" hashtable from the original void
> *HashTableValue? And how do I cast this?
> -Can I store two floats in a void *HashTableValue?
This hashtable implementation is a map from pointers to pointers, and
the pointers can refer to anything you want. The drawback of course
is that you have to manually manage the memory those pointers point to.
> In addition,
>
> I have started to create what I want, but I am still having some
> difficulties. Attached is my .pyx file.
>
> Some questions I have are:
>
> -Void_star_to_hashtable is obviously wrong, but why exactly?
>
> cdef c_HashTable void_star_to_hashtable(void* v):
>     cdef void** b = [v]
>     cdef c_HashTable* a = <c_HashTable*>b
>     return a[0]
>
> -line 86: sub_dict =
> <c_HashTable*>void_star_to_hashtable(hash_table_lookup(self._base,
> int_to_void_star(tindex)))
>
> Why do I need a cast here?
The code, as written, returns a c_HashTable, not a c_HashTable*. You
should just cast directly between c_HashTable* and void*. The only
reason I had a conversion function is that I was storing the float as
if it were a pointer, to avoid manually managing the memory (which you
will have to do, as a c_HashTable struct is larger than a pointer).

I think it's pertinent to point out that this library was designed to
be used from C. It's totally usable from Cython, but the interface is
very C-like, so the only way to use it is as you would in C. If you're
not comfortable with C, pointers, malloc, etc., then I would do the
following, which will still be a lot faster than what you have:

Implement a cdef class that wraps a pair of ints, and another that
wraps a pair of floats. Create methods to instantiate them very
quickly (avoiding all Python calls and argument passing) and give
them fast __hash__ and __cmp__ methods. Now use the first as keys and
the second as values in a Python dictionary, and access their members
directly. This should be much faster than what you have, and probably
within a factor of 2 of using c_HashTable, as well as being much
easier to code (not to mention avoiding the pitfalls of segfaults and
memory leaks).
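
A rough sketch of the shape of this, written as plain Python classes
so it runs without compiling. In Cython these would be cdef classes
with typed C fields. The names IntPair and FloatPair are hypothetical,
and the sketch uses __eq__ because Python 3 does not consult __cmp__
for dictionary lookups.

```python
class IntPair:
    """Key type: a (tindex, sindex) pair of ints."""
    __slots__ = ("a", "b")

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __hash__(self):
        # Cheap integer mix; equal pairs always hash equally.
        return (self.a * 1000003) ^ self.b

    def __eq__(self, other):
        return (isinstance(other, IntPair)
                and self.a == other.a and self.b == other.b)


class FloatPair:
    """Value type: a mutable (count, prob) pair of floats."""
    __slots__ = ("count", "prob")

    def __init__(self, count, prob):
        self.count = count
        self.prob = prob


phrase_table = {}
phrase_table[IntPair(3, 7)] = FloatPair(2.0, 0.25)

# Look up with a fresh key object and update the value in place.
entry = phrase_table[IntPair(3, 7)]
entry.count += 1.0
```

Because the value object is mutable, counts can be updated without
re-inserting into the dictionary, and the whole thing stays free of
manual memory management.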
- Robert