Basically, I am trying to transform the following Python data structure into
Cython.

source_index = int
target_index = int
phrase_count = float
phrase_prob = float

phrase_table = {}
subdict = {}
subdict[sindex] = [count, prob]
phrase_table[tindex] = subdict

So that a call to:

phrase_table[tindex][sindex] gives the count and prob.
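
As a runnable version of the pure-Python structure described above, a minimal sketch could look like this. The helper name add_entry and the example indices and values are only illustrative and do not come from my actual code:

```python
# Outer dict: target index -> inner dict; inner dict: source index -> [count, prob].
phrase_table = {}

def add_entry(table, tindex, sindex, count, prob):
    """Store [count, prob] under table[tindex][sindex] (hypothetical helper)."""
    subdict = table.setdefault(tindex, {})  # create the inner dict on first use
    subdict[sindex] = [count, prob]

# Made-up example values.
add_entry(phrase_table, 7, 3, 2.0, 0.5)
add_entry(phrase_table, 7, 4, 1.0, 0.25)

# A double lookup returns the count and prob for that (tindex, sindex) pair.
count, prob = phrase_table[7][3]
```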

I described this in my previous mail as follows (sorry for the confusion):

Key : int ---> value : hashtable { key: int ---> value: list(float, float) }

My main two concerns with using this hashtable are:

- Can I refer to the "subdict" hash table through the original void
  *HashTableValue? And how do I cast this?
- Can I store two floats in void *HashTableValue?



In addition,

I have started to create what I want, but I am still having some
difficulties. Attached is my .pyx file.

Some questions I have are:

- void_star_to_hashtable is obviously wrong, but why exactly?

cdef c_HashTable void_star_to_hashtable(void* v):
    cdef void** b = [v]
    cdef c_HashTable* a = <c_HashTable*>b
    return a[0]

-line 86: sub_dict =
<c_HashTable*>void_star_to_hashtable(hash_table_lookup(self._base,
int_to_void_star(tindex)))

Why do I need a cast here?







-----Original Message-----
From: Robert Bradshaw [mailto:[email protected]] 
Sent: woensdag 9 september 2009 18:48
To: [email protected]
Subject: Re: [Cython] FW: cython and hash tables / dictionary

On Sep 9, 2009, at 7:18 AM, Sanne Korzec wrote:

> Ok, I have played around with this hash table and understand most of the
> basics...
>
> I now would like to create the datastructure I need for my project.
>
> Basically what I need is two linked hash tables like this
>
> Key : int ---> value : hashtable { key: int ---> value: list(float, float) }
>
> And
>
> Key : int ---> value : hashtable { key: int ---> value: float }

I'm not quite sure exactly what your notation means here. You need a  
hashtable from ints to floats, and another one from ints to pairs of  
floats?

> I am starting to wonder since this hash table works with voids for key and
> value only if I should change the .c and .h files myself. I think I would
> prefer not to.

The way this hashtable is intended to be used is that you malloc some  
room for your keys/values, and then pass the pointers into the table  
itself. Of course this is a bit of overhead, both in terms of runtime  
(all the malloc/free calls) and manual labor.
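
To illustrate the pointer round trip, here is a rough sketch using Python's ctypes as a stand-in for the C-level malloc and cast you would write in Cython. FloatPair is a hypothetical name, and the values are made up. The point is that a void* value cannot hold two floats itself, but it can point at a block that does:

```python
import ctypes

class FloatPair(ctypes.Structure):
    # Hypothetical record: one allocation holding both floats.
    _fields_ = [("count", ctypes.c_double), ("prob", ctypes.c_double)]

# Allocate the pair (this plays the role of malloc); keep a reference so it stays alive.
pair = FloatPair(2.0, 0.5)

# Hand over an untyped pointer, which is what a void* table value would be.
as_void = ctypes.cast(ctypes.pointer(pair), ctypes.c_void_p)

# On lookup, cast the untyped pointer back to the real type before reading it.
back = ctypes.cast(as_void, ctypes.POINTER(FloatPair)).contents
count, prob = back.count, back.prob
```

The same pattern would apply to the nested table: the outer table's value is a pointer to the inner table, cast back to the hash table pointer type after lookup.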

> Does anybody have a suggestion what would be wise? Preference goes to quick
> implementation, not total optimization.

First, I'd see how fast Python hashtables work for you. That might be
good enough. You could look into creating a special PairOfFloats cdef
class to try to cut down on the list/tuple overhead. If that doesn't work,
the next thing would be to use the structure above (manually mallocing
room for all the stuff, though if your ints fit into a void* you
could use the same casting trick). That failing, you could write your
own custom hash table.
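
As a rough sketch of the first option, here is a plain-Python stand-in for the pair class stored in ordinary nested dicts. In a .pyx file this would presumably be declared as a cdef class with two C double attributes instead. The attribute names count and prob and the example values are assumptions:

```python
class PairOfFloats:
    """Plain-Python stand-in for the suggested cdef class."""
    __slots__ = ("count", "prob")  # fixed attributes, no per-instance __dict__

    def __init__(self, count, prob):
        self.count = float(count)
        self.prob = float(prob)

# Ordinary Python dicts do the hashing; only the value type changes.
phrase_table = {}
phrase_table.setdefault(7, {})[3] = PairOfFloats(2.0, 0.5)

entry = phrase_table[7][3]
entry.count += 1.0  # update in place without building a new list or tuple
```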

As has been discovered, Python hashtables are pretty good, so expect  
at most a 10x (?) improvement writing your own.

- Robert

Attachment: cy_hash.pyx
Description: Binary data

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
