Robert Bradshaw wrote:
> On May 14, 2009, at 1:37 AM, Cristi Constantin wrote:
>
>> Good day.
>> I am trying to obtain MAXIMUM speed for this mapping:
>>
>> MyString = '\n'.join([ ''.join([j.encode('utf8') for j in i]) for i
>> in NdArrayList ])

My first try would be to move the .encode() call out of the inner loop:
build one big list of unicode strings instead (including the newlines),
join it at the end, and run the encoding once on the complete result.

If you can afford a dependency on Py2.4+, drop the list comprehensions in
favour of generator expressions (i.e. remove the [] brackets).

No Cython so far, just a faster way to do it in Python space. Both the
encoder run and the method call in the innermost loop above kill
performance here.
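
A minimal sketch of what I mean, in plain Python. The function names are
made up for illustration, and the rows in the usage note are plain lists of
one-character unicode strings standing in for the numpy "<U1" arrays (the
code only iterates over each row, so I assume the arrays behave the same
way). Joining first and encoding once should give the same bytes as
encoding each character separately, because UTF-8 encodes every character
independently of its neighbours.

```python
def rows_to_utf8_string(rows):
    """Join all rows with newlines, then encode the whole result once."""
    # One join per row, one join for the newlines, one single encoder run.
    return u'\n'.join(u''.join(row) for row in rows).encode('utf8')


def rows_to_utf8_list(rows):
    """Return one UTF-8 encoded string per row (the "MyList" case)."""
    # Still one encoder run per row, but none per character.
    return [u''.join(row).encode('utf8') for row in rows]
```

Usage would look something like this:

    rows = [[u'a', u'\u2588'], [u'\u00a9', u'\u00ce', u'\u2022']]
    MyString = rows_to_utf8_string(rows)
    MyList = rows_to_utf8_list(rows)
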
>> "NdArrayList" is a list of numpy ndarrays that contain one Unicode
>> character and i want to transform it into a string, AS FAST AS
>> POSSIBLE.
>> For example : NdArrayList can be = [ np.array([u'a', u'\u2588'],
>> dtype='<U1'), np.array([u'\u00a9', u'\u00ce', u'\u2022'],
>> dtype='<U1') ]
>>
>> I also want to make MyList = [ ''.join([j.encode('utf8') for j in
>> i]) for i in NdArrayList ]. In this case i need a list of united-as-
>> string numpy ndarrays.
>>
>> When i call the codes from Python i get better performance than
>> compiling into a Cython function like:
>>
>> cdef str __Ndarray2String( list TempA ):
>> #
>> return '\n'.join([ ''.join([j.encode('utf8') for j in i]) for i
>> in TempA ])
>> #
>> def Ndarray2String( v ):
>> return __Ndarray2String( v )
>>
>> Can anyone please suggest a good method?
>
> To go for maximum speed, I would suggest pulling all the data out of
> the array first (perhaps using the buffer interface) and using
> assignment and/or memcpy to put it into a big string (manually
> inserting the newlines). Then call encode to get the whole thing in
> utf8.

Also, given that the retrieval of each of the substrings has a little
overhead, I'd recommend copying the array contents directly into a memory
buffer, over-allocating as needed to avoid repeated copying. There are
Python C-API functions that can then encode the resulting Py_UNICODE buffer
(assuming that a numpy "<U1" buffer is a Py_UNICODE buffer) to a UTF-8
string.
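
To show the shape of that approach, here is a rough Python-level sketch.
It is only a stand-in for the C-level version, not the real thing: it
assumes each row arrives as raw bytes holding 4-byte little-endian code
points (which is what I would expect the data of a "<U1" array to look
like), it uses a bytearray in place of a malloc'ed buffer, and it replaces
the C-API encoder call with a decode/encode pair. The function name and
the initial buffer size are made up.

```python
NEWLINE = u'\n'.encode('utf-32-le')  # one newline as 4 raw bytes


def raw_rows_to_utf8(raw_rows):
    """Copy raw UTF-32-LE rows into one growing buffer, encode once."""
    buf = bytearray(64)   # preallocated buffer, grown geometrically
    used = 0              # number of bytes filled so far
    first = True
    for raw in raw_rows:
        # Insert the newline separator manually between rows.
        pieces = (raw,) if first else (NEWLINE, raw)
        first = False
        for piece in pieces:
            needed = used + len(piece)
            if needed > len(buf):
                # Over-allocate (at least double) so that later rows
                # rarely trigger another reallocation.
                new_size = max(needed, 2 * len(buf))
                buf.extend(bytearray(new_size - len(buf)))
            buf[used:needed] = piece   # plain copy into the buffer
            used = needed
    # Single conversion of the whole buffer to UTF-8 at the very end.
    return bytes(buf[:used]).decode('utf-32-le').encode('utf8')
```

Usage would look something like this, with the raw rows built by hand
instead of being pulled out of the arrays:

    raw = [u'a\u2588'.encode('utf-32-le'),
           u'\u00a9\u00ce\u2022'.encode('utf-32-le')]
    MyString = raw_rows_to_utf8(raw)
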
Stefan
_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev