On 16 Aug 2011, at 23:51, Hongchun Jin wrote:

> Thanks Derek for  the quick reply. But I am sorry, I did not make it clear in 
> my last email.  Assume I have an array like 
> ['CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf'
>  'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf'
>  'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' ...,
>  'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf'
>  'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf'
>  'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf']
> 
> I need to get the sub-string for date and time, for example,  
> 
> '2008-01-31T23-56-35ZD' in the middle of each element. In more general cases, 
> the sub-string could be any part of the string in such an array.  I hope to 
> assign the start and stop of the sub-string when I am subsetting it.  
> 
Well, maybe I was a bit too quick in my reply - see the documentation for
np.char for some vectorized array operations that might be of use.
Unfortunately, operations like 'lstrip' and 'rstrip' don't do exactly what you
might expect them to (they strip a set of characters, not a literal
substring), but you could use for example
np.char.split(x,'.')
to create an array of lists of substrings and then deal with them; something
like removing the '.hdf' suffix would already require a somewhat lengthy
nested call:

np.char.rstrip(np.char.rstrip(np.char.rstrip(np.char.rstrip(x, 'f'), 'd'), 
'h'), '.')

Also removing the leading substring in your case would clearly lead to a very
clumsy expression...
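
As a rough illustration of the two points above (a sketch only, written for
str arrays on a current NumPy; the name 'path.hdf' is a made-up example that
is not from your data), this shows why a single rstrip call with '.hdf' is
not safe in general, and how the split result can be indexed:

```python
import numpy as np

# Two sample file names in the style quoted above.
x = np.array([
    "CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf",
    "CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf",
])

# rstrip treats its argument as a set of characters to remove, so the
# trailing 'h' of the (hypothetical) name 'path' is eaten as well.
stripped = np.char.rstrip(np.array(["path.hdf"]), ".hdf")

# np.char.split returns an object array holding one list per element;
# field 1 is the date/time stamp between the two dots.
parts = np.char.split(x, ".")
stamps = np.array([p[1] for p in parts])
```

Here 'stripped' comes out as 'pat' rather than 'path', and 'stamps' holds the
two date/time strings.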

It turns out, however, that something like the above, for a similar test case
with a length-100000 array, takes about 3 times longer than the
np.char.split() way; but even that is slower than a direct loop over string
functions:

In [6]: %timeit -n 10 y = np.char.split(x, '.')
10 loops, best of 3: 188 ms per loop

In [7]: %timeit -n 10 y = np.char.split(x, '.'); z = np.fromiter( (l[1] for l in y), dtype='|S3', count=x.shape[0])
10 loops, best of 3: 218 ms per loop

In [8]: %timeit -n 10 z = np.fromiter( (l.split('.')[1] for l in x), dtype='|S3', count=x.shape[0])
10 loops, best of 3: 143 ms per loop

So it seems the vectorization in np.char is not that great after all
(and the direct loop might still be acceptable for 1.e6 elements...)!
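
For the start/stop subsetting you asked about, a minimal sketch could look
like the following. It assumes str arrays on a current NumPy and that the
sub-string sits at the same character positions in every element; the names
'start', 'stop', 'looped' and 'viewed' are only illustrative, and the second
variant (viewing the array as single characters) is an alternative I have not
timed here, not one of the cases measured above:

```python
import numpy as np

x = np.array([
    "CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf",
    "CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf",
])

# Illustrative bounds: everything between the first and the last dot,
# taken from the first element and assumed to hold for all elements.
first = str(x[0])
start = first.find(".") + 1
stop = first.rfind(".")

# Variant 1: direct loop with ordinary Python string slicing.
looped = np.array([s[start:stop] for s in x])

# Variant 2: reinterpret the fixed-width array as a 2-D array of single
# characters, slice the columns, and glue them back into strings.
chars = x.view("U1").reshape(len(x), -1)
window = np.ascontiguousarray(chars[:, start:stop])
viewed = window.view(f"U{stop - start}").ravel()
```

Both variants give the same array of date/time strings for this input; the
second one only makes sense when the positions really are fixed.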

Cheers,
                                                                Derek

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
