On Wed, Sep 2, 2009 at 12:46 PM, Robert Kern <robert.k...@gmail.com> wrote:
> On Wed, Sep 2, 2009 at 12:33, Gökhan Sever <gokhanse...@gmail.com> wrote:
> > How does your find suggestion work? It just returns the location of the
> > first occurrence.
>
> http://docs.python.org/library/stdtypes.html#str.find
>
> str.find(sub[, start[, end]])
>     Return the lowest index in the string where substring sub is
>     found, such that sub is contained in the range [start, end]. Optional
>     arguments start and end are interpreted as in slice notation. Return
>     -1 if sub is not found.
>
> But perhaps you should profile your code to see where it is actually
> taking up the time. Regexes on 1.3 MB of data should be quite fast.
>
> In [21]: marker = '\x00\x...@\x00$\x00\x02'
>
> In [22]: block = marker + '\xde\xca\xfb\xad' * ((1024-8) // 4)
>
> In [23]: data = int(round(1.3 * 1024)) * block
>
> In [24]: import re
>
> In [25]: r = re.compile(re.escape(marker))
>
> In [26]: %time r.findall(data)
> CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
> Wall time: 0.01 s
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>  -- Umberto Eco
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

This is what I have been using.
It's not returning exactly what I want, but it is very close. It is also slow:

I[52]: mypattern = re.compile('\0\0\1\0.+?\...@\0\$', re.DOTALL)

I[53]: res = mypattern.findall(ss)

I[54]: len res
-----> len(res)
O[54]: 95

I[55]: %time mypattern.findall(ss);
CPU times: user 9.14 s, sys: 0.00 s, total: 9.14 s
Wall time: 9.16 s

I[57]: res[0]
O[57]: '\x00\x00\x01\x00\x00\x00\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00 *prj.300*\x00; Version = 1\nProjectName = PME1 2009 King Air N825ST\nFlightId = \nAircraftType = WMI King Air 200\nAircraftId = N825ST\nOperatorName = Weather Modification Inc.\nComments = \n\x00\x00@ \x00$'

I need the part starting with the bold section (prj.300) and running to the end of the section. I need the bold part because I can construct file names from it and then write the content that follows into those files. Oh, and when it works, the search should return 86 occurrences.

-- 
Gökhan
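For reference, the str.find suggestion quoted above can locate every occurrence, not just the first, by passing the previous hit as the start argument. The following is only a minimal sketch in Python 3 (bytes instead of the Python 2 str used in the session). The names find_all, split_sections and section_name are hypothetical helpers, the sample payload and the second file name (acq.300) are invented, and the marker is assumed to be the four leading bytes of the pattern. In the real file, res[0] shows extra header bytes between the marker and the name, so section_name would need adjusting.

```python
from typing import List


def find_all(data: bytes, marker: bytes) -> List[int]:
    """Return the offset of every non-overlapping occurrence of marker."""
    offsets = []
    start = 0
    while True:
        idx = data.find(marker, start)  # search resumes at 'start'
        if idx == -1:                   # -1 means no further occurrence
            break
        offsets.append(idx)
        start = idx + len(marker)       # continue just past this hit
    return offsets


def split_sections(data: bytes, marker: bytes) -> List[bytes]:
    """Slice data into sections, each running from one marker to the next."""
    bounds = find_all(data, marker) + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]


def section_name(section: bytes, marker: bytes) -> bytes:
    """Return the bytes between the marker and the next NUL byte.

    Assumes the name follows the marker directly, which is a
    simplification of the real layout.
    """
    return section[len(marker):].split(b"\x00", 1)[0]


# Hypothetical marker and payload, for illustration only.
MARKER = b"\x00\x00\x01\x00"
first = MARKER + b"prj.300\x00; Version = 1\n"
second = MARKER + b"acq.300\x00; Rate = 10\n"
sample = first + second

offsets = find_all(sample, MARKER)
sections = split_sections(sample, MARKER)
names = [section_name(s, MARKER) for s in sections]
```

Each name could then be used to build a file name, with the rest of the section written to that file. Because str.find scans for a fixed byte string, it avoids the backtracking that a lazy .+? with re.DOTALL can incur, though profiling the real data is the only way to confirm where the time goes.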