On Wed, Sep 22, 2010 at 11:25 PM, Neal Becker <[email protected]> wrote:
> David Cournapeau wrote:
>
>> On Wed, Sep 22, 2010 at 11:10 PM, Neal Becker <[email protected]> wrote:
>>> A colleague of mine posed the following problem. He wants to search
>>> large files of binary data for sequences.
>>>
>>
>> Is there a reason why you cannot use one of the classic string search
>> algorithms applied to the bytestream?
>>
>
> What would you suggest? Keep in mind the file is too big to fit into memory
> all at once.
Do you care about speed? String search and even regular expressions are supposed to work on memory-mapped (mmap) data, but I have never used them on large datasets, so I don't know how they would perform.

Otherwise, depending on the data and whether you can afford pre-computation, algorithms like Knuth-Morris-Pratt can speed things up. But I would assume you would have to implement it in C to hope for any speed gain over Python's built-in string search.

cheers,

David
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
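A minimal sketch of the mmap suggestion above (not code from the thread): the file is memory-mapped read-only, so the OS pages data in on demand instead of the whole file being read into memory, and the built-in `mmap.find` does the searching. The function name `find_all_mmap` is hypothetical, and it assumes a non-empty file (mapping an empty file raises `ValueError`).

```python
import mmap


def find_all_mmap(path, pattern):
    """Return the byte offsets of every (possibly overlapping) occurrence of
    the bytes `pattern` in the file at `path` (hypothetical helper)."""
    offsets = []
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = mm.find(pattern)
            while start != -1:
                offsets.append(start)
                # Restart one byte later so overlapping matches are reported.
                start = mm.find(pattern, start + 1)
    return offsets
```

The same mapping can also be handed to the `re` module with a bytes pattern, e.g. `re.finditer(re.escape(pattern), mm)`, although `finditer` reports only non-overlapping matches. As noted above, how well either performs on very large files is untested here.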
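For the Knuth-Morris-Pratt route, one property that matters when the file does not fit in memory is that the matcher's state is a single integer, so the file can be read in fixed-size chunks and matches straddling a chunk boundary are still found. The pure-Python sketch below only illustrates that idea. The names `kmp_failure` and `kmp_search_stream` and the default `chunk_size` are hypothetical, and, as the post says, a real speed gain would likely need a C implementation.

```python
def kmp_failure(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it (the KMP pre-computation)."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search_stream(stream, pattern, chunk_size=1 << 20):
    """Yield the offset of every (possibly overlapping) match of the bytes
    `pattern` in a binary stream, reading chunk_size bytes at a time."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    fail = kmp_failure(pattern)
    k = 0       # number of pattern bytes matched so far; survives across chunks
    offset = 0  # absolute file offset of the start of the current chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for i, byte in enumerate(chunk):  # iterating bytes yields ints
            while k > 0 and byte != pattern[k]:
                k = fail[k - 1]
            if byte == pattern[k]:
                k += 1
            if k == len(pattern):
                yield offset + i - len(pattern) + 1
                k = fail[k - 1]  # keep going to catch overlapping matches
        offset += len(chunk)
```

Usage would be along the lines of `with open(path, "rb") as f: hits = list(kmp_search_stream(f, b"\x00\xff"))`, where the path and pattern are placeholders.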
