On Wed, Sep 22, 2010 at 11:25 PM, Neal Becker <[email protected]> wrote:
> David Cournapeau wrote:
>
>> On Wed, Sep 22, 2010 at 11:10 PM, Neal Becker <[email protected]> wrote:
>>> A colleague of mine posed the following problem.  He wants to search
>>> large files of binary data for sequences.
>>>
>>
>> Is there a reason why you cannot use one of the classic string search
>> algorithms applied to the bytestream ?
>>
>
> What would you suggest?  Keep in mind the file is too big to fit into memory
> all at once.

Do you care about speed? String search and even regular expressions
are supposed to work on mmap'ed data, but I have never used them on large
datasets, so I don't know how they would perform.
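
For illustration, a minimal sketch of the mmap approach, assuming a modern
Python 3 and the standard mmap module. The function name find_all and its
parameters are made up for this example, and performance on very large files
is not something this sketch claims anything about:

```python
import mmap


def find_all(path, pattern):
    """Yield the byte offset of every (possibly overlapping) match of
    `pattern` (a bytes object) in the file at `path`.

    The file is memory-mapped read-only, so the OS pages data in on demand
    instead of the whole file being read into memory at once.
    Note: mmap raises ValueError for an empty file.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(pattern)
            while pos != -1:
                yield pos
                # Restart one byte later so overlapping matches are reported.
                pos = mm.find(pattern, pos + 1)
```

Usage would look like `for offset in find_all("data.bin", b"\xde\xad"): ...`,
where "data.bin" is a placeholder file name.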

Otherwise, depending on the data and whether you can afford
pre-computing, algorithms like Knuth-Morris-Pratt can speed things up.
But I would assume you would have to do it in C to hope for any speed gain
compared to Python's string search.
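
To make the idea concrete, here is a sketch of Knuth-Morris-Pratt in pure
Python. It is for illustration only: as noted above, it will most likely be
slower than the built-in search. What it does show is that KMP never backs up
in the input, so the file can be fed through in chunks without ever holding it
all in memory. The names build_failure_table and kmp_search_stream are
hypothetical, chosen for this example:

```python
def build_failure_table(pattern):
    """Precompute, for each prefix of `pattern`, the length of the longest
    proper prefix that is also a suffix of that prefix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search_stream(chunks, pattern):
    """Yield absolute offsets of all matches of `pattern` (bytes) in the
    concatenation of `chunks` (an iterable of bytes objects).

    The match state `k` carries over between chunks, so matches that
    straddle a chunk boundary are still found.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    table = build_failure_table(pattern)
    k = 0        # number of pattern bytes currently matched
    offset = 0   # absolute position of the current byte in the stream
    for chunk in chunks:
        for byte in chunk:
            while k > 0 and byte != pattern[k]:
                k = table[k - 1]
            if byte == pattern[k]:
                k += 1
            if k == len(pattern):
                yield offset - len(pattern) + 1
                k = table[k - 1]  # keep going to allow overlapping matches
            offset += 1
```

The chunks could come from something like
`iter(lambda: f.read(1 << 20), b"")` on a file opened in binary mode; the
1 MiB chunk size is an arbitrary choice.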

cheers,

David
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
