Re: Prevent opening binary/other garbage files

bauss via Digitalmars-d-learn Sat, 29 Sep 2018 23:21:01 -0700

On Saturday, 29 September 2018 at 15:52:30 UTC, helxi wrote:

I'm writing a utility that checks for specific keyword(s) found in the files in a given directory recursively. What's the best strategy to avoid opening a bin file or some sort of garbage dump? Check encoding of the given file?
If so, what are the most popular encodings (in POSIX if that matters) and how do I detect them?

What I would do is read the first 512 bytes and the last 512 bytes, and if over 50% of those bytes are below 32 and not 8, 9, 10, 11, 12 or 13, then chances are you have a binary file. But nothing stops someone from writing "invalid" bytes into a text file. There are no limitations on what a file can hold, and generally the system treats all files the same.

The reason I recommend reading the first 512 and the last 512 bytes is that some binary files may contain legitimate text strings, so by sampling two places, chances are you won't get two segments that both contain text.
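
A minimal sketch of that heuristic, written in Python rather than D purely for illustration. The names `suspicious_ratio`, `looks_binary`, `sample_size` and `threshold` are made up for this example, and the two samples are counted together as one pool of bytes, which is one possible reading of "over 50% of those bytes". For files shorter than 1024 bytes the tail sample is shifted so it does not re-read bytes already in the head.

```python
import os

# Control bytes treated as normal in text files: backspace, tab,
# line feed, vertical tab, form feed, carriage return.
TEXT_CONTROL_BYTES = frozenset({8, 9, 10, 11, 12, 13})


def suspicious_ratio(chunk: bytes) -> float:
    """Fraction of bytes that are below 32 and not an allowed control byte."""
    if not chunk:
        return 0.0
    bad = sum(1 for b in chunk if b < 32 and b not in TEXT_CONTROL_BYTES)
    return bad / len(chunk)


def looks_binary(path, sample_size=512, threshold=0.5):
    """Guess whether a file is binary from its first and last bytes.

    This is only a heuristic: a text file can contain "invalid" bytes,
    and a binary file can contain long runs of printable text.
    """
    with open(path, "rb") as f:
        head = f.read(sample_size)
        size = os.fstat(f.fileno()).st_size
        # Start the tail after the head so short files are not counted twice.
        tail_start = max(size - sample_size, len(head))
        f.seek(tail_start)
        tail = f.read(sample_size)
    return suspicious_ratio(head + tail) > threshold
```

In a recursive keyword search, something like `looks_binary(path)` could be called before opening each file for scanning, skipping the file when it returns `True`. The threshold and sample size are the values suggested above, not tuned numbers.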
