On Monday 25 September 2006 22:55, Robert Persson <[EMAIL PROTECTED]> 
wrote about '[gentoo-user] I have 146,000 files in lost+found. How do I 
sort them?':
> Am I likely to find many usable files in that /lost+found directory?

Maybe.  I tried to recover a corrupted ext3 /boot partition recently and was 
unable to pull anything useful out of lost+found that was larger than a 
symlink. :(  If a number of files NOT in lost+found were corrupt, it's 
likely most of the files in lost+found are corrupt as well.

That said, /boot data is generally easy to replace, so I put no effort into 
recovering files that were corrupted.  If the data is valuable, it might 
be worth spending some time sorting those out.

> If I can, how can I best sift through them?

Carefully. :)

> Is there a utility, or 
> something I could drop into a simple bash script, that would look at the
> first few bytes of the file and, say, identify it as a jpeg or an xml
> file, so that it could be given an appropriate file extension, deleted
> or moved?

As the other poster mentioned, the file utility is useful for identifying 
the type of a file.  Keep in mind, though, that it only looks at the first 
few bytes of the file; if there's corruption later on, file won't notice.
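
For what it's worth, here is a rough sketch of how file could be wrapped in 
a bash function that moves each regular file into a directory named after 
its MIME type.  It assumes a version of file that supports --mime-type and 
an mv that supports -n; sort_by_mime and the example paths are made-up 
names, not anything standard.

```shell
#!/bin/bash
# Sketch only: sort recovered files into per-MIME-type directories.
# sort_by_mime is a hypothetical helper name; adjust paths to taste.
sort_by_mime() {
    local src=$1 dest=$2 f mime
    for f in "$src"/*; do
        # Only handle regular files; skip symlinks, directories, devices.
        [ -f "$f" ] && [ ! -L "$f" ] || continue
        # -b drops the filename from the output, leaving e.g. "image/jpeg".
        mime=$(file -b --mime-type -- "$f")
        mkdir -p -- "$dest/$mime"
        # -n refuses to overwrite a file already in the destination.
        mv -n -- "$f" "$dest/$mime/"
    done
}

# Example (hypothetical destination on a different, healthy filesystem):
#   sort_by_mime /lost+found /mnt/recovered/sorted
```

The result is a tree like sorted/image/jpeg/ and sorted/text/plain/, which 
is easier to review than one flat directory.  The same caveat applies: a 
file that lands in image/jpeg may still be damaged past its header.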

> Or is there one that could distinguish a text file from a 
> binary?

Of course, file does this to some extent.  A MIME type of text/* is 
generally text, while anything else is binary.  But, file's output (by 
default) isn't a simple "binary" or "text" string.

Some of the GNU utilities that are meant for text files will complain 
before operating on a binary file, so you could use those for this task, 
possibly.  (I'm thinking of less and grep.)  In particular, 
grep '[^[:print:]]' should return true when run against a file that 
contains non-printable characters (like control characters or NUL, and, 
depending on locale, non-7-bit-clean characters).
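
As a sketch, the two checks above could be wrapped in small bash functions.  
The function names are invented for illustration.  One deliberate change 
from the pattern above: tabs are not in [:print:], so the grep version here 
also allows [:space:] to keep ordinary tab-indented text from being flagged.

```shell
#!/bin/bash
# Sketch only; is_text_by_mime and has_nonprintable are hypothetical names.

# True if file(1) reports a text/* MIME type (assumes --mime-type support).
is_text_by_mime() {
    case $(file -b --mime-type -- "$1") in
        text/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# True if some character is neither printable nor whitespace.
# How bytes above 0x7f are treated depends on the current locale.
has_nonprintable() {
    grep -q '[^[:print:][:space:]]' -- "$1"
}

# Example: list files that both checks consider text.
#   for f in lost+found/*; do
#       is_text_by_mime "$f" && ! has_nonprintable "$f" && echo "$f"
#   done
```

Neither test is authoritative, so it is probably safest to treat them as a 
first pass and look at the borderline files by hand.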

> Are there any other strategies I could use to sift through these files
> (assuming it would be worth doing)?

Well, before you write some sort of bash script around file to rename 
stuff, you'll probably want to remove anything that is clearly trash, like 
device nodes or 0-length files.  Something like:
find lost+found \( \! \( -type f -o -type d -o -type l \) -o -empty \) -delete
should work if you are using GNU find.
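
Since -delete can't be undone, it may be worth previewing what would be 
removed first.  A sketch, assuming GNU find: list_trash is a made-up name, 
the tests are grouped in one set of parentheses so the action applies to 
both of them, and -mindepth 1 keeps the starting directory itself off the 
list.

```shell
#!/bin/bash
# Sketch only: print what a cleanup would remove, without deleting anything.
# list_trash is a hypothetical helper name.
list_trash() {
    # Matches anything that is not a regular file, directory, or symlink
    # (device nodes, FIFOs, sockets), plus empty files and empty directories.
    find "$1" -mindepth 1 \
        \( \! \( -type f -o -type d -o -type l \) -o -empty \) -print
}

# Example:
#   list_trash lost+found | less
```

Once the list looks right, swapping -print for -delete performs the actual 
removal.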

-- 
"If there's one thing we've established over the years,
it's that the vast majority of our users don't have the slightest
clue what's best for them in terms of package stability."
-- Gentoo Developer Ciaran McCreesh
