Bjorn asked:
> I am thinking about if it is possible to get a util that 
> can read parts or whole files and decide what they are.

Raul wrote:
> If you are on linux, the 'file' utility will do it.

Yes, and if you are not on Linux (or BSD, OS X, etc.), you can still get your 
hands on the “magic files”: plain-text files describing the byte patterns 
used to determine a file’s type*. Just Google for them.

For example, on my Mac, there is a file /usr/share/file/magic/dyadic which 
identifies Dyalog APL workspaces and component files (I was fairly surprised to 
find this installed by default on OS X!).

#------------------------------------------------------------------------------
# $File: dyadic,v 1.4 2009/09/19 16:28:09 christos Exp $
# Dyadic: file(1) magic for Dyalog APL.
#
0       byte    0xaa
>1      byte    <4              Dyalog APL
>>1     byte    0x00            incomplete workspace
>>1     byte    0x01            component file
>>1     byte    0x02            external variable
>>1     byte    0x03            workspace
>>2     byte    x               version %d
>>3     byte    x               .%d

This says: if the first byte of a file is 170 (i.e. 0xAA), and the 2nd byte of 
the file is less than 4, then you’ve got a Dyalog APL object. If that pattern 
doesn’t match, “file” will know it’s got something other than a Dyalog APL 
object, so it will move on and try out the next magic file pattern.

If that pattern does match, however, the following lines help identify the kind 
of Dyalog APL object more specifically.

If the 2nd byte (which must be less than 4) is zero, then it’s an “incomplete 
workspace”; if one, then a “component file”; if two, then an “external 
variable”; if three, then a (not-incomplete) “workspace”.

Again, if the initial test, (firstByte=170) *. (secondByte<4), matched, so that 
we know we’re dealing with a Dyalog APL object, then the 3rd and 4th bytes 
(offsets 2 and 3 in the magic file) give the major and minor versions of the 
interpreter which created it, respectively.
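
The reading of the entry above can be sketched in a few lines of Python. This 
is only an illustration of the description in this post, not how file(1) 
itself is implemented. The names describe_dyalog and KINDS are made up for the 
example, and it assumes each byte is treated as an unsigned value and that at 
least four bytes of the file are available.

```python
from typing import Optional

# Kinds keyed by the value of the 2nd byte (offset 1), as listed in the magic entry.
KINDS = {
    0x00: "incomplete workspace",
    0x01: "component file",
    0x02: "external variable",
    0x03: "workspace",
}


def describe_dyalog(header: bytes) -> Optional[str]:
    """Return a description if header matches the Dyalog APL entry, else None.

    Assumption: bytes are compared as unsigned values, and a header shorter
    than four bytes is treated as "no match" to keep the sketch simple.
    """
    if len(header) < 4:
        return None
    # Top-level test: first byte is 0xAA (170) and second byte is less than 4.
    if header[0] != 0xAA or header[1] >= 4:
        return None
    kind = KINDS[header[1]]
    # Offsets 2 and 3 hold the major and minor interpreter versions.
    return f"Dyalog APL {kind} version {header[2]}.{header[3]}"
```

To try it on a real file, read the first four bytes in binary mode, for 
example describe_dyalog(open(path, "rb").read(4)), where path is whatever 
file you want to check. A result of None means "not matched by this entry", 
at which point a fuller tool would go on to the next pattern.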

Bjorn wrote:
> I know extensions are indications of what they are.

Worth pointing out, pragmatically speaking: if a file’s type is not 
self-evident on your OS, or if file extensions are insufficient or misleading 
clues often enough that you need to use “file” with some frequency, it might be 
more productive to identify the root cause of that issue than to re-implement 
the utility.

I suppose one use case for “file” is increasing one’s confidence that a file 
one downloaded from a not-perfectly-trustworthy source is indeed what it 
advertises itself to be…


-Dan

* Please note these “magic file tests” are applied at a specific point in the 
utility’s workflow, after some preliminary tests at a higher level. 

So the files are useful, but not completely sufficient. If you can’t use “file” 
directly, and want to reimplement it, you’ll have to reimplement some of these 
preliminary tests as well.
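
As a rough idea of what such a preliminary test might look like: the manpage 
describes filesystem tests that run before the magic tests, so that things 
like directories are reported without reading any content. The Python sketch 
below is an assumption-laden illustration of that idea only. The function 
name filesystem_test and the strings it returns are made up for the example 
and are not meant to reproduce the utility's exact output.

```python
import os
import stat
from typing import Optional


def filesystem_test(path: str) -> Optional[str]:
    """Classify path from its metadata alone, or return None to fall through.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    st = os.lstat(path)  # lstat, so a symbolic link is not followed
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    # Nothing decided from metadata: the caller would try content tests next.
    return None
```

A result of None here means the file has to be opened and its content 
compared against the magic patterns.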

A good place to start is the manpage for file, followed by its source code (if 
you really want to get into it).

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
