MrC wrote:
> Leonardo Rodrigues Magalhães wrote:
>   
>> MrC wrote:
>>     
>>> Leonardo Rodrigues Magalhães wrote:
>>>   
>>>       
>>>>     Hello,
>>>>
>>>>     I have banned some file/MIME types in amavisd-2.6.0 using the 'old' 
>>>> way of doing this, the $banned_filename_re.
>>>>
>>>>     The banned-file admin and user notifications, which I enabled, 
>>>> give me something like:
>>>>
>>>> Banned name: multipart/mixed | 
>>>> application/vnd.ms-powerpoint,.doc,Chaplin.pps
>>>> Content type: Banned
>>>> Internal reference code for the message is 24067-22/NWynDBsexbaI
>>>>
>>>>
>>>>     It's clearly a PowerPoint file, because of its extension (.pps) as 
>>>> well as its MIME type 'application/vnd.ms-powerpoint'.
>>>>
>>>>     What I'm trying to understand is where that '.doc' comes from!
>>>>
>>>>     
>>>>         
>>> What type of document does the file(1) utility indicate?
>>> What version of file is on your system?
>>>   
>>>       
>>     I'm running on a Fedora 8 system, with:
>>
>> [EMAIL PROTECTED] ~]# rpm -qi file
>> Name        : file                         Relocations: (not relocatable)
>> Version     : 4.21                              Vendor: Fedora Project
>> Release     : 5.fc8                         Build Date: Tue 29 Jan 2008 
>> 06:58:26 AM BRST
>>
>>     which is the latest file package from F8 repositories.
>>
>>     on this system, file returns for .doc and .ppt documents:
>>
>> [EMAIL PROTECTED] user]# file Defesa.doc
>> Defesa.doc: Microsoft Office Document
>> [EMAIL PROTECTED] user]# file Projeto\ Final\ II\ VPN\ -\ última.ppt
>> Projeto Final II VPN - última.ppt: Microsoft Office Document
>> [EMAIL PROTECTED] user]#
>>
>>     hmmmmm ...... seems file returns only 'Microsoft Office Document' .....
>>
>>     
>
> In the past several versions, the file(1) utility has changed its
> opinion many times regarding how to treat PPT documents:
>
> $ file -v
> file-4.21
> magic file from /usr/share/file/magic
>
> $ file test.ppt
> test.ppt: Microsoft Installer
>
> $ file -i test.ppt
> test.ppt: \012- application/msword
> ---
> $ file -v
> file-4.24
> magic file from /usr/local/share/file/magic
>
> $ file ~/test.ppt
> /home/cappella/test.ppt: Microsoft Office Document
>
> $ file -i ~/test.ppt
> /home/cappella/test.ppt: application/octet-stream
>
> Amavis uses a mapping of full type names to short names, which are then
> later referenced in the $banned_filename_re maps.  You can see in the
> latest amavisd that Microsoft Office Document is mapped to short name
> type "doc".  This is where ".doc" comes from.
>
>   $map_full_type_to_short_type_re = [
>      ...
>     [qr/^Rich Text Format data\b/       => 'rtf'],
>     [qr/^Microsoft Office Document\b/i  => 'doc'],  # OLE2: doc, ppt,
>                                                     # xls, ...
>
> It is likely that the file formats for PPT, XLS, DOC, etc. have not been
> reverse engineered to uniquely distinguish them from each other, and
> instead they are all mapped to Microsoft Office Document (in more recent
> file versions).  The file formats became much more complex with later
> versions of Office.
>
> Previous versions of file(1), like 4.21, were very broken in terms of
> identification, and there were many false identifications:
>
> $ file -v
> file-4.21
> magic file from /usr/share/file/magic
>
> $ file -i test.xls
> test.xls: \012- application/msword
>
> $ file test.xls
> test.xls: Microsoft Installer
>
> Clearly, this Excel spreadsheet is not a Word document.  Fedora adds
> its own patches to the file utility, and your build is based on 4.21.
> Since your PPT is identified as Microsoft Office Document, it is clear
> that Fedora has updated the magic database used for file identification,
> bringing it in line with the more recent 4.24/4.25 releases of file.  Yet
> the problem still remains: file types from more recent Office packages
> are identified generically as Microsoft Office Document, and not Excel,
> PowerPoint, etc.  Given that, I'm not sure what more you can do to
> distinguish the types.
>
> Here's a thread which discusses some of the file(1) issues a while ago:
>
> http://groups.google.com/group/mailing.unix.amavis-user/browse_thread/thread/7147b0a90573690c/04ca5171867925c1?lnk=gst&q=powerpoint+file#04ca5171867925c1
>   

    MrC, I don't think the points could be put more clearly than you 
did :)

    Unfortunately, based on the facts you laid out, it seems I cannot 
expect exact identification of PowerPoint/Word/Excel files ... but that 
is no problem.

    I have changed the 'doc' mapping to 'document' in 
$map_full_type_to_short_type_re. I'm even thinking of changing it again 
to 'msoffice'. With these small changes, I can get rid of the '.doc' 
that is appearing in the blocked-message notification.
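
    To make the lookup discussed in this thread concrete, here is a 
minimal shell sketch of a first-match mapping from a file(1) description 
to a short type, using the 'msoffice' name mentioned above. It is an 
illustration only and not amavisd code: short_type and banned_component 
are hypothetical helper names, and the glob patterns only approximate 
the two regular expressions quoted earlier in the thread.

```shell
#!/bin/sh
# Illustration only: NOT amavisd code. Both function names are hypothetical.

# Map a file(1) description to a short type; the first matching rule wins.
short_type() {
    # Rule 1: case-sensitive, like the quoted rtf entry (no /i flag).
    case "$1" in
        "Rich Text Format data"*) echo rtf; return 0 ;;
    esac
    # Rule 2: lowercase the input to approximate the /i flag on the
    # "Microsoft Office Document" entry. A glob cannot express \b, so
    # this matches slightly more than the original regex would.
    desc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$desc" in
        "microsoft office document"*) echo msoffice ;;  # 'doc' in the quoted mapping
        *)                            echo unknown ;;   # placeholder for unmapped types
    esac
}

# The short type appears with a leading dot in the "Banned name" line.
banned_component() {
    printf '.%s\n' "$(short_type "$1")"
}

banned_component "Microsoft Office Document"
```

    With the quoted 'doc' mapping the same lookup would yield '.doc', 
which is the entry that showed up in the notification regardless of 
whether the attachment was a Word, Excel or PowerPoint file.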

    Thank you very much for your points and clarifications.

-- 


        Atenciosamente / Sincerely,
        Leonardo Rodrigues
        Solutti Tecnologia
        http://www.solutti.com.br

        Minha armadilha de SPAM, NÃO mandem email
        [EMAIL PROTECTED]
        My SPAMTRAP, do not email it





_______________________________________________
AMaViS-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/amavis-user
AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/howto/
