Carol Alexandru created TIKA-4408:
-------------------------------------

             Summary: python file identified as application/x-sh under several 
circumstances
                 Key: TIKA-4408
                 URL: https://issues.apache.org/jira/browse/TIKA-4408
             Project: Tika
          Issue Type: Bug
          Components: core
            Reporter: Carol Alexandru


The [definition for text/x-python in 
tika-mimetypes.xml|https://github.com/apache/tika/blob/25619272d2f615df4ad87e27e7c8dec576f37627/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L8347]
 is missing some matches. In particular:
 * #!/usr/bin/env ...
 * all variants using python3 instead of python

For this reason, a file starting with any of the following valid and fairly 
common lines is misidentified as application/x-sh (which matches #!/):

{{#!/usr/bin/env python3}}

{{#!/usr/bin/env python}}

{{#!/usr/bin/python3}}

{{... etc ...}}
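
To make the expected behaviour concrete, here is a rough sketch in Python of a check that accepts the shebang variants listed above and rejects a plain shell shebang. This is only an illustration, not how Tika matches magic bytes in tika-mimetypes.xml: the pattern and the function name looks_like_python_script are hypothetical.

```python
import re

# Hypothetical pattern for illustration only; this is NOT the Tika magic
# definition. It accepts a shebang that runs python, python3, python3.11, ...
# either directly (#!/usr/bin/python3) or through env (#!/usr/bin/env python).
PYTHON_SHEBANG = re.compile(
    rb"#![ \t]*\S*/(?:env[ \t]+)?python[0-9.]*(?=[ \t]|$)"
)


def looks_like_python_script(head: bytes) -> bool:
    """Return True if the first line of `head` is a Python shebang."""
    # Only the first line matters; drop a trailing CR from CRLF files.
    first_line = head.split(b"\n", 1)[0].rstrip(b"\r")
    return PYTHON_SHEBANG.match(first_line) is not None
```

With a check like this, a file starting with {{#!/usr/bin/env python3}} would be treated as Python, while {{#!/bin/sh}} would still fall through to the generic shell match.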

I might do a pull request if I get around to it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
