Carol Alexandru created TIKA-4408:
-------------------------------------

             Summary: python file identified as application/x-sh under several circumstances
                 Key: TIKA-4408
                 URL: https://issues.apache.org/jira/browse/TIKA-4408
             Project: Tika
          Issue Type: Bug
          Components: core
            Reporter: Carol Alexandru
The [definition for text/x-python in tika-mimetypes.xml|https://github.com/apache/tika/blob/25619272d2f615df4ad87e27e7c8dec576f37627/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L8347] is missing some matches. In particular:
* #!{+}/usr{+}/bin/env ...
* all variants using python{+}3{+} instead of python

For this reason, a file starting with any of the following valid and fairly common lines is misidentified as application/x-sh, whose magic matches the generic #!/ prefix:
{{#!/usr/bin/env python3}}
{{#!/usr/bin/env python}}
{{#!/usr/bin/python3}}
{{... etc ...}}

I might do a pull request if I get around to it.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
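To make the failure mode concrete, here is a minimal, self-contained Java sketch. It is not Tika code: the class `ShebangSniffer`, its `detect` method and the `text/plain` default are hypothetical. Instead of fixed prefix strings, it recognises a Python shebang by the interpreter name, so the `env` and `python3` forms listed above are covered, and only other `#!/` lines fall through to `application/x-sh`. It illustrates the matching logic only and is not a proposal for how the magic rules in tika-mimetypes.xml should be written.

```java
import java.util.List;

// Hypothetical helper, NOT part of Apache Tika: a simplified model that
// recognises Python shebangs by interpreter name rather than by fixed
// prefix strings, then falls back to shell for any other "#!/" line.
class ShebangSniffer {
    static final String PYTHON = "text/x-python";
    static final String SHELL = "application/x-sh";
    static final String UNKNOWN = "text/plain"; // assumed default for this sketch

    // Returns the last path component, e.g. "/usr/bin/python3" -> "python3".
    private static String basename(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    static String detect(String firstLine) {
        if (!firstLine.startsWith("#!")) {
            return UNKNOWN;
        }
        // Tokenise everything after "#!", tolerating "#! /usr/bin/python".
        String rest = firstLine.substring(2).trim();
        String[] tokens = rest.isEmpty() ? new String[0] : rest.split("\\s+");
        if (tokens.length > 0) {
            String interpreter = basename(tokens[0]);
            if (interpreter.equals("env")) {
                // "#!/usr/bin/env [-S] [VAR=value] python3": skip options and
                // environment assignments to reach the real interpreter.
                int i = 1;
                while (i < tokens.length
                        && (tokens[i].startsWith("-") || tokens[i].contains("="))) {
                    i++;
                }
                interpreter = i < tokens.length ? basename(tokens[i]) : "";
            }
            // Accept "python", "python3", "python3.12", ...
            if (interpreter.matches("python[0-9.]*")) {
                return PYTHON;
            }
        }
        // Generic fallback mirroring the "#!/" rule for shell scripts.
        return firstLine.startsWith("#!/") ? SHELL : UNKNOWN;
    }

    public static void main(String[] args) {
        for (String line : List.of(
                "#!/usr/bin/env python3",
                "#!/usr/bin/env python",
                "#!/usr/bin/python3",
                "#!/bin/sh")) {
            System.out.println(line + " -> " + detect(line));
        }
    }
}
```

Running `main` prints `text/x-python` for the three shebangs from the report and `application/x-sh` for `#!/bin/sh`. The Python check runs before the shell fallback, which is the ordering the report implies is missing for the `env` and `python3` variants.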