[ 
https://issues.apache.org/jira/browse/TIKA-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568898#comment-16568898
 ] 

Hudson commented on TIKA-2648:
------------------------------

SUCCESS: Integrated in Jenkins build tika-branch-1x #72 (See 
[https://builds.apache.org/job/tika-branch-1x/72/])
TIKA-2648 detect interpreted server-side script languages (tallison: 
[https://github.com/apache/tika/commit/f5a2faefd17936e1ad2c9b6b8c9b0ea3d3c30d99])
* (edit) tika-core/src/test/java/org/apache/tika/mime/MimeDetectionTest.java
* (edit) tika-core/src/test/resources/org/apache/tika/mime/custom-mimetypes2.xml
* (edit) tika-core/src/main/java/org/apache/tika/mime/MimeTypesReader.java
* (edit) tika-core/src/main/java/org/apache/tika/mime/MimeType.java
* (edit) tika-core/src/main/java/org/apache/tika/mime/MimeTypesReaderMetKeys.java
* (edit) tika-core/src/test/java/org/apache/tika/mime/CustomReaderTest.java
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* (edit) tika-core/src/main/java/org/apache/tika/mime/MimeTypes.java


> mime detection based on resource name detects resources as "text/x-php" 
> instead of "text/html" 
> -----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2648
>                 URL: https://issues.apache.org/jira/browse/TIKA-2648
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Gerard Bouchar
>            Priority: Major
>
> When using Tika to detect a MIME type given only a URL containing ".php" and 
> a content-type hint of "text/html", it guesses "text/x-php", whereas one 
> would expect "text/html".
> {code}
> import org.apache.tika.config.TikaConfig;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.mime.MediaType;
>
> TikaConfig tika = new TikaConfig();
> Metadata metadata = new Metadata();
> String url = "https://www.facebook.com/home.php";
> metadata.set(Metadata.RESOURCE_NAME_KEY, url);
> metadata.set(Metadata.CONTENT_TYPE, "text/html");
> MediaType type = tika.getDetector().detect(null, metadata);
> System.out.println(url + " is of type " + type.toString());
> // Prints https://www.facebook.com/home.php is of type text/x-php
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
