On a related note, we do a rubbish job of guessing the content type from the content of files themselves via URLConnection#guessContentTypeFromStream(InputStream). I've added a bit more logic in there for the most obvious cases, but when you consider the info in your typical Linux 'magic' file we have a long way to go. My first thought was whether we could ask the platform to guess for us, but I don't think there is any equivalent on Windows etc?
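
To make "the most obvious cases" concrete, here is a minimal sketch of magic-number sniffing. It is not the logic that was actually added to Harmony: the class `MagicSniffer`, its method names, the choice of signatures and the 8-byte header size are all assumptions made for illustration. The idea is simply to peek at the first few bytes, compare them against well-known file signatures, and leave the stream position untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not Harmony's or the RI's code. */
class MagicSniffer {

    // The longest signature below is 5 bytes; 8 leaves a little headroom.
    private static final int HEADER_SIZE = 8;

    /** Returns a MIME type for a few well-known signatures, or null if none match. */
    static String fromHeader(byte[] head, int len) {
        if (startsWith(head, len, "{\\rtf")) return "text/rtf"; // application/rtf is equally valid
        if (startsWith(head, len, "%PDF-")) return "application/pdf";
        if (startsWith(head, len, "\u0089PNG")) return "image/png";
        if (startsWith(head, len, "GIF8")) return "image/gif";
        if (startsWith(head, len, "PK\u0003\u0004")) return "application/zip";
        return null;
    }

    /** Peeks at the start of a mark-capable stream, then rewinds it. */
    static String fromStream(InputStream in) throws IOException {
        if (!in.markSupported()) {
            return null; // cannot peek without consuming data
        }
        byte[] head = new byte[HEADER_SIZE];
        in.mark(HEADER_SIZE);
        int len = 0;
        try {
            int r;
            while (len < HEADER_SIZE
                    && (r = in.read(head, len, HEADER_SIZE - len)) > 0) {
                len += r;
            }
        } finally {
            in.reset(); // leave the stream where the caller had it
        }
        return fromHeader(head, len);
    }

    // ISO-8859-1 maps each char 0..255 to the same byte value.
    private static boolean startsWith(byte[] head, int len, String magic) {
        byte[] m = magic.getBytes(StandardCharsets.ISO_8859_1);
        if (len < m.length) {
            return false;
        }
        for (int i = 0; i < m.length; i++) {
            if (head[i] != m[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] rtf = "{\\rtf1\\ansi hello}".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(fromStream(new ByteArrayInputStream(rtf)));
    }
}
```

A table-driven version fed from a 'magic' file would scale better than a chain of `if` statements, but the peek-and-rewind structure would stay the same.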
Regards,
Tim

Alexey Petrenko wrote:
> Looks like both application/rtf and text/rtf are correct from IANA [1]
> point of view.
> So I do not see any harm to follow RI's behavior in this case.
>
> By the way application/rtf specification looks more fresh than text/rtf
>
> SY, Alexey
>
> 1. http://www.iana.org/assignments/media-types/
>
> 2007/8/31, Tim Ellison <[EMAIL PROTECTED]>:
>> The MIME types for a given extension are defined here [1] which we took
>> from httpd's view of the world. So while it would be trivial to change
>> them to be the same as the RI, I'm inclined to:
>> - leave rtf as text/rtf
>> - add java to our list as text/plain
>> - leave doc as application/msword
>> then figure out how to snoop the stream for other types.
>>
>> [1]
>> http://svn.apache.org/viewvc/harmony/enhanced/classlib/trunk/depends/files/content-types.properties?revision=494047&view=markup
>>
>> Thoughts?
>> Tim
>>
>>
>> Vasily Zakharov (JIRA) wrote:
>>> [classlib][luni] URLConnection.getContentType() works with files incorrectly
>>> ----------------------------------------------------------------------------
>>>
>>> Key: HARMONY-4699
>>> URL: https://issues.apache.org/jira/browse/HARMONY-4699
>>> Project: Harmony
>>> Issue Type: Bug
>>> Components: Classlib
>>> Reporter: Vasily Zakharov
>>>
>>>
>>> In Harmony implementation, java.net.URLConnection.getContentType() works
>>> incorrectly when it addresses a file URL:
>>>
>>> 1. For files with .rtf extension, RI returns "application/rtf", while
>>> Harmony returns "text/rtf".
>>>
>>> 2. For files with .java extension, RI returns "text/plain", while Harmony
>>> returns "content/unknown".
>>>
>>> 3. For files with .doc extension, RI returns "content/unknown", while
>>> Harmony returns "application/msword". The same is true for other known
>>> extensions.
>>>
>>> 4. For files with unrecognized extension and with HTML content, RI returns
>>> "text/html", while Harmony returns "content/unknown".
>>>
>>> Items 1 and 2 look like minor issues that would better be fixed for
>>> compatibility with RI.
>>>
>>> Item 3 looks like a non-bug difference, as Harmony behaves clearly better
>>> than RI in these cases.
>>>
>>> Item 4 looks like a serious bug, as RI clearly looks into file content for
>>> the file type, and Harmony does not. Looks like
>>> org.apache.harmony.luni.internal.net.www.protocol.file.FileURLConnection.getContentType()
>>> needs to be fixed to use guessContentTypeFromStream() in addition to
>>> guessContentTypeFromName().
>>>
>>> The attached archive contains the reproducer with some test files it uses.
>>> Here's the reproducer code:
>>>
>>> public class Test {
>>>     static void printContentType(String fileName) throws java.io.IOException {
>>>         System.out.println(fileName + ": " + new java.net.URL("file:" + fileName).openConnection().getContentType());
>>>     }
>>>     public static void main(String argv[]) {
>>>         try {
>>>             printContentType("test.rtf");
>>>             printContentType("Test.java");
>>>             printContentType("test.doc");
>>>             printContentType("test.htx");
>>>         } catch (Exception e) {
>>>             e.printStackTrace(System.out);
>>>         }
>>>     }
>>> }
>>>
>>> Output on RI:
>>>
>>> test.rtf: application/rtf
>>> Test.java: text/plain
>>> test.doc: content/unknown
>>> test.htx: text/html
>>>
>>> Output on Harmony:
>>>
>>> test.rtf: text/rtf
>>> Test.java: content/unknown
>>> test.doc: application/msword
>>> test.htx: content/unknown
>>>
>>> This issue is a blocker for HARMONY-4696, as on RI
>>> JEditorPane.getContentType() should be based on
>>> URLConnection.getContentType() that now works incorrectly.
>>>
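
The fix suggested for item 4 in the report (consult guessContentTypeFromStream() when guessContentTypeFromName() finds nothing) could look roughly like the sketch below. It uses only the two public static methods on java.net.URLConnection. The class `NameThenStream`, its method and the decision to swallow IOException are assumptions for illustration, not the actual FileURLConnection code, and the types returned depend on the extension table and sniffing rules of whichever class library runs it.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a name-first, content-second lookup. */
class NameThenStream {

    static final String UNKNOWN = "content/unknown";

    static String guessContentType(String fileName, InputStream content) {
        // 1. Cheap path: look the extension up in the file name map.
        String type = URLConnection.guessContentTypeFromName(fileName);
        if (type != null) {
            return type;
        }
        // 2. Fallback: inspect the first bytes. The stream must support
        //    mark/reset, so wrap it if it does not.
        try {
            InputStream in = content.markSupported()
                    ? content
                    : new BufferedInputStream(content);
            type = URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            type = null; // treat an unreadable stream as "cannot tell"
        }
        return type != null ? type : UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] html = "<html><body>hi</body></html>"
                .getBytes(StandardCharsets.US_ASCII);
        // Same situation as test.htx in the reproducer: the extension is not
        // in the table, so the answer has to come from the content.
        System.out.println(
                guessContentType("test.htx", new ByteArrayInputStream(html)));
    }
}
```

Trying the name first keeps the common case cheap and leaves the table entries for rtf, java and doc as the deciding factor, so the stream is only opened for files like test.htx.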
