Yaniv Kunda created TIKA-1722:
---------------------------------

             Summary: Tika methods that accept a File needlessly convert it to a URL
                 Key: TIKA-1722
                 URL: https://issues.apache.org/jira/browse/TIKA-1722
             Project: Tika
          Issue Type: Improvement
          Components: core
            Reporter: Yaniv Kunda
            Priority: Minor
             Fix For: 1.11

The following methods:
- Tika.detect(File)
- Tika.parse(File)
- Tika.parseToString(File)

Convert the given File to a URL and use the corresponding overloaded method that accepts a URL.

This seems like a shortcut, but essentially does the following:
# Converts the file to a URI
# Converts the URI to a URL
# Calls TikaInputStream.get(URL, Metadata), which then performs the following special handling:
## Checks if the protocol is "file"
## Tries to convert the URL (back) to a URI
## Creates a File around the URI
## Checks if file.isFile()
## Calls TikaInputStream.get(File, Metadata)

The special handling in TikaInputStream.get(URL/URI) is a good optimization for in-the-wild file resources, but for internal uses it can be skipped - making Tika call TikaInputStream.get(File, Metadata) directly.
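
The round trip described above can be sketched with only the JDK. This is an illustration of the steps listed in the issue, not Tika's actual source: the class FileUrlRoundTrip and its methods toUrl, recoverFile and newTempFile are hypothetical names, and the real TikaInputStream.get(URL, Metadata) may differ in detail.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.util.Optional;

public class FileUrlRoundTrip {

    // Steps 1 and 2 from the issue: File -> URI -> URL.
    public static URL toUrl(File file) {
        try {
            return file.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The special handling from the issue: check the protocol, convert the
    // URL back to a URI, wrap it in a File, and accept it only if it is a
    // regular file. An empty result means "treat it as a generic URL".
    public static Optional<File> recoverFile(URL url) {
        if (!"file".equalsIgnoreCase(url.getProtocol())) {
            return Optional.empty();
        }
        try {
            File file = new File(url.toURI());
            return file.isFile() ? Optional.of(file) : Optional.empty();
        } catch (URISyntaxException | IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    // Hypothetical helper: creates a throwaway regular file for the demo.
    public static File newTempFile() {
        try {
            File file = Files.createTempFile("tika-1722", ".txt").toFile();
            file.deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File original = newTempFile();
        // The caller already holds the File, yet the URL path has to
        // rediscover it before it can be used as a File again.
        Optional<File> recovered = recoverFile(toUrl(original));
        System.out.println("recovered same file: "
                + (recovered.isPresent() && recovered.get().equals(original)));
    }
}
```

Under these assumptions, the object that comes out of recoverFile refers to the same path the caller started with, which is why passing the File straight to TikaInputStream.get(File, Metadata) would avoid the two conversions and the extra checks.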