Yaniv Kunda created TIKA-1722:
---------------------------------

             Summary: Tika methods that accept a File needlessly convert it to a URL
                 Key: TIKA-1722
                 URL: https://issues.apache.org/jira/browse/TIKA-1722
             Project: Tika
          Issue Type: Improvement
          Components: core
            Reporter: Yaniv Kunda
            Priority: Minor
             Fix For: 1.11


The following methods:
- Tika.detect(File)
- Tika.parse(File)
- Tika.parseToString(File)

convert the given File to a URL and delegate to the corresponding overloaded 
method that accepts a URL.
This looks like a shortcut, but it actually does the following:
# Converts the File to a URI
# Converts the URI to a URL
# Calls TikaInputStream.get(URL, Metadata), which then performs the following special handling:
## Checks whether the protocol is "file"
## Tries to convert the URL (back) to a URI
## Creates a File around the URI
## Checks file.isFile()
## Calls TikaInputStream.get(File, Metadata)
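The round trip described above can be reproduced with the JDK alone. The following is a minimal sketch, not Tika's actual code: the class FileUrlRoundTrip and the helper roundTrip are hypothetical names, and the final step only returns the recovered File where Tika would go on to call TikaInputStream.get(File, Metadata).

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class FileUrlRoundTrip {

    // Hypothetical helper that mirrors the conversions listed above.
    // Returns the File recovered from the URL, or null if the URL does
    // not point at a regular local file.
    static File roundTrip(File file) throws IOException, URISyntaxException {
        URI uri = file.toURI();          // File -> URI
        URL url = uri.toURL();           // URI -> URL

        // What a URL-accepting method has to do to get back to a File:
        if ("file".equalsIgnoreCase(url.getProtocol())) {
            URI back = url.toURI();              // URL -> URI (again)
            File recovered = new File(back);     // URI -> File
            if (recovered.isFile()) {
                // Tika would hand this File to the File-based overload here.
                return recovered;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("tika-1722", ".txt");
        tmp.deleteOnExit();
        File recovered = roundTrip(tmp);
        // After all the conversions we end up with the File we started with.
        System.out.println("same file: " + tmp.getAbsoluteFile().equals(recovered));
    }
}
```

Every conversion in roundTrip exists only to arrive back at a File the caller already had.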

The special handling in TikaInputStream.get(URL/URI) is a good optimization for 
in-the-wild file resources, but for internal use it can be skipped by having 
Tika call TikaInputStream.get(File, Metadata) directly.
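The shape of the proposed change can be illustrated with stand-ins. This is only a sketch under assumptions: DirectFileSketch, open, viaUrl and direct are hypothetical names, and the two open overloads stand in for the File-based and URL-based TikaInputStream.get methods without reproducing their behavior.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;

public class DirectFileSketch {

    // Hypothetical stand-in for a File-based factory method.
    static InputStream open(File file) throws IOException {
        return new FileInputStream(file);
    }

    // Hypothetical stand-in for a URL-based factory method.
    static InputStream open(URL url) throws IOException {
        return url.openStream();
    }

    // Current shape: the File overload detours through a URL.
    static InputStream viaUrl(File file) throws IOException {
        return open(file.toURI().toURL());
    }

    // Proposed shape: the File overload uses the File directly.
    static InputStream direct(File file) throws IOException {
        return open(file);
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("tika-1722", ".bin");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), new byte[] {42});

        // Both paths read the same content; only the detour differs.
        try (InputStream a = viaUrl(tmp); InputStream b = direct(tmp)) {
            System.out.println("same first byte: " + (a.read() == b.read()));
        }
    }
}
```

Both paths yield the same bytes, so callers of the File-accepting methods should see no behavioral difference from the direct call.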



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
