[
https://issues.apache.org/jira/browse/CONNECTORS-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172266#comment-14172266
]
Karl Wright edited comment on CONNECTORS-1074 at 10/15/14 11:44 AM:
--------------------------------------------------------------------
Hi Abe-san,
Replacing ExtensionMimeMap everywhere it is used with a call to
Tika.detect(String filename), as you propose, would require all the Tika jars
and their dependencies to be included in every war file, rather than once (in
connector-lib). This is because those classes would have to be accessible from
the root class loader. It may be that a single Tika jar is enough for
detect(String filename) to work, but you would need to experiment to find out.
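
For comparison, here is a rough sketch of a filename-only lookup that needs no
third-party jars, so it raises no class loader question. The class name, the
table entries, and the fallback type are all hypothetical; this is not the
actual ExtensionMimeMap code and not Tika's detector.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ExtensionLookupSketch {
    // Hypothetical table; a real map would hold many more entries.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("pdf", "application/pdf");
        BY_EXTENSION.put("txt", "text/plain");
        BY_EXTENSION.put("html", "text/html");
    }

    // Assumed fallback for names with no extension or an unknown one.
    static final String UNKNOWN = "application/octet-stream";

    /** Guess a MIME type from the file name alone, ignoring the content. */
    static String detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return UNKNOWN;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, UNKNOWN);
    }

    public static void main(String[] args) {
        System.out.println(detect("report.PDF"));
        System.out.println(detect("README"));
    }
}
```

The trade-off is coverage: a hand-kept table like this knows only the
extensions someone has added to it.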
In the Tika connector itself, the RepositoryDocument.setMimeType() method is
supposed to describe the binary stream that you get from
RepositoryDocument.getBinaryStream(). Since the output of Tika is always
characters, which the Tika transformer converts to utf-8 bytes, the content
type should always be "text/plain;charset=utf-8".
If you want to modify the Tika connector to report what the *original* mime
type was in some other metadata field, that is fine with me, but you should not
call setMimeType() because it will break things.
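
To make that concrete, here is a minimal sketch of what I mean. The Doc class
is a hypothetical stand-in for RepositoryDocument with only the pieces
discussed above, and the field name "originalMimeType" is just an assumption
for illustration, not an existing convention.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class TikaOutputSketch {
    /** Hypothetical stand-in for RepositoryDocument; not the real class. */
    static class Doc {
        private String mimeType;
        private InputStream binaryStream;
        private long binaryLength;
        private final Map<String, String> fields = new LinkedHashMap<>();

        void setMimeType(String mimeType) { this.mimeType = mimeType; }
        String getMimeType() { return mimeType; }
        void setBinary(InputStream stream, long length) {
            this.binaryStream = stream;
            this.binaryLength = length;
        }
        InputStream getBinaryStream() { return binaryStream; }
        long getBinaryLength() { return binaryLength; }
        void addField(String name, String value) { fields.put(name, value); }
        String getField(String name) { return fields.get(name); }
    }

    // Assumed metadata field name, chosen only for this sketch.
    static final String ORIGINAL_MIME_FIELD = "originalMimeType";

    /** Build the document that is handed on after text extraction. */
    static Doc describeExtractedText(String extractedText, String originalMimeType) {
        byte[] utf8 = extractedText.getBytes(StandardCharsets.UTF_8);
        Doc doc = new Doc();
        doc.setBinary(new ByteArrayInputStream(utf8), utf8.length);
        // The mime type describes the stream above, which is now utf-8 text.
        doc.setMimeType("text/plain;charset=utf-8");
        // The source format travels separately, as ordinary metadata.
        doc.addField(ORIGINAL_MIME_FIELD, originalMimeType);
        return doc;
    }

    public static void main(String[] args) {
        Doc doc = describeExtractedText("Hello", "application/pdf");
        System.out.println(doc.getMimeType());
        System.out.println(doc.getField(ORIGINAL_MIME_FIELD));
    }
}
```

Anything downstream that reads the mime type then still sees an accurate
description of the bytes it actually receives.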
was (Author: [email protected]):
Hi Abe-san,
The RepositoryDocument.setMimeType() method is supposed to describe the binary
stream that you get from RepositoryDocument.getBinaryStream(). Since the
output of Tika is always characters, which the Tika transformer converts to
utf-8 bytes, the content type should always be "text/plain;charset=utf-8".
If you want to modify the Tika connector to report what the *original* mime
type was in some other metadata field, that is fine with me, but you should not
call setMimeType() because it will break things.
> Replace ExtensionMimeMap with new Tika().detect(filename)
> ---------------------------------------------------------
>
> Key: CONNECTORS-1074
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1074
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Framework core
> Reporter: Shinichiro Abe
> Fix For: ManifoldCF 2.0
>
>
> It would be nice if we could support many more MIME types, since ManifoldCF
> already uses Tika.
> {noformat}
> new Tika().detect(fileName);
> {noformat}
> returns the MIME type as a String. We could then pass it to
> RepositoryDocument#setMimeType(mimeType) in each connector.
> Tika reference:
> [javadoc|http://tika.apache.org/1.6/api/org/apache/tika/Tika.html]
> [test code|http://svn.apache.org/viewvc/tika/tags/1.6-rc2/tika-core/src/test/java/org/apache/tika/TikaDetectionTest.java?view=markup]