[ https://issues.apache.org/jira/browse/JCR-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469546 ]
Jukka Zitting commented on JCR-728:
-----------------------------------
I've looked at jmimemagic too, but as you mentioned, it's somewhat limited. It's
also licensed under the LGPL, which makes it troublesome for us to use.
There's a recent codebase at
http://hedges.net/archives/2006/11/08/java-shared-mime-info/ that seems pretty
good, but the code is under the GPL.
I recently talked with some people from Apache Nutch about a project to
implement the shared MIME-info standard from freedesktop.org
(http://www.freedesktop.org/wiki/Standards_2fshared_2dmime_2dinfo_2dspec).
Apparently someone already had some Apache-licensed code for that, but I
haven't seen it yet.
I've been planning to propose a project to implement the shared MIME-info
standard in Apache Labs (http://labs.apache.org/), but if there's more interest
within the Jackrabbit community, we could also start working on it within the
jackrabbit-text-extractors component.
> Automatic MIME type detection
> -----------------------------
>
> Key: JCR-728
> URL: https://issues.apache.org/jira/browse/JCR-728
> Project: Jackrabbit
> Issue Type: Improvement
> Components: indexing
> Reporter: Jukka Zitting
> Priority: Minor
>
> Currently only the jcr:mimeType property is used to determine the MIME type
> and thus the applicable text extractor to use for indexing a document. If the
> jcr:mimeType property is not available or is set to a generic value like
> "application/octet-stream", then the indexer could also use some heuristics
> based on the node name or magic numbers within the binary stream to determine
> the type of the document.
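The fallback order described in the issue can be sketched in Java roughly as
follows. This is a minimal, hypothetical illustration, not Jackrabbit's actual
API: the class MimeTypeDetector, its detect method, and the small signature and
extension tables are assumptions made for the example. It trusts a non-generic
jcr:mimeType first, then checks a few well-known magic numbers at the start of
the binary stream, and finally looks at the extension of the node name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical fallback MIME type detector (not a Jackrabbit API).
 * Order: declared jcr:mimeType, then magic numbers, then node-name extension.
 */
final class MimeTypeDetector {

    static final String GENERIC = "application/octet-stream";

    /** A leading byte sequence that identifies a format. */
    private static final class Signature {
        final byte[] prefix;
        final String type;

        Signature(byte[] prefix, String type) {
            this.prefix = prefix;
            this.type = type;
        }
    }

    // A few well-known magic numbers; a real detector would need many more.
    private static final Signature[] SIGNATURES = {
        new Signature(ascii("%PDF-"), "application/pdf"),
        new Signature(new byte[] {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A}, "image/png"),
        new Signature(ascii("GIF87a"), "image/gif"),
        new Signature(ascii("GIF89a"), "image/gif"),
        new Signature(new byte[] {0x50, 0x4B, 0x03, 0x04}, "application/zip"),
    };

    // File-name extension heuristics, keyed by lower-case extension.
    private static final Map<String, String> EXTENSIONS = new HashMap<>();
    static {
        EXTENSIONS.put("txt", "text/plain");
        EXTENSIONS.put("htm", "text/html");
        EXTENSIONS.put("html", "text/html");
        EXTENSIONS.put("xml", "application/xml");
        EXTENSIONS.put("pdf", "application/pdf");
        EXTENSIONS.put("doc", "application/msword");
    }

    // declared = jcr:mimeType value, nodeName = node name, head = first bytes; any may be null.
    static String detect(String declared, String nodeName, byte[] head) {
        // 1. Trust an explicit, non-generic jcr:mimeType value.
        if (declared != null && !declared.isEmpty() && !GENERIC.equalsIgnoreCase(declared)) {
            return declared;
        }
        // 2. Look for a known magic number at the start of the stream.
        if (head != null) {
            for (Signature s : SIGNATURES) {
                if (startsWith(head, s.prefix)) {
                    return s.type;
                }
            }
        }
        // 3. Fall back to the extension of the node name.
        if (nodeName != null) {
            int dot = nodeName.lastIndexOf('.');
            if (dot >= 0 && dot < nodeName.length() - 1) {
                String ext = nodeName.substring(dot + 1).toLowerCase(Locale.ROOT);
                String type = EXTENSIONS.get(ext);
                if (type != null) {
                    return type;
                }
            }
        }
        // 4. Nothing matched: keep the generic type.
        return GENERIC;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] pdf = ascii("%PDF-1.4\n");
        System.out.println(detect("text/plain", "report.bin", pdf)); // text/plain
        System.out.println(detect(GENERIC, "report.bin", pdf));      // application/pdf
        System.out.println(detect(null, "notes.TXT", new byte[0]));  // text/plain
        System.out.println(detect(GENERIC, "data", new byte[] {1})); // application/octet-stream
    }
}
```

Checking magic numbers before the node name is a judgment call in this sketch:
a byte signature is usually harder to get wrong than a file name, while plain
text formats have no signature, so the extension lookup still helps there. A
fuller implementation could load its glob and magic rules from the
freedesktop.org shared MIME-info database instead of hard-coding them.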
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.