Hey Nick,

We've done this sort of thing quite a lot in the OODT project [1], in many 
cases by wrapping Tika to do it.

Check out our CmdLineMetExtractor class [2], and this guide [3] on some of our 
baked-in MetExtractors. I think it would be awesome if we could support a 
similar interface in Tika (I'd love to push those details upstream from OODT).
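
To make the idea concrete, here is a minimal sketch of the pattern being 
discussed: check that an external command exists, run it, and pull metadata 
out of its output with regexps. Everything in it is an assumption made for 
illustration. The class and method names (CommandMetadataSketch, isAvailable, 
run, extract) are hypothetical and are not the API of Tika's ExternalParser 
or OODT's CmdLineMetExtractor.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; not the Tika ExternalParser or OODT CmdLineMetExtractor API.
class CommandMetadataSketch {

    // Returns false if the command cannot be started (for example, it is not on the PATH).
    static boolean isAvailable(String... checkCommand) {
        try {
            Process p = new ProcessBuilder(checkCommand).redirectErrorStream(true).start();
            p.destroy();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Runs the command and returns its combined stdout and stderr as one string.
    static String run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    // For each metadata key, stores capture group 1 of the first match found in the output.
    static Map<String, String> extract(String output, Map<String, Pattern> patterns) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
            Matcher m = e.getValue().matcher(output);
            if (m.find()) {
                metadata.put(e.getKey(), m.group(1));
            }
        }
        return metadata;
    }
}
```

A parser built this way would call isAvailable once up front and report 
itself as unavailable when it returns false, then feed the result of run 
into extract with one pattern per metadata key. The patterns themselves 
could come from a subclass or from a config file.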

Cheers,
Chris

[1] http://oodt.apache.org
[2] http://svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/extractors/CmdLineMetExtractor.java
[3] http://oodt.apache.org/components/maven/metadata/user/basic.html

On Apr 5, 2011, at 1:31 PM, Nick Burch wrote:

> Hi All
> 
> I'm currently pondering trying to add support for using ffmpeg to provide 
> metadata on video (and audio) files. This would be useful for me for the 
> file formats which we don't currently support, which are generally the ones 
> where there's no handy Java library to call for them.
> 
> At the moment, it looks like we do have some command line support, in the 
> form of ExternalParser, but that's focused only on the text extraction 
> part. It also looks like it might want a few tweaks to make it easier to 
> use.
> 
> I was therefore thinking of doing some work to improve it, and then adding 
> in metadata too (likely via regexps or similar). One thought was to make 
> it possible to use ExternalParser in two ways. The first way would be to 
> subclass it and provide the mime type, command, and metadata regexps. The 
> other would be to provide an xml config file, which'd supply the details. 
> Likely with both of these we'd want the parser to check for the external 
> command, and claim not to be available if the command isn't there.
> 
> Anyone got any thoughts on this sort of thing? Anyone done something like 
> it before?
> 
> Nick


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
