[Tika Wiki] Update of "PooledTimeSeriesParser" by ChrisMattmann

Apache Wiki Thu, 26 Nov 2015 07:42:06 -0800

The "PooledTimeSeriesParser" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/PooledTimeSeriesParser

New page:
[[http://michaelryoo.com/jpl-interaction.html]]
[[https://github.com/chrismattmann/pooled_time_series]]
[[http://arxiv.org/pdf/1412.6505v2.pdf]]

==== Metadata Representation ====

The ultimate goal of the project is to be able to extract metadata from videos 
and index it inside Solr.

Videos, like images, are just numbers, whether viewed as an ordered sequence 
of numbers or as matrices.

There are many ways in which these numbers can be defined.
Some popular visual descriptors are Histograms of Oriented Gradients (HOG), 
optical flow vectors, and RGB or color histograms.
The challenge is to figure out a way to map these descriptors to a datatype 
that Solr can index and query.

In the case of color-based histograms, we can convert the image into a matrix 
of hex values, where each hex value is the color of one pixel, 
and index that as a text_ws field in Solr.

This is what Shutterstock did for an image search tool they built:
https://lucidworks.com/blog/shutterstock-searches-35-million-images-color-using-apache-solr/
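
The hex-value idea above can be sketched roughly as follows. This is only an 
illustration under my own assumptions, not what Shutterstock or the parser 
actually does: the function name, the document id, and the "color_tokens" 
field name are hypothetical, and the pixel matrix is assumed to already be 
available as rows of (r, g, b) tuples.

```python
def pixels_to_hex_tokens(pixels):
    """Flatten rows of (r, g, b) tuples into whitespace-separated hex tokens.

    A whitespace-tokenized field (such as text_ws) would then see each
    pixel color as one token.
    """
    tokens = []
    for row in pixels:
        for r, g, b in row:
            # Two lowercase hex digits per channel, e.g. (255, 0, 0) -> "ff0000".
            tokens.append(f"{r:02x}{g:02x}{b:02x}")
    return " ".join(tokens)


# A tiny 2x2 "image" used only for illustration.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

# Hypothetical document shape; the field name is an assumption.
doc = {"id": "frame-0001", "color_tokens": pixels_to_hex_tokens(image)}
print(doc["color_tokens"])  # ff0000 00ff00 0000ff ffffff
```

In practice the colors would probably need to be quantized into far fewer 
buckets first, otherwise nearly every pixel becomes a distinct token.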

Another idea I was considering is to index the data as an XHTML document 
containing a table, where each <tr>..</tr> would be a row of the feature 
matrix and each <td> would be the element in the corresponding column.
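
To make the table layout concrete, here is a minimal sketch that serializes a 
small feature matrix as <tr>/<td> markup using Python's standard library. The 
function name and the sample values are made up for illustration; the actual 
parser would emit XHTML through Tika's own handler rather than like this.

```python
import xml.etree.ElementTree as ET


def matrix_to_xhtml_table(matrix):
    """Serialize a list-of-lists feature matrix as a <table> fragment."""
    table = ET.Element("table")
    for row in matrix:
        tr = ET.SubElement(table, "tr")      # one <tr> per matrix row
        for value in row:
            td = ET.SubElement(tr, "td")     # one <td> per column element
            td.text = repr(float(value))
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(table, encoding="unicode")


features = [[0.5, 1.0], [2.0, 3.25]]  # hypothetical 2x2 feature matrix
print(matrix_to_xhtml_table(features))
```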

However, when ranking or querying we would still have to compute a distance 
function over these values, between each video in the dataset and the query 
video.
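
As a rough sketch of what that distance computation could look like, the 
snippet below ranks stored feature matrices by their Euclidean (Frobenius) 
distance to a query matrix. The choice of Euclidean distance is my assumption 
here, as are the function names and the toy data; another metric may well be 
more appropriate for these features.

```python
import math


def frobenius_distance(a, b):
    """Euclidean distance between two same-shaped list-of-lists matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature matrices must have the same shape")
    return math.sqrt(
        sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    )


def rank_by_distance(query, dataset):
    """Return video ids ordered from nearest to farthest from the query."""
    return sorted(dataset, key=lambda vid: frobenius_distance(query, dataset[vid]))


# Toy data: ids and values are hypothetical.
query = [[0.0, 0.0], [0.0, 0.0]]
dataset = {
    "video-a": [[3.0, 4.0], [0.0, 0.0]],
    "video-b": [[1.0, 0.0], [0.0, 0.0]],
}
print(rank_by_distance(query, dataset))  # ['video-b', 'video-a']
```

Doing this outside Solr means scanning every stored matrix per query, which 
is exactly the cost an index-side representation would be meant to avoid.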

How have other users solved this problem? There must be instances of 
matrix-type data showing up in other domains, such as geography, physics, 
and other sciences. How is the metadata designed in such cases?
