Dear Wiki user, You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "PooledTimeSeriesParser" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/PooledTimeSeriesParser

New page:

[[http://michaelryoo.com/jpl-interaction.html]]

[[https://github.com/chrismattmann/pooled_time_series]]

[[http://arxiv.org/pdf/1412.6505v2.pdf]]

==== Metadata Representation ====

The ultimate goal of the project is to be able to extract metadata from videos and index it inside Solr. Videos, like images, are just numbers: an ordered sequence of numbers, or matrices. There are many ways in which these numbers can be defined. Some popular visual descriptors are Histogram of Oriented Gradients, Optical Flow vectors, and RGB or Color Histograms. The challenge is to map this matrix data to a datatype that Solr can understand.

In the case of color-based histograms, we can convert the image into a matrix of hex values, where each hex value is a pixel's color value, and index that as a text_ws field in Solr. This is what Shutterstock did for an image search tool they've built: https://lucidworks.com/blog/shutterstock-searches-35-million-images-color-using-apache-solr/

Another idea I was thinking of was to index the data as an XHTML document of table values, where each <tr>..</tr> would be a row of the feature matrix and each <td> would be the corresponding element in that column. However, while performing ranking or querying, we would still have to compute a distance function on these values (between each video in the dataset and the query video).

How have other users solved this problem? There must be instances of matrix-type data showing up in other domains, such as geography, physics and other scientific domains. How is the metadata designed in such cases?
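
To make the hex-token idea and the distance-function concern more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, Euclidean distance is chosen arbitrarily (the page does not say which distance to use), and nothing here talks to Solr. It only produces a whitespace-separated string of the kind a text_ws field could tokenize, and ranks feature matrices outside of Solr.

```python
import math


def pixels_to_hex_tokens(pixels):
    """Flatten rows of (r, g, b) tuples into one whitespace-separated
    string of six-digit hex colors (hypothetical helper)."""
    return " ".join(
        "{:02x}{:02x}{:02x}".format(r, g, b)
        for row in pixels
        for (r, g, b) in row
    )


def euclidean_distance(a, b):
    """Euclidean distance between two feature matrices of equal shape,
    each given as a list of rows. Euclidean is an assumption here."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature matrices must have the same shape")
    return math.sqrt(
        sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    )


def rank_by_distance(query, dataset):
    """Return (video_id, distance) pairs sorted from nearest to farthest.
    `dataset` maps a video id to its feature matrix."""
    scored = [(vid, euclidean_distance(query, feats)) for vid, feats in dataset.items()]
    return sorted(scored, key=lambda pair: pair[1])
```

For example, `pixels_to_hex_tokens([[(255, 0, 0), (0, 255, 0)]])` yields a string of two tokens, one per pixel. Note that `rank_by_distance` compares the query against every stored matrix, which is exactly the per-query cost raised above.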
