Dear Wiki user, You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "PooledTimeSeriesParser" page has been changed by ChrisMattmann:
https://wiki.apache.org/tika/PooledTimeSeriesParser?action=diff&rev1=1&rev2=2

+ The [[http://arxiv.org/abs/1412.6505|Pooled Time Series algorithm]] was developed by [[http://michaelryoo.com/jpl-interaction.html|Michael Ryoo]] and it allows for video descriptors to be considered over time and in this consideration for videos to be compared based on the activity going on in their scenes. In short, Pooled Time Series is a video comparison metric. An ALv2 licensed version of the [[https://github.com/chrismattmann/pooled_time_series|Pooled Time Series code]] is available for use in computing Histogram of Oriented Gradients (HOG) and Histogram of Optical Flows (HOF) which can be useful extracted data and metadata for a Tika Parser.
- [[http://michaelryoo.com/jpl-interaction.html]]
- [[https://github.com/chrismattmann/pooled_time_series]]
- [[http://arxiv.org/pdf/1412.6505v2.pdf]]
- ==== Metadata Representation ====
+ = Metadata Representation =
+ The ultimate goal of the project is to be able to extract metadata and data from videos and to index that information inside of a searh engine like Apache Solr. Videos, like images, are just numbers - or a ordered sequence of number - or matrices. There are many ways in which these numbers can be defined. Some popular visual descriptors are Histogram of Gradients, Optical Flow vectors, RGB or Color Histograms. The challenge is to figure out a way to map this datatype to a datatype that can be understood by Solr. In the case of color based histograms, we can convert the image into a matrix of hex values, where each hex value is the pixel color value
- The ultimate goal of the project is to be able to extract metadata from videos and index it inside Solr.
-
- Videos, like images, are just numbers - or a ordered sequence of number - or matrices.
-
- There are many ways in which these numbers can be defined.
- Some popular visual descriptors are Histogram of Gradients, Optical Flow vectors, RGB or Color Histograms.
- The challenge is to figure out a way to map this datatype to a datatype that can be understood by Solr.
-
- In the case of color based histograms, we can convert the image into a matrix of hex values, where each hex value is the pixel color value and index that as a text_ws field in Solr.
+ = Some Related Efforts =
- This is what ShutterStock did with respect to an image search tool they've built - https://lucidworks.com/blog/shutterstock-searches-35-million-images-color-using-apache-solr/
- Another idea I was thinking of was to index the data as a XHTML document of table values,
+ ShutterStock developed an [[https://lucidworks.com/blog/shutterstock-searches-35-million-images-color-using-apache-solr/|image search tool]] using a similar approach.
- where each <tr>..</tr> would be a row of the feature matrix and <td> would be the corresponding element in that column.
+ = Representation of output data =
- However, while performing ranking or querying we would have to compute a distance function on these values (for the dataset and the query video)
+ The data output from the Pooled Time Series parser is an XHTML document of table values, where each <tr>..</tr> would be a row of the feature matrix and <td> would be the corresponding element in that column. When using a search engine like Apache Solr to do ranking or querying we can to compute a distance function on these values (for the dataset and the query video), such as Chi-Squared, which is what the pooled time series algorithm does.
+ A Tika Parser has been developed that implements the Pooled Time Series algorithm above and that outputs the HOF and HOG data from videos for use in later processing and indexing. Read on below to install and use it!
- How have other users solved this problem?
There must be instances of matrix type data showing up in other domains,
- such as geography, physics and other scientific domains. How is the metadata designed in such cases?
+ = Pre-requisites =
+
+ == Install Pooled Time Series ==
+
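Editorial aside on the "Representation of output data" section of the new revision: the table layout and the Chi-Squared comparison it mentions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the parser's actual code. The sample documents, the function names `table_to_matrix` and `chi_squared`, and the exact XHTML shape (a namespaced `<table>` whose `<td>` cells hold plain numbers) are all hypothetical, and the formula used is one common form of the chi-squared distance, which may differ in detail from what the Pooled Time Series code computes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample output: one <tr> per feature-matrix row, one <td> per value.
SAMPLE_A = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><table>'
    "<tr><td>1.0</td><td>3.0</td></tr>"
    "<tr><td>0.0</td><td>2.0</td></tr>"
    "</table></body></html>"
)
SAMPLE_B = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><table>'
    "<tr><td>3.0</td><td>1.0</td></tr>"
    "<tr><td>0.0</td><td>2.0</td></tr>"
    "</table></body></html>"
)


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def table_to_matrix(xhtml):
    """Read every <tr> as a row of floats (assumes each <td> holds one number)."""
    root = ET.fromstring(xhtml)
    matrix = []
    for elem in root.iter():
        if local_name(elem.tag) == "tr":
            row = [float(td.text.strip()) for td in elem if local_name(td.tag) == "td"]
            if row:
                matrix.append(row)
    return matrix


def chi_squared(a, b):
    """One common chi-squared distance: 0.5 * sum((x - y)^2 / (x + y))."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature matrices must have the same shape")
    total = 0.0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            if x + y > 0:  # skip bins that are empty in both matrices
                total += (x - y) ** 2 / (x + y)
    return 0.5 * total


if __name__ == "__main__":
    dataset_video = table_to_matrix(SAMPLE_A)
    query_video = table_to_matrix(SAMPLE_B)
    print(chi_squared(dataset_video, query_video))
```

A smaller distance means the two feature matrices are more alike; identical matrices give 0. Bins that are zero in both matrices are skipped to avoid dividing by zero.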
