Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "AdityaDhulipala" page has been changed by AdityaDhulipala:
https://wiki.apache.org/tika/AdityaDhulipala?action=diff&rev1=1&rev2=2

  [[https://github.com/chrismattmann/pooled_time_series]]
  [[http://arxiv.org/pdf/1412.6505v2.pdf]]
  
+ ==== Metadata Representation ====
+ 
+ The ultimate goal of the project is to extract metadata from videos and index it in Solr.
+ 
+ Videos, like images, are just numbers: a frame is a matrix of pixel values, and a video is an ordered sequence of such matrices.
+ 
+ There are many ways to turn these numbers into features.
+ Popular visual descriptors include Histograms of Oriented Gradients (HOG), optical flow vectors, and color histograms such as RGB histograms.
+ The challenge is to map these descriptors to a data type that Solr can understand.
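+ As a concrete example of one such descriptor, here is a minimal Python sketch of a quantized RGB color histogram. It assumes, hypothetically, that a decoded frame is a list of rows of (r, g, b) tuples with 0-255 channels; decoding the video itself is out of scope. Stacking one histogram per frame gives the kind of matrix-shaped feature data that has to be mapped into Solr.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]  # hypothetical layout: rows of (r, g, b), each channel 0-255


def color_histogram(frame: Frame, bins_per_channel: int = 4) -> List[float]:
    """Normalized RGB histogram with bins_per_channel bins per channel.

    The result has bins_per_channel ** 3 entries that sum to 1 for a non-empty frame.
    """
    n = bins_per_channel
    hist = [0] * (n ** 3)
    total = 0
    for row in frame:
        for r, g, b in row:
            # Map each 0-255 channel value to a bin index in [0, n).
            ri, gi, bi = r * n // 256, g * n // 256, b * n // 256
            hist[(ri * n + gi) * n + bi] += 1
            total += 1
    # Normalize by pixel count so frames of different sizes are comparable.
    return [count / total for count in hist] if total else [0.0] * len(hist)


def video_descriptor(frames: List[Frame], bins_per_channel: int = 4) -> List[List[float]]:
    """One histogram row per frame, in frame order: a (num_frames x bins**3) matrix."""
    return [color_histogram(f, bins_per_channel) for f in frames]


if __name__ == "__main__":
    # Two red pixels, one blue, one black; 2 bins per channel gives 8 bins.
    frame = [[(255, 0, 0), (255, 0, 0)], [(0, 0, 255), (0, 0, 0)]]
    print(color_histogram(frame, bins_per_channel=2))
    # prints [0.25, 0.25, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
```

+ Coarser bins make the descriptor shorter and more tolerant of small color shifts; finer bins preserve more detail at the cost of a longer vector per frame.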
+ 
+ For color-based descriptors, we can convert each image into a matrix of hex values, where each hex value is a pixel's color,
+ and index that as a text_ws (whitespace-tokenized) field in Solr.
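+ A minimal sketch of the hex-value idea, using the same hypothetical frame layout (rows of (r, g, b) tuples): each pixel becomes a six-digit hex token, and the tokens are joined with spaces so that a whitespace tokenizer, as used by text_ws, splits them back into one term per pixel. The field name color_hex and the document id scheme are hypothetical; the field would need to be declared with type text_ws in the Solr schema.

```python
import json
from typing import List, Tuple

Frame = List[List[Tuple[int, int, int]]]  # hypothetical: rows of (r, g, b), 0-255


def frame_to_hex_tokens(frame: Frame) -> str:
    """Row-major, whitespace-separated hex colors, e.g. 'ff0000 00ff00 ...'."""
    return " ".join(
        "{:02x}{:02x}{:02x}".format(r, g, b) for row in frame for r, g, b in row
    )


def frame_to_solr_doc(video_id: str, frame_index: int, frame: Frame) -> dict:
    """Build a JSON-ready Solr document; 'color_hex' is a hypothetical text_ws field."""
    return {
        "id": f"{video_id}_frame{frame_index}",
        "color_hex": frame_to_hex_tokens(frame),
    }


if __name__ == "__main__":
    frame = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (16, 32, 48)]]
    print(json.dumps(frame_to_solr_doc("video1", 0, frame)))
    # prints {"id": "video1_frame0", "color_hex": "ff0000 00ff00 0000ff 102030"}
```

+ Indexing every raw pixel yields very long fields and up to 2^24 (about 16.7 million) distinct tokens; quantizing colors first, for example to histogram bins, would keep the vocabulary much smaller.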
+ 
+ This is the approach Shutterstock took for the color-based image search tool they built:
+ https://lucidworks.com/blog/shutterstock-searches-35-million-images-color-using-apache-solr/
+ 
+ Another idea I was considering is to index the data as an XHTML document containing a table of values,
+ where each <tr>...</tr> is a row of the feature matrix and each <td>...</td> holds the element in the corresponding column.
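+ A minimal sketch of the XHTML-table idea, using Python's standard xml.etree.ElementTree: each matrix row becomes a <tr>, each value a <td>, and the table can be parsed back into the matrix. This is a standalone illustration of the markup, not Tika's own XHTML output path; the function names are hypothetical, and the XHTML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from typing import List


def matrix_to_xhtml_table(matrix: List[List[float]]) -> str:
    """Serialize a feature matrix as <table><tr><td>...</td></tr></table>.

    Namespace omitted for brevity; real XHTML would use http://www.w3.org/1999/xhtml.
    """
    table = ET.Element("table")
    for row in matrix:
        tr = ET.SubElement(table, "tr")
        for value in row:
            td = ET.SubElement(tr, "td")
            td.text = repr(float(value))  # repr round-trips floats exactly
    return ET.tostring(table, encoding="unicode")


def xhtml_table_to_matrix(markup: str) -> List[List[float]]:
    """Parse the table back into a list of rows of floats."""
    table = ET.fromstring(markup)
    return [[float(td.text) for td in tr.findall("td")] for tr in table.findall("tr")]


if __name__ == "__main__":
    markup = matrix_to_xhtml_table([[0.25, 0.75], [1.0, 0.0]])
    print(markup)
    # prints <table><tr><td>0.25</td><td>0.75</td></tr><tr><td>1.0</td><td>0.0</td></tr></table>
    print(xhtml_table_to_matrix(markup))
```

+ Every value ends up as text, so a search engine indexing this markup sees the numbers only as tokens, not as quantities it can compare numerically.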
+ 
+ However, to rank results or answer a query, we would have to compute a distance function on these values (between each video in the dataset and the query video).
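+ To make the ranking step concrete, here is a minimal brute-force sketch: each video is represented by a feature matrix with the same number of values (for example, one row per frame over a fixed number of frames), the matrices are flattened, and videos are ranked by Euclidean distance to the query. Euclidean distance is an assumption chosen for simplicity; for histograms, measures such as chi-squared distance or histogram intersection are also common. The video ids are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def flatten(matrix: Matrix) -> List[float]:
    """Row-major flattening of a feature matrix into a single vector."""
    return [value for row in matrix for value in row]


def euclidean_distance(a: Matrix, b: Matrix) -> float:
    """Euclidean distance between two feature matrices with the same number of values."""
    va, vb = flatten(a), flatten(b)
    if len(va) != len(vb):
        raise ValueError("feature matrices must contain the same number of values")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


def rank_videos(query: Matrix, dataset: Dict[str, Matrix]) -> List[str]:
    """Return dataset ids ordered from most to least similar (smallest distance first)."""
    return sorted(dataset, key=lambda vid: euclidean_distance(query, dataset[vid]))


if __name__ == "__main__":
    query = [[0.0, 0.0], [0.0, 0.0]]
    dataset = {
        "a": [[3.0, 4.0], [0.0, 0.0]],  # distance 5.0
        "b": [[1.0, 0.0], [0.0, 0.0]],  # distance 1.0
        "c": [[0.0, 0.0], [0.0, 0.0]],  # distance 0.0
    }
    print(rank_videos(query, dataset))  # prints ['c', 'b', 'a']
```

+ This compares the query against every video on every search, which is exactly the cost an index is meant to avoid, so it is best treated as a baseline for checking whatever Solr-side representation is chosen.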
+ 
+ How have other users solved this problem? Matrix-type data must show up in other domains,
+ such as geography, physics, and other sciences. How is the metadata designed in those cases?
+ 
+ 
  
  ----
  CategoryHomepage
