On Feb 10, 2015, at 11:46 AM, Erik Hatcher <erikhatc...@mac.com> wrote:

> bin/post -c collection_name /path/to/file.doc

The nearly trivial command above for indexing a Word document in Solr is 
certainly appealing, but I’m wondering about the underlying index’s schema.

Tika makes every effort to extract as much metadata from Word documents as 
possible. This metadata includes dates, titles, authors, names of applications, 
last edit, etc. Some of this data can be very useful. The metadata can be 
packaged up as an XML file/stream and then sent to Solr for indexing. “Tastes 
great. Less filling.” But my question is, “To what degree does Solr know what 
to do with the metadata when the (kewl) command, above, is seemingly so 
generic? Does one need to create a Solr schema to specifically accommodate the 
Tika-created metadata, or do such things also come for ‘free’?”
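
To make the question concrete, here is a minimal Python sketch of a request to 
Solr's extracting request handler (Solr Cell), which is the Tika-backed path. 
The `uprefix` and `fmap.*` parameters are how Solr Cell prefixes or renames 
Tika field names. The host, the document id, the `attr_` prefix, and the helper 
name `build_extract_url` are assumptions made for illustration, and the sketch 
does not claim to reproduce the exact request that bin/post sends.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, collection, doc_id, field_map=None,
                      unknown_prefix="attr_"):
    """Build a URL for Solr's /update/extract handler (hypothetical helper).

    field_map maps Tika field names to Solr field names (sent as fmap.*).
    unknown_prefix is sent as uprefix, so fields missing from the schema get
    that prefix; this only helps if the schema has a matching dynamic field
    such as attr_*, which is an assumption here.
    """
    params = [
        ("literal.id", doc_id),       # unique key supplied by the caller
        ("uprefix", unknown_prefix),  # prefix for fields not in the schema
        ("commit", "true"),
    ]
    # Sort for a stable parameter order regardless of dict ordering.
    for tika_name, solr_name in sorted((field_map or {}).items()):
        params.append(("fmap." + tika_name, solr_name))
    return "{}/{}/update/extract?{}".format(
        base_url.rstrip("/"), collection, urlencode(params))


# Assumed local Solr instance and collection name, for illustration only.
url = build_extract_url("http://localhost:8983/solr", "collection_name",
                        "doc1", {"content": "text"})
print(url)
```

The file itself would be sent as the body of a POST to that URL. If the 
collection's schema has no dynamic field matching the prefix and no explicit 
mappings, the fate of each Tika field depends on how the collection is 
configured, which is exactly what I am asking about.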

-- 
Eric Morgan
