Alexandre Rafalovitch created SOLR-9601:
-------------------------------------------

             Summary: DIH: Radically simplify Tika example to only show relevant configuration
                 Key: SOLR-9601
                 URL: https://issues.apache.org/jira/browse/SOLR-9601
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
    Affects Versions: 6.x, master (7.0)
            Reporter: Alexandre Rafalovitch
            Assignee: Alexandre Rafalovitch


Solr DIH examples are legacy examples that show how DIH works. However, they
include full configurations that may obscure the teaching points. This is no
longer needed, as we have 3 full-blown examples in the configsets.

Specifically for Tika, the field type definitions were at some point
simplified to require fewer support files in the configuration directory. This,
however, means that we now have field definitions that have the same names as
those in other examples, but different definitions.

Importantly, Tika does not use most (any?) of those modified definitions. They
are there just for completeness. Similarly, solrconfig.xml includes the extract
handler even though we are demonstrating a different path for using Tika.
Somebody grepping through the config files may get confused about which
configuration aspects contribute to which experience.

I am planning to significantly simplify the configuration and schema of the
Tika example to **only** show the DIH Tika extraction path. It will end up as a
very short and focused example.
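
For illustration only, a DIH data configuration limited to the Tika extraction
path might look roughly like the sketch below. It assumes a FileListEntityProcessor
feeding a nested TikaEntityProcessor; the base directory, file pattern, and
target field names are placeholders, not the contents of the final example.

```xml
<dataConfig>
  <!-- Binary data source, so Tika receives the raw file bytes -->
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: lists files on disk; baseDir and fileName are assumed values -->
    <entity name="file" processor="FileListEntityProcessor" dataSource="null"
            baseDir="/path/to/exampledocs" fileName=".*pdf"
            rootEntity="false">
      <field column="file" name="id"/>

      <!-- Inner entity: runs Tika on each listed file and emits plain text -->
      <entity name="doc" processor="TikaEntityProcessor"
              url="${file.fileAbsolutePath}" format="text">
        <!-- meta="true" reads the value from Tika metadata rather than the body -->
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With a configuration of this shape, the schema would only need the few fields
named above (here id, title, and text), which is the kind of reduction this
issue proposes.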



