[
https://issues.apache.org/jira/browse/SOLR-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexandre Rafalovitch updated SOLR-9601:
----------------------------------------
Attachment: tika2_20170308.tgz
It is a little hard to generate a readable diff between the original Tika
example and the one I created. So, for ease of testing, I just created it as a
separate *tika2* core that can be dropped next to the other DIH cores.
I removed all of the unused gunk, so the remaining files are tiny. I wish I
could remove the infoStream section, but the default is false and I am not sure
I should.
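To illustrate the scale involved, a stripped-down solrconfig.xml for a DIH-only Tika core could look roughly like the sketch below. This is not the file from the attachment: the luceneMatchVersion value and the tika-data-config.xml file name are assumptions, and the lib paths assume the standard Solr 6.x distribution layout.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <!-- Placeholder version; use the one matching your Solr release -->
  <luceneMatchVersion>6.5.0</luceneMatchVersion>

  <!-- DIH and Tika jars; paths assume the standard distribution layout -->
  <lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib" regex=".*\.jar" />
  <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />

  <!-- The infoStream section in question; when omitted, it defaults to false -->
  <indexConfig>
    <infoStream>true</infoStream>
  </indexConfig>

  <requestHandler name="/select" class="solr.SearchHandler" />

  <!-- The only Tika path this example demonstrates: DIH, not the extract handler -->
  <requestHandler name="/dataimport" class="solr.DataImportHandler">
    <lst name="defaults">
      <!-- Assumed file name for the DIH configuration -->
      <str name="config">tika-data-config.xml</str>
    </lst>
  </requestHandler>
</config>
```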
I've also added a prototype-oriented demo of a wildcard field, renamed and
simplified the text field definition, and did other minor cleanup in what is left.
I am not sure if I need to worry about docValues here.
Also, I have commented out the uniqueKey section, but the corresponding *id* field
definition is missing. It was missing in the original example too, so I am
not sure it is worth adding in the commented-out section.
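For readers without the attachment, the schema points above can be pictured with a sketch like the following. It is an assumption-laden illustration, not the attached file: the schema name, the text_simple type name, the analyzer chain, and the use of a catch-all dynamicField for the wildcard demo are all hypothetical choices.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="tika2-sketch" version="1.6">
  <!-- uniqueKey left commented out; note there is no matching "id" field definition
  <uniqueKey>id</uniqueKey>
  -->

  <!-- Hypothetical simplified text type with no external support files -->
  <fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- Prototype-style wildcard: any field DIH sends is accepted as text -->
  <dynamicField name="*" type="text_simple" indexed="true" stored="true" multiValued="true"/>
</schema>
```

A catch-all pattern like this suits prototyping, since every field is indexed as text without being declared; a production schema would normally declare explicit fields and types instead.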
This is a big change (even if the resulting files are tiny), so I would appreciate
people commenting on it before I actually commit it.
> DIH: Radically simplify Tika example to only show relevant configuration
> ------------------------------------------------------------------------
>
> Key: SOLR-9601
> URL: https://issues.apache.org/jira/browse/SOLR-9601
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: contrib - DataImportHandler, contrib - Solr Cell (Tika
> extraction)
> Affects Versions: 6.x, master (7.0)
> Reporter: Alexandre Rafalovitch
> Assignee: Alexandre Rafalovitch
> Labels: examples, usability
> Attachments: tika2_20170308.tgz
>
>
> Solr DIH examples are legacy examples to show how DIH works. However, they
> include full configurations that may obscure teaching points. This is no
> longer needed as we have 3 full-blown examples in the configsets.
> Specifically for Tika, the field type definitions were at some point
> simplified to have fewer support files in the configuration directory. This,
> however, means that we now have field definitions that have the same names as
> those in other examples, but different definitions.
> Importantly, Tika does not use most (any?) of those modified definitions.
> They are there just for completeness. Similarly, the solrconfig.xml includes
> the extract handler even though we are demonstrating a different path of using
> Tika. Somebody grepping through config files may get confused about which
> configuration aspects contribute to which behavior.
> I am planning to significantly simplify the configuration and schema of the Tika
> example to **only** show the DIH Tika extraction path. It will end up as a very
> short and focused example.