[ 
https://issues.apache.org/jira/browse/SOLR-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Rafalovitch updated SOLR-9601:
----------------------------------------
    Attachment: tika2_20170316.tgz

Another version. I made TikaEntityProcessor an inner entity of 
FileListEntityProcessor, so the file name is now exposed as part of the outer 
entity.

This allowed me to demonstrate rootEntity and another processor type, as well 
as to provide a uniqueKey.
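
For readers without the attachment, the nesting could look roughly like the 
sketch below. This is not the attached file: the baseDir path, the fileName 
pattern, the entity names and the target field names (id, author, title, text) 
are assumptions for illustration only.

```xml
<dataConfig>
  <!-- Binary source, so that Tika can read the raw file contents -->
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: lists files. rootEntity="false" means the rows of the
         entity nested below become the Solr documents, not the rows here.
         baseDir and fileName are placeholder values. -->
    <entity name="file" processor="FileListEntityProcessor" dataSource="null"
            baseDir="/path/to/exampledocs" fileName=".*\.pdf"
            rootEntity="false">

      <!-- The file name exposed by the outer entity serves as the uniqueKey -->
      <field column="file" name="id"/>

      <!-- Inner entity: runs Tika on each file found by the outer entity -->
      <entity name="pdf" processor="TikaEntityProcessor"
              url="${file.fileAbsolutePath}" format="text">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The inner entity picks up each file through the ${file.fileAbsolutePath} 
variable, where "file" is the (assumed) name given to the outer entity.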

I also commented out the dynamicField *. If it gets uncommented, a couple of 
extra fields from the FileListEntityProcessor will show up, so there is a nice 
hidden reward for curiosity...
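
As a rough illustration of that schema-side switch (again a sketch, not the 
attached schema: the field type name "string" and the attribute values are 
assumptions):

```xml
<!-- Fragment of the schema file; a "string" field type is assumed to be
     defined elsewhere in the same schema. -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>

<!-- Catch-all field, commented out. Once enabled, columns that have no
     explicit field mapping can also land in the index. -->
<!--
<dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/>
-->
```

FileListEntityProcessor emits the implicit columns fileDir, file, 
fileAbsolutePath, fileSize and fileLastModified, which is where the extra 
fields would come from.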

This should be ready to go with some formatting cleanup (4 spaces offset? 
whitespace before closing xml tags? anything else?).

Any final comments?

> DIH: Radically simplify Tika example to only show relevant configuration
> ------------------------------------------------------------------------
>
>                 Key: SOLR-9601
>                 URL: https://issues.apache.org/jira/browse/SOLR-9601
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika 
> extraction)
>    Affects Versions: 6.x, master (7.0)
>            Reporter: Alexandre Rafalovitch
>            Assignee: Alexandre Rafalovitch
>              Labels: examples, usability
>         Attachments: tika2_20170308.tgz, tika2_20170316.tgz
>
>
> Solr DIH examples are legacy examples that show how DIH works. However, they 
> include full configurations that may obscure the teaching points. This is no 
> longer needed, as we have 3 full-blown examples in the configsets. 
> Specifically for Tika, the field type definitions were at some point 
> simplified to have fewer support files in the configuration directory. This, 
> however, means that we now have field definitions that have the same names as 
> those in other examples, but different definitions. 
> Importantly, Tika does not use most (any?) of those modified definitions. 
> They are there just for completeness. Similarly, the solrconfig.xml includes 
> the extract handler even though we are demonstrating a different path of 
> using Tika. Somebody grepping through config files may get confused about 
> which configuration aspects contribute to which experience.
> I am planning to significantly simplify the configuration and schema of the 
> Tika example to **only** show the DIH Tika extraction path. It will end up as 
> a very short and focused example.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
