[ 
https://issues.apache.org/jira/browse/SOLR-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936411#comment-15936411
 ] 

Alexandre Rafalovitch commented on SOLR-7383:
---------------------------------------------

Varun, thank you for the comments.

bq. I'm curious as to why the core.properties file is empty in the tar that you 
uploaded. Even the existing rss example has an empty core.properties. Maybe 
I am missing something here?

What would you expect in that file? By default, the core name is the same as 
the directory name. The file is present, so Solr autodiscovers the core on 
startup, but there is no need for any extra configuration.

bq. I personally don't like the concept of these catch all fields. I understand 
that this is helpful as "/select" can then use "df=text" 

If we switch to eDisMax to search the original fields, then string fields 
such as *author* will not be easily searchable and/or will require a secondary 
copy into a text field to be searched properly. As it is, one can facet on 
the string field and search on the catch-all text field. 
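
A minimal schema sketch of the catch-all arrangement described above. The 
field type definitions and the names *text* and *text_general* are 
assumptions for illustration, not copied from the attached example.

```xml
<!-- Schema sketch; type and field names are illustrative assumptions. -->
<schema name="catch-all-sketch" version="1.6">
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- Exact-value field: suitable for faceting, not for free-text search -->
  <field name="author" type="string" indexed="true" stored="true"/>
  <!-- Catch-all field: analyzed for search, not stored -->
  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

  <!-- Copy the raw author value into the catch-all field at index time -->
  <copyField source="author" dest="text"/>
</schema>
```

With df=text on "/select", a request such as 
/select?q=rafalovitch&facet=true&facet.field=author would search the 
analyzed copy while faceting on the untouched string values.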

bq. I would change these three fieldTypes

I will look into that. I don't know much about points for now, so this is 
definitely a good suggestion to check.

bq. simplifying text_en_splitting

I did not want to create another type unless needed (that was my big problem 
with the Tika example), so instead I have kept the protwords.txt and put 
'lucene' in there. However, if another type is better, I have no objections. 

bq. Do we need to strip out html ? When I see a sample summary on 
http://stackoverflow.com/feeds/tag/solr I see html chars in there.

The HTML is stripped by using two DIH transformers, so the text ends up without 
any HTML. There is also a new-style URP in solrconfig.xml to trim the post-DIH 
whitespace and - importantly in my opinion - to show that it is possible to 
have URPs with DIH. The stored summary field content at the end looks quite 
presentable. 
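
A rough sketch of how these pieces could fit together, not the exact 
content of the attached example. The entity name, the xpath values, the 
regex, the config file name and the processor name *trim_text* are all 
assumptions for illustration.

```xml
<!-- DIH data-config fragment (assumed names and xpaths) -->
<entity name="entries"
        url="http://stackoverflow.com/feeds/tag/solr"
        processor="XPathEntityProcessor"
        forEach="/feed/entry"
        transformer="HTMLStripTransformer,RegexTransformer">
  <!-- stripHTML is applied by HTMLStripTransformer;
       regex/replaceWith are applied by RegexTransformer -->
  <field column="summary" xpath="/feed/entry/summary"
         stripHTML="true" regex="\s+" replaceWith=" "/>
</entity>

<!-- solrconfig.xml fragment: a named URP that trims text fields -->
<updateProcessor class="solr.TrimFieldUpdateProcessorFactory" name="trim_text">
  <str name="typeClass">solr.TextField</str>
</updateProcessor>

<!-- The DIH handler refers to the URP by name through the "processor" parameter -->
<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">atom-data-config.xml</str>
    <str name="processor">trim_text</str>
  </lst>
</requestHandler>
```

The transformers run inside DIH while each row is built. The URP runs 
afterwards, as the document is added to the index, which is why it can 
clean up whitespace left behind by the HTML stripping.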



> DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo 
> possible
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-7383
>                 URL: https://issues.apache.org/jira/browse/SOLR-7383
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 5.0, 6.0
>            Reporter: Upayavira
>            Assignee: Alexandre Rafalovitch
>            Priority: Minor
>         Attachments: atom_20170315.tgz, rss-data-config.xml
>
>
> The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml) 
> is broken again. See associated issues.
> Below is a config that should work.
> This is caused by Slashdot seemingly oscillating between RDF/RSS and pure 
> RSS. Perhaps we should depend upon something more static, rather than an 
> external service that is free to change as it desires.
> <dataConfig>
>     <dataSource type="URLDataSource" />
>     <document>
>         <entity name="slashdot"
>                 pk="link"
>                 url="http://rss.slashdot.org/Slashdot/slashdot"
>                 processor="XPathEntityProcessor"
>                 forEach="/RDF/item"
>                 transformer="DateFormatTransformer">
>                               
>             <field column="source" xpath="/RDF/channel/title" commonField="true" />
>             <field column="source-link" xpath="/RDF/channel/link" commonField="true" />
>             <field column="subject" xpath="/RDF/channel/subject" commonField="true" />
>                       
>             <field column="title" xpath="/RDF/item/title" />
>             <field column="link" xpath="/RDF/item/link" />
>             <field column="description" xpath="/RDF/item/description" />
>             <field column="creator" xpath="/RDF/item/creator" />
>             <field column="item-subject" xpath="/RDF/item/subject" />
>             <field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
>             <field column="slash-department" xpath="/RDF/item/department" />
>             <field column="slash-section" xpath="/RDF/item/section" />
>             <field column="slash-comments" xpath="/RDF/item/comments" />
>         </entity>
>     </document>
> </dataConfig>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
