[
https://issues.apache.org/jira/browse/SOLR-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936411#comment-15936411
]
Alexandre Rafalovitch commented on SOLR-7383:
---------------------------------------------
Varun, thank you for the comments.
bq. I'm curious as to why the core.properties file is empty in the tar that you
uploaded. Even the existing rss example has an empty core.properties. Maybe
I am missing something here?
What would you expect in that file? By default, the core name is the same as
the directory name. The file is present, so Solr autodiscovers the core on
startup, and no extra configuration is needed.
bq. I personally don't like the concept of these catch all fields. I understand
that this is helpful as "/select" can then use "df=text"
If we switch to eDisMax to search the original fields, then the string fields
such as *author* will not be easily searchable and/or will require a secondary
copy into a text field to be searched properly. As it is, one can facet on
the string fields and search on the catch-all text field.
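To illustrate the trade-off, here is a minimal schema sketch of the catch-all approach. The field and type names (*author*, *text*, *text_general*) are assumptions for illustration and are not copied from the attached example.

```xml
<!-- Sketch only: names are illustrative assumptions, not the attached config. -->
<schema name="catch-all-sketch" version="1.6">
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- Exact-value field: suited to faceting, not to free-text search -->
  <field name="author" type="string" indexed="true" stored="true"/>
  <!-- Catch-all field: the target of df=text; indexed but not stored -->
  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

  <!-- Copy the raw author value into the analyzed catch-all field -->
  <copyField source="author" dest="text"/>
</schema>
```

With this layout, a request can facet on *author* while "/select" with "df=text" matches tokens from the same values.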
bq. I would change these three fieldTypes
I will look into that. I don't know much about points for now, so this is
definitely a good suggestion to check.
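For reference, a Trie-to-Point switch could look roughly like the fragment below. Which three fieldTypes were meant is not stated in this comment, so the two pairs shown are assumed examples, and the type names are hypothetical.

```xml
<!-- Sketch only: the types to change are an assumption, not taken from the review. -->

<!-- Older Trie-based numeric and date types -->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

<!-- Point-based counterparts; docValues enabled so the fields can be sorted and faceted -->
<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
```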
bq. simplifying text_en_splitting
I did not want to create another type unless needed (that was my big problem
with the Tika example), so instead I have kept protwords.txt and put 'lucene'
in there. However, if another type is better, I have no objections.
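As a rough sketch of why protwords.txt matters here: the stemmer skips any token listed in that file. The analyzer chain below is deliberately trimmed and is an assumption; the shipped text_en_splitting type has additional filters that are omitted.

```xml
<!-- Sketch only: a reduced analyzer chain, not the full shipped text_en_splitting. -->
<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Tokens listed in protwords.txt (here: lucene) are marked as keywords -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- The stemmer leaves keyword-marked tokens untouched -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

The lowercase filter runs before the keyword marker so that "Lucene" in the feed matches the lowercase entry in protwords.txt.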
bq. Do we need to strip out html ? When I see a sample summary on
http://stackoverflow.com/feeds/tag/solr I see html chars in there.
The HTML is stripped by using two DIH transformers, so the text ends up without
any HTML. There is also a new-style URP in solrconfig.xml to trim the post-DIH
whitespace and - importantly in my opinion - to show that it is possible to
have URPs with DIH. The stored summary field content at the end looks quite
presentable.
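The pipeline described above can be sketched as two fragments. The two transformers are not named in this comment, so HTMLStripTransformer and RegexTransformer are an assumption, as are the entity name, XPaths, config file name, and the processor name trim_text.

```xml
<!-- Fragment 1, DIH data config (sketch; names and XPaths are assumptions) -->
<dataConfig>
  <dataSource type="URLDataSource"/>
  <document>
    <entity name="feed"
            url="http://stackoverflow.com/feeds/tag/solr"
            processor="XPathEntityProcessor"
            forEach="/feed/entry"
            transformer="HTMLStripTransformer,RegexTransformer">
      <field column="id" xpath="/feed/entry/id"/>
      <!-- stripHTML is applied by HTMLStripTransformer;
           regex/replaceWith collapse whitespace runs via RegexTransformer -->
      <field column="summary" xpath="/feed/entry/summary"
             stripHTML="true" regex="\s+" replaceWith=" "/>
    </entity>
  </document>
</dataConfig>

<!-- Fragment 2, solrconfig.xml (sketch): a named URP applied to DIH output -->
<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <!-- Run the named update processor on every imported document -->
    <str name="processor">trim_text</str>
  </lst>
</requestHandler>

<updateProcessor class="solr.TrimFieldUpdateProcessorFactory" name="trim_text">
  <!-- Trim leading and trailing whitespace on text and string fields only -->
  <str name="typeClass">solr.TextField</str>
  <str name="typeClass">solr.StrField</str>
</updateProcessor>
```

Declaring the processor by name and referencing it through the handler defaults avoids defining a full updateRequestProcessorChain just for the import.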
> DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo
> possible
> --------------------------------------------------------------------------------
>
> Key: SOLR-7383
> URL: https://issues.apache.org/jira/browse/SOLR-7383
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Affects Versions: 5.0, 6.0
> Reporter: Upayavira
> Assignee: Alexandre Rafalovitch
> Priority: Minor
> Attachments: atom_20170315.tgz, rss-data-config.xml
>
>
> The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml)
> is broken again. See associated issues.
> Below is a config that should work.
> This is caused by Slashdot seemingly oscillating between RDF/RSS and pure
> RSS. Perhaps we should depend upon something more static, rather than an
> external service that is free to change as it desires.
> <dataConfig>
>   <dataSource type="URLDataSource" />
>   <document>
>     <entity name="slashdot"
>             pk="link"
>             url="http://rss.slashdot.org/Slashdot/slashdot"
>             processor="XPathEntityProcessor"
>             forEach="/RDF/item"
>             transformer="DateFormatTransformer">
>
>       <field column="source" xpath="/RDF/channel/title"
>              commonField="true" />
>       <field column="source-link" xpath="/RDF/channel/link"
>              commonField="true" />
>       <field column="subject" xpath="/RDF/channel/subject"
>              commonField="true" />
>
>       <field column="title" xpath="/RDF/item/title" />
>       <field column="link" xpath="/RDF/item/link" />
>       <field column="description" xpath="/RDF/item/description" />
>       <field column="creator" xpath="/RDF/item/creator" />
>       <field column="item-subject" xpath="/RDF/item/subject" />
>       <field column="date" xpath="/RDF/item/date"
>              dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
>       <field column="slash-department" xpath="/RDF/item/department" />
>       <field column="slash-section" xpath="/RDF/item/section" />
>       <field column="slash-comments" xpath="/RDF/item/comments" />
>     </entity>
>   </document>
> </dataConfig>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)