[ https://issues.apache.org/jira/browse/SOLR-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935790#comment-15935790 ]
Varun Thacker commented on SOLR-7383:
-------------------------------------
Hi Alexandre,
This is great! Thanks for taking this up!
I'm curious as to why the core.properties file is empty in the tar that you
uploaded. Even the existing rss example has an empty core.properties. Maybe
I am missing something here?
I personally don't like the concept of these catch-all fields. I understand
that they are helpful because "/select" can then use "df=text".
The alternative is to remove all the copyFields into the catch-all field and,
in the "/select" handler, use edismax and define "qf" with the list of fields.
I personally would like this better, but if you prefer the current solution
then let's stick with that.
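For comparison, the edismax alternative could look roughly like this in solrconfig.xml. This is only a sketch: the field names in "qf" (title, summary) and the boost are hypothetical placeholders for whatever fields the example schema ends up defining.
{code}
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!-- edismax searches the listed fields directly, so no df=text catch-all is needed -->
    <str name="defType">edismax</str>
    <!-- hypothetical field names and boost; replace with the example's real fields -->
    <str name="qf">title^2 summary</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>
{code}
With something like this in place, the copyField rules targeting "text" and the catch-all field itself could be dropped from the schema.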
I would change these three fieldTypes:
{code}
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
{code}
- In the string fieldType we should add docValues
- For int and tdate we can switch to the Point-based field types
- Maybe remove "*_tdt" or change it to points?
{code}
<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
{code}
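If "*_tdt" is kept rather than removed, a points-based replacement could be declared along these lines, reusing the "pdate" type above. The "*_pdt" name is an assumption (it mirrors the naming later Solr configsets use), and the "updated" field is a hypothetical example of a feed entry date.
{code}
<!-- assumed replacement for the Trie-based *_tdt dynamic field -->
<dynamicField name="*_pdt" type="pdate" indexed="true" stored="true"/>

<!-- hypothetical explicit field for the entry date, using the same points type -->
<field name="updated" type="pdate" indexed="true" stored="true"/>
{code}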
The last thing I can think of is simplifying text_en_splitting:
- Can we remove KeywordMarkerFilterFactory and thereby protwords.txt?
- Also, I'd imagine the Porter stemmer would be a bad fit for searching
technical post summaries?
- I haven't actually used the example, so this might not apply: do we need to
strip out HTML? When I look at a sample summary on
http://stackoverflow.com/feeds/tag/solr I see HTML markup in there.
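A simplified text_en_splitting along the lines of these bullets might look like the sketch below. It is not the stock definition: it uses a single analyzer for both index and query time, drops KeywordMarkerFilterFactory (so protwords.txt can go) and PorterStemFilterFactory, and adds an HTML strip char filter. The WordDelimiterFilterFactory settings are a guess and would need checking against real feed data.
{code}
<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <!-- strip HTML markup before tokenizing; this affects indexed terms only, not the stored value -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <!-- split on case changes, punctuation and letter/number transitions, without catenating -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- intentionally no KeywordMarkerFilterFactory (protwords.txt) and no PorterStemFilterFactory -->
  </analyzer>
</fieldType>
{code}
Because the char filter leaves the stored summary untouched, stripping in DIH instead (HTMLStripTransformer on the entity plus stripHTML="true" on the field) may be preferable if the displayed value should also be clean.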
> DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo
> possible
> --------------------------------------------------------------------------------
>
> Key: SOLR-7383
> URL: https://issues.apache.org/jira/browse/SOLR-7383
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Affects Versions: 5.0, 6.0
> Reporter: Upayavira
> Assignee: Alexandre Rafalovitch
> Priority: Minor
> Attachments: atom_20170315.tgz, rss-data-config.xml
>
>
> The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml)
> is broken again. See associated issues.
> Below is a config that should work.
> This is caused by Slashdot seemingly oscillating between RDF/RSS and pure
> RSS. Perhaps we should depend upon something more static, rather than an
> external service that is free to change as it desires.
> <dataConfig>
>   <dataSource type="URLDataSource" />
>   <document>
>     <entity name="slashdot"
>             pk="link"
>             url="http://rss.slashdot.org/Slashdot/slashdot"
>             processor="XPathEntityProcessor"
>             forEach="/RDF/item"
>             transformer="DateFormatTransformer">
>
>       <field column="source" xpath="/RDF/channel/title" commonField="true" />
>       <field column="source-link" xpath="/RDF/channel/link" commonField="true" />
>       <field column="subject" xpath="/RDF/channel/subject" commonField="true" />
>
>       <field column="title" xpath="/RDF/item/title" />
>       <field column="link" xpath="/RDF/item/link" />
>       <field column="description" xpath="/RDF/item/description" />
>       <field column="creator" xpath="/RDF/item/creator" />
>       <field column="item-subject" xpath="/RDF/item/subject" />
>       <field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
>       <field column="slash-department" xpath="/RDF/item/department" />
>       <field column="slash-section" xpath="/RDF/item/section" />
>       <field column="slash-comments" xpath="/RDF/item/comments" />
>     </entity>
>   </document>
> </dataConfig>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)