[ 
https://issues.apache.org/jira/browse/SOLR-7383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935790#comment-15935790
 ] 

Varun Thacker commented on SOLR-7383:
-------------------------------------

Hi Alexandre,

This is great! Thanks for taking this up!

I'm curious as to why the core.properties file is empty in the tar that you 
uploaded. Even the existing rss example has an empty core.properties. Maybe 
I am missing something here?

I personally don't like the concept of these catch-all fields. I understand 
that they are helpful because "/select" can then use "df=text". 
The alternative solution is to remove all the catch-all copy fields and, in the 
"/select" handler, use edismax and define "qf" with the list of fields. I 
personally would like this better, but if you prefer the current solution then 
let's stick with that.
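
As a rough sketch of the edismax alternative (the field names in "qf" are 
placeholders, not taken from the uploaded config):

{code}
    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <!-- Assumed field names; replace with the fields the example actually defines -->
        <str name="qf">title summary author</str>
      </lst>
    </requestHandler>
{code}

With "qf" listing the searchable fields directly, the catch-all field and its 
copyField rules are no longer needed for default searches.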


I would change these three fieldTypes:

{code}
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
{code}

- In the string fieldType we should add docValues.
- For int and tdate, we can switch to point fields.
- Maybe remove "*_tdt", or change it to a point field?

{code}
    <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
    <fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
{code}
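
If "*_tdt" is kept rather than removed, a point-based version might look 
roughly like this. The "*_pdt" suffix and the indexed/stored settings are 
assumptions, not taken from the uploaded config:

{code}
    <!-- Hypothetical replacement for the "*_tdt" dynamic field, backed by pdate -->
    <dynamicField name="*_pdt" type="pdate" indexed="true" stored="true"/>
{code}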

The last thing I can think of is simplifying text_en_splitting:
- Can we remove KeywordMarkerFilterFactory, and thereby protwords.txt?
- Also, I'd imagine that the Porter stemmer would be bad for searching 
technical post summaries?
- I haven't actually used the example, so this might not apply. Do we need to 
strip out HTML? When I see a sample summary on 
http://stackoverflow.com/feeds/tag/solr I see HTML chars in there.
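
A minimal sketch of what a simplified text_en_splitting could look like with 
those three changes applied. The tokenizer and remaining filter choices are 
assumptions for illustration, not the definition in the uploaded config:

{code}
    <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- Strip HTML markup from the feed summaries before tokenizing -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Split on case changes and punctuation; no protwords.txt, no stemming -->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
{code}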



> DIH: rewrite XPathEntityProcessor/RSS example as the smallest good demo 
> possible
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-7383
>                 URL: https://issues.apache.org/jira/browse/SOLR-7383
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 5.0, 6.0
>            Reporter: Upayavira
>            Assignee: Alexandre Rafalovitch
>            Priority: Minor
>         Attachments: atom_20170315.tgz, rss-data-config.xml
>
>
> The DIH example (solr/example/example-DIH/solr/rss/conf/rss-data-config.xml) 
> is broken again. See associated issues.
> Below is a config that should work.
> This is caused by Slashdot seemingly oscillating between RDF/RSS and pure 
> RSS. Perhaps we should depend upon something more static, rather than an 
> external service that is free to change as it desires.
> <dataConfig>
>     <dataSource type="URLDataSource" />
>     <document>
>         <entity name="slashdot"
>                 pk="link"
>                 url="http://rss.slashdot.org/Slashdot/slashdot"
>                 processor="XPathEntityProcessor"
>                 forEach="/RDF/item"
>                 transformer="DateFormatTransformer">
>                               
>             <field column="source" xpath="/RDF/channel/title" 
> commonField="true" />
>             <field column="source-link" xpath="/RDF/channel/link" 
> commonField="true" />
>             <field column="subject" xpath="/RDF/channel/subject" 
> commonField="true" />
>                       
>             <field column="title" xpath="/RDF/item/title" />
>             <field column="link" xpath="/RDF/item/link" />
>             <field column="description" xpath="/RDF/item/description" />
>             <field column="creator" xpath="/RDF/item/creator" />
>             <field column="item-subject" xpath="/RDF/item/subject" />
>             <field column="date" xpath="/RDF/item/date" 
> dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
>             <field column="slash-department" xpath="/RDF/item/department" />
>             <field column="slash-section" xpath="/RDF/item/section" />
>             <field column="slash-comments" xpath="/RDF/item/comments" />
>         </entity>
>     </document>
> </dataConfig>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
