[ https://issues.apache.org/jira/browse/SOLR-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erick Erickson resolved SOLR-3976.
----------------------------------
Resolution: Not A Problem
Please raise this kind of issue on the user's list first, rather than opening a
JIRA, in case it has a simple resolution.
In this case, I'd use a copyField from text1 to text2 in your schema.xml.
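Below is a minimal sketch of what that might look like in schema.xml. It assumes the stripping is done by Solr's `HTMLStripCharFilterFactory` in the analyzer of the destination field. The field type name `text_html` is hypothetical, and the `string` type is assumed to be defined as in the example schema. A copyField copies the raw incoming value, and a char filter only affects the indexed terms. As a result, `text2` would be searchable without markup, but its stored value would still contain the HTML.

```xml
<!-- schema.xml (sketch): "text_html" is a hypothetical type name -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Removes HTML markup before tokenizing; the stored value is unchanged -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- text1 keeps the original content as-is (assumes a "string" type exists) -->
<field name="text1" type="string" indexed="false" stored="true"/>
<!-- text2 is indexed with markup stripped by the analyzer above -->
<field name="text2" type="text_html" indexed="true" stored="true"/>

<!-- Copy the raw value of text1 into text2 at index time -->
<copyField source="text1" dest="text2"/>
```

With this in place, the data-config entity would map the `text` column only to `text1`. The `HTMLStripTransformer` and `stripHTML="true"` settings could then be dropped.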
> HTMLStripTransformer strips the "tika" field not the field to index -> cannot
> have both (stripped and unstripped)
> -----------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-3976
> URL: https://issues.apache.org/jira/browse/SOLR-3976
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Affects Versions: 3.6
> Reporter: Markus Klose
> Priority: Minor
>
> I ran into this when indexing an HTML file with the DataImportHandler and
> got unexpected output. I wanted to create one field with the original
> content and one field with the same content but without HTML markup.
> If I enable the HTMLStripTransformer on field text2, the other field (text1)
> is stripped as well.
> Example configuration:
> <dataConfig>
>   <dataSource type="BinFileDataSource" name="bin"/>
>   <document>
>     <entity name="f" processor="FileListEntityProcessor"
>             recursive="true" rootEntity="false"
>             dataSource="null" baseDir="...." fileName=".*.html"
>             onError="skip">
>
>       <entity name="tika-test"
>               processor="TikaEntityProcessor" url="${f.fileAbsolutePath}"
>               format="html" dataSource="bin" onError="skip"
>               transformer="HTMLStripTransformer,TemplateTransformer">
>
>         <field column="id" template="${f.file}"/>
>
>         <field column="text" name="text1"/>
>         <field column="text" name="text2" stripHTML="true"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>