[ https://issues.apache.org/jira/browse/CONNECTORS-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078466#comment-13078466 ]
Kate McGonigal edited comment on CONNECTORS-235 at 8/2/11 10:35 PM:
--------------------------------------------------------------------
I also tried setting "Dechromed Content" to "if present, in 'description'
field", but that seems to hang the ingestion process at the start: the job
status reaches "Running", but the job never finishes, nothing is ever sent to
Solr, and the number of "Active" documents never decreases.
The log file shows:
Error tossed: java.lang.String cannot be cast to org.apache.manifoldcf.core.interfaces.CharacterInput
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.manifoldcf.core.interfaces.CharacterInput
    at org.apache.manifoldcf.crawler.jobs.Carrydown.getDataValuesAsFiles(Carrydown.java:595)
    at org.apache.manifoldcf.crawler.jobs.JobManager.retrieveParentDataAsFiles(JobManager.java:4274)
    at org.apache.manifoldcf.crawler.system.WorkerThread$VersionActivity.retrieveParentDataAsFiles(WorkerThread.java:1220)
    at org.apache.manifoldcf.crawler.connectors.rss.RSSConnector.getDocumentVersions(RSSConnector.java:827)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:342)
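
For anyone reading the trace: the exception says a value that is actually a java.lang.String was cast to CharacterInput. The following is only a minimal sketch of that general failure mode and of one defensive way around it. CharStreamValue, CastFailureSketch, openUnchecked and openChecked are hypothetical names made up for illustration; they are not the ManifoldCF classes or the code in Carrydown.java, and this is not a claim about how the project fixed (or should fix) the problem.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a character-stream value type.
// This is NOT org.apache.manifoldcf.core.interfaces.CharacterInput.
interface CharStreamValue {
    Reader open();
}

final class CastFailureSketch {
    // Blind cast: throws ClassCastException when the stored value is a String,
    // which is the same kind of failure the log above reports.
    static Reader openUnchecked(Object stored) {
        return ((CharStreamValue) stored).open();
    }

    // Defensive variant: accept either representation of the stored value.
    static Reader openChecked(Object stored) {
        if (stored instanceof CharStreamValue) {
            return ((CharStreamValue) stored).open();
        }
        if (stored instanceof String) {
            return new StringReader((String) stored);
        }
        String type = (stored == null) ? "null" : stored.getClass().getName();
        throw new IllegalArgumentException("Unsupported value type: " + type);
    }

    // Small helper that drains a Reader into a String.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Object stored = "item description text"; // value held as a plain String
        try {
            openUnchecked(stored);
        } catch (ClassCastException e) {
            System.out.println("Unchecked cast failed: " + e.getMessage());
        }
        System.out.println("Checked read: " + readAll(openChecked(stored)));
    }
}
```

The checked variant only shows that the two representations can be told apart at the point of use; whether the value should have been stored as a String in the first place is a separate question.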
was (Author: kmcgonig):
I also tried setting "Dechromed Content" to "if present, in 'description'
field", but that seems to hang the ingestion process at the start: the job
status reaches "Running", but the job never finishes, nothing is ever sent to
Solr, and the number of "Active" documents never decreases.
> item description element not indexed
> ------------------------------------
>
> Key: CONNECTORS-235
> URL: https://issues.apache.org/jira/browse/CONNECTORS-235
> Project: ManifoldCF
> Issue Type: Improvement
> Components: RSS connector
> Affects Versions: ManifoldCF 0.2
> Reporter: Kate McGonigal
>
> The RSS feed's *item* description is not written to any field in the Solr
> index.
> I have a typical RSS feed with the general structure:
> <rss>
>   <channel>
>     <title></title>
>     <link></link>
>     <description></description>
>     <item>
>       <title></title>
>       <link></link>
>       <pubDate></pubDate>
>       <description> *** the description I do want *** </description>
>       <author></author>
>       <category></category>
>     </item>
>   </channel>
> </rss>
> Example:
> For the RSS feed:
> http://www.onemansjazz.ca/component/option,com_rss/feed,RSS2.0/no_html,1/
> the rss/channel/item/description field is not indexed into Solr.
> Example notes:
> - what does get written to the Solr "description" field is the description
> metadata from the website, i.e. "Jazz radio show from Winnipeg on CKUW 95.9
> FM, hosted by Maurice Hogue." in this case.
> - on the "Dechromed Content" tab of the job, "No dechromed content" is
> selected. I'm not sure if that is relevant.
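
To make the distinction in the quoted report concrete, here is a minimal sketch that pulls only the item-level descriptions (rss/channel/item/description) out of a feed with the structure shown above, leaving the channel-level description alone. It uses the standard JDK DOM and XPath APIs. The class and method names (ItemDescriptionSketch, itemDescriptions) and the sample feed text are made up for illustration; this is not how the RSS connector itself parses feeds.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ItemDescriptionSketch {
    // Returns the trimmed text of every rss/channel/item/description element.
    // The channel-level description is deliberately not matched.
    static List<String> itemDescriptions(String rssXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rssXml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("/rss/channel/item/description", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample feed following the structure in the report.
        String feed =
              "<rss><channel>"
            + "<title>Sample feed</title>"
            + "<description>channel-level description</description>"
            + "<item><title>First</title>"
            + "<description>item-level description</description></item>"
            + "</channel></rss>";
        System.out.println(itemDescriptions(feed));
    }
}
```

The absolute XPath keeps the two description elements apart: the channel's own description sits at /rss/channel/description and is never returned.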
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira