All fixes for the ticket are complete. To use them, you will need to build and run trunk rather than the 0.2-incubating release. Let me know if this is a problem.
Thanks!
Karl

On Tue, Aug 2, 2011 at 3:04 PM, Karl Wright <daddy...@gmail.com> wrote:
> Hi Kate,
>
> Many news RSS feeds put the full article in either the item
> description or the item content field, while the document described by
> the url field is not just straight content but contains navigation and
> advertising "chrome". In such cases it's often preferable to generate
> an index based on the description or content field contents rather
> than the actual document with all of that chrome. The Dechromed
> Content options allow you to set up that behavior for a specific job.
>
> Thanks for opening the ticket; I'll propose a solution shortly.
>
> Karl
>
>
> On Tue, Aug 2, 2011 at 2:56 PM, K McGonigal <kmcgon...@gmail.com> wrote:
>> Hi Karl,
>>
>> Thank you for your quick response. I've opened a Jira ticket for this,
>> though I don't really understand what sort of solution you had in mind,
>> so I didn't propose anything.
>>
>> I'm afraid I don't understand exactly what the Dechromed Content options
>> do either. I read about them in the End User Documentation, but there
>> wasn't much there yet.
>>
>> I find it odd that I would be the first person to have this problem.
>> You'd think it would be very common.
>>
>>
>> Kate
>>
>>
>> On Tue, Aug 2, 2011 at 11:05 AM, Karl Wright <daddy...@gmail.com> wrote:
>>>
>>> I just looked at the code. It's not so much a bug as an oversight of
>>> sorts. The "description" or "content" fields are indexed as the
>>> primary content of the document if the "chrome" mode is selected
>>> accordingly. If "None" is the "chrome" mode, then the item-level
>>> description field is ignored even when present.
>>>
>>> So I recommend simply adding a new kind of "description" field for
>>> when the "chrome" mode is set to "None". "item/description" may be
>>> its name, or maybe the full XPath, your choice. Propose something in
>>> the ticket and I'll respond.
>>>
>>> Thanks!
>>> Karl
>>>
>>>
>>> On Tue, Aug 2, 2011 at 11:47 AM, Karl Wright <daddy...@gmail.com> wrote:
>>> > Hi Kate,
>>> >
>>> > The field mapping won't do the trick because the RSS connector is
>>> > currently very selective about what fields it extracts - it by no
>>> > means extracts all of them, so the ones that it *does* extract from
>>> > the feed are "special".
>>> >
>>> > The behavior you describe sounds like a bug to me. I'll go spelunking
>>> > through the code at first opportunity. In the meantime, could you
>>> > create a Jira ticket describing the behavior you see vs. the behavior
>>> > you want?
>>> >
>>> > Thanks!
>>> > Karl
>>> >
>>> > On Tue, Aug 2, 2011 at 11:41 AM, K McGonigal <kmcgon...@gmail.com> wrote:
>>> >> Hi,
>>> >>
>>> >> I'm trying to use ManifoldCF to index an RSS feed into Solr. It sort
>>> >> of works, but my main problem at the moment is that the *channel*
>>> >> description from the RSS feed is written to the "description" field
>>> >> in Solr when I would really like the *item* description to be
>>> >> written instead.
>>> >>
>>> >> I have a typical RSS feed with the general structure:
>>> >>
>>> >> <rss>
>>> >>   <channel>
>>> >>     <title></title>
>>> >>     <link></link>
>>> >>     <description> *** the description I don't want *** </description>
>>> >>     <item>
>>> >>       <title></title>
>>> >>       <link></link>
>>> >>       <pubDate></pubDate>
>>> >>       <description> *** the description I do want *** </description>
>>> >>       <author></author>
>>> >>       <category></category>
>>> >>     </item>
>>> >>   </channel>
>>> >> </rss>
>>> >>
>>> >> I tried setting up the field mapping on the job with the XPath
>>> >> address of the second description, i.e.
>>> >> "/rss/channel/item/description" as the source, but that did not
>>> >> work.
>>> >>
>>> >> I suspect I'm overlooking something simple, but I've spent 2 days
>>> >> trying to solve it. I would be grateful for any help.
>>> >>
>>> >>
>>> >> Kate McGonigal
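
For readers following the thread, the difference between the channel-level and item-level description in the feed structure quoted above can be sketched in a few lines of Python. This is only an illustration of the two element paths, not how the ManifoldCF RSS connector is implemented; the sample feed contents and the function name `descriptions` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed with the same general structure as the one
# quoted in the thread (contents invented for illustration).
FEED = """<rss>
  <channel>
    <title>Example channel</title>
    <link>http://example.com/</link>
    <description>channel description</description>
    <item>
      <title>First item</title>
      <link>http://example.com/1</link>
      <description>item description</description>
    </item>
  </channel>
</rss>"""


def descriptions(feed_xml):
    """Return (channel description, list of item descriptions)."""
    root = ET.fromstring(feed_xml)      # root element is <rss>
    channel = root.find("channel")      # direct child: /rss/channel
    # findtext with a bare tag name only looks at direct children, so
    # this picks up /rss/channel/description and not the item-level one.
    channel_desc = channel.findtext("description", default="")
    # One entry per /rss/channel/item/description.
    item_descs = [
        item.findtext("description", default="")
        for item in channel.findall("item")
    ]
    return channel_desc, item_descs


if __name__ == "__main__":
    channel_desc, item_descs = descriptions(FEED)
    print("channel:", channel_desc)
    print("items:", item_descs)
```

The point of the sketch is that the two values live at different paths, so a connector that extracts only one "description" has to choose which path it reads from.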