Hi Kate,

I did two additional check-ins yesterday evening. Would you be so kind as to sync up from trunk and try again? I apologize for the confusion.
Karl

On Wed, Aug 3, 2011 at 8:13 AM, Karl Wright <daddy...@gmail.com> wrote:
> >>>>>>
> I find it odd that I would be the first person to have this problem.
> You'd think it would be very common.
> <<<<<<
>
> Actually, I've not encountered this before even though the RSS
> connector is one of the most widely used connectors. The only
> situation this ever came up in before was when some MetaCarta clients
> wanted to use the description field as primary content, which is why
> it is an option for the "Dechromed Content" tab. But new feature
> requests are always welcome.
>
> Also, as you might guess from the Derby and HSQLDB issue that you
> encountered, most of our users use PostgreSQL. The Derby and HSQLDB
> database support was added to simplify setup and allow tests to be
> written that did not involve installing another package first.
> However, each of these databases has known problems, some minor and
> some more major. Thus you might want to consider going to PostgreSQL
> in the future if you plan on doing any serious crawling.
>
> Thanks again!
> Karl
>
> On Tue, Aug 2, 2011 at 2:56 PM, K McGonigal <kmcgon...@gmail.com> wrote:
>> Hi Karl,
>>
>> Thank you for your quick response. I've opened a Jira ticket for this,
>> though I don't really understand what sort of solution you had in mind so I
>> didn't propose anything.
>>
>> I'm afraid I don't understand exactly what the Dechromed Content options do
>> either. I read about them in the End User Documentation, but there wasn't
>> much there yet.
>>
>> I find it odd that I would be the first person to have this problem. You'd
>> think it would be very common.
>>
>>
>> Kate
>>
>>
>> On Tue, Aug 2, 2011 at 11:05 AM, Karl Wright <daddy...@gmail.com> wrote:
>>>
>>> I just looked at the code. It's not a bug so much as an oversight of
>>> sorts. The "description" or "content" fields are indexed as the
>>> primary content of the document if the "chrome" mode is selected
>>> accordingly.
>>> If "None" is the "chrome" mode, then the item-level
>>> description field is ignored even when present.
>>>
>>> So I recommend simply adding a new kind of "description" field for
>>> when the "chrome" mode is set to "None". "item/description" may be
>>> its name, or maybe the full XPath, your choice. Propose something in
>>> the ticket and I'll respond.
>>>
>>> Thanks!
>>> Karl
>>>
>>>
>>> On Tue, Aug 2, 2011 at 11:47 AM, Karl Wright <daddy...@gmail.com> wrote:
>>> > Hi Kate,
>>> >
>>> > The field mapping won't do the trick because the RSS connector is
>>> > currently very selective about what fields it extracts - it by no
>>> > means extracts all of them, so the ones that it *does* extract from
>>> > the feed are "special".
>>> >
>>> > The behavior you describe sounds like a bug to me. I'll go spelunking
>>> > through the code at first opportunity. In the meantime, could you
>>> > create a Jira ticket describing the behavior you see vs. the behavior
>>> > you want?
>>> >
>>> > Thanks!
>>> > Karl
>>> >
>>> > On Tue, Aug 2, 2011 at 11:41 AM, K McGonigal <kmcgon...@gmail.com> wrote:
>>> >> Hi,
>>> >>
>>> >> I'm trying to use ManifoldCF to index an RSS feed into Solr. It sort of
>>> >> works, but my main problem at the moment is that the *channel* description
>>> >> from the RSS feed is written to the "description" field in Solr when I would
>>> >> really like the *item* description to be written instead.
>>> >>
>>> >> I have a typical RSS feed with the general structure:
>>> >>
>>> >> <rss>
>>> >>   <channel>
>>> >>     <title></title>
>>> >>     <link></link>
>>> >>     <description> *** the description I don't want *** </description>
>>> >>     <item>
>>> >>       <title></title>
>>> >>       <link></link>
>>> >>       <pubDate></pubDate>
>>> >>       <description> *** the description I do want *** </description>
>>> >>       <author></author>
>>> >>       <category></category>
>>> >>     </item>
>>> >>   </channel>
>>> >> </rss>
>>> >>
>>> >> I tried setting up the field mapping on the job with the XPath address of
>>> >> the second description, i.e. "/rss/channel/item/description" as the source,
>>> >> but that did not work.
>>> >>
>>> >> I suspect I'm overlooking something simple, but I've spent 2 days trying to
>>> >> solve it. I would be grateful for any help.
>>> >>
>>> >>
>>> >> Kate McGonigal
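To make the channel-versus-item distinction concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It is not part of ManifoldCF and says nothing about how the RSS connector works internally; it only shows which elements the two paths select in a feed shaped like the one above. The sample feed text (`SAMPLE_FEED`) and the helper names `channel_description` and `item_descriptions` are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical sample feed with the same shape as the one quoted above.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Channel-level description</description>
    <item>
      <title>First item</title>
      <link>http://example.com/1</link>
      <description>Item one description</description>
    </item>
    <item>
      <title>Second item</title>
      <link>http://example.com/2</link>
      <description>Item two description</description>
    </item>
  </channel>
</rss>"""


def channel_description(feed_xml: str) -> Optional[str]:
    """Return the text of /rss/channel/description (the one that is NOT wanted)."""
    root = ET.fromstring(feed_xml)  # root is the <rss> element itself
    node = root.find("./channel/description")
    return node.text if node is not None else None


def item_descriptions(feed_xml: str) -> List[str]:
    """Return the text of every /rss/channel/item/description, one entry per item."""
    root = ET.fromstring(feed_xml)
    # ElementTree paths are evaluated relative to the element they are called on,
    # so the absolute XPath /rss/channel/item/description becomes ./channel/item/description.
    return [(d.text or "") for d in root.findall("./channel/item/description")]


if __name__ == "__main__":
    print(channel_description(SAMPLE_FEED))
    print(item_descriptions(SAMPLE_FEED))
```

Note that the item path matches once per `<item>`, so there is one description per item rather than one per feed, assuming each item is what ends up as its own indexed document. ElementTree supports only a subset of XPath and raises `SyntaxError` for an absolute path such as `/rss/...` called on an element, which is why the sketch uses `./` paths from the `<rss>` root.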