[ 
https://issues.apache.org/jira/browse/CONNECTORS-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543973#comment-13543973
 ] 

David Morana commented on CONNECTORS-598:
-----------------------------------------

Well, we would like the bookmarks indexed even when they point to an external
link; the external links themselves just wouldn't be crawled. Is this possible?
Here's an example of a bookmark (the entry below). In this case the href is
http://www.fleetmon.com/products/services_data, which is an external link.
So, can we just send the metadata (title, link, pubdate, etc.) to be indexed
and not crawl this link? Basically, can we add a white list and a black list
to the RSS connector? If it's not too much trouble...
What are your thoughts?
  <entry>
    <id>
    tag:dogear.ibm.com,2005:link:eab7ff92-e8e8-4770-9a97-f4e4e131f8ee</id>
    <title>FleetMon - Maritime traffic analyses, XML ship position
    data and API access for logistics, research and more -
    FleetMon.com</title>
    <category scheme="http://www.ibm.com/xmlns/prod/sn/type"
    term="bookmark" />
    <link href="http://www.fleetmon.com/products/services_data" />
    <snx:link linkid="eab7ff92-e8e8-4770-9a97-f4e4e131f8ee" />
    <content type="html">
      <![CDATA[<div><p>

<br />

</p></div>]]>
</content>
    <published>2012-06-28T13:36:47-04:00</published>
    <updated>2012-06-28T13:36:47-04:00</updated>
    <category term="gulf_of_mexico" />
    <category term="ship_location_data" />
    <author>
      <email>[email protected]</email>
      <snx:userid>927E89B2-3BBA-4832-AFA4-23105CA7CC93</snx:userid>
      <snx:userState>active</snx:userState>
      <name>Menk, Robert (RO17354)</name>
      <uri>
      https://[...]/dogear/html?email=bmenk%40ll.mit.edu</uri>
    </author>
    <snx:clickcount>1</snx:clickcount>
    <snx:linkcount>1</snx:linkcount>
    <link rel="http://www.ibm.com/xmlns/prod/sn/same"
    type="application/atom+xml"
    href="https://[...]/dogear/atom?for=http%3a%2f%2fwww.fleetmon.com%2fproducts%2fservices_data" />
  </entry>
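
To make the request concrete, here is a rough sketch of the white list / black
list idea. It is not ManifoldCF code and not how the RSS connector works today.
The function names, the list patterns, and the trimmed sample entry are all
hypothetical, and it assumes the feed declares the standard Atom namespace.
The metadata is always collected for indexing; the lists only decide whether
the linked page would also be fetched.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

ATOM = "{http://www.w3.org/2005/Atom}"  # assumption: standard Atom namespace

# Hypothetical lists of host patterns; the values are illustrative only.
WHITE_LIST = [re.compile(r"(^|\.)example\.internal$")]
BLACK_LIST = [re.compile(r"(^|\.)fleetmon\.com$")]


def should_crawl(url, white_list=WHITE_LIST, black_list=BLACK_LIST):
    """Return True only if the host is white-listed and not black-listed."""
    host = urlparse(url).hostname or ""
    if any(p.search(host) for p in black_list):
        return False  # the black list always wins
    return any(p.search(host) for p in white_list)


def entry_metadata(entry_xml):
    """Extract title, link and pubdate, plus a flag saying whether to crawl."""
    entry = ET.fromstring(entry_xml)
    # The bookmark itself is the <link> without a rel attribute.
    link = next(
        (l.get("href") for l in entry.findall(ATOM + "link") if l.get("rel") is None),
        None,
    )
    return {
        "title": " ".join((entry.findtext(ATOM + "title") or "").split()),
        "link": link,
        "pubdate": entry.findtext(ATOM + "published"),
        "crawl": should_crawl(link) if link else False,
    }


# Trimmed, hypothetical version of the bookmark entry shown above.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>FleetMon.com</title>
  <link href="http://www.fleetmon.com/products/services_data" />
  <published>2012-06-28T13:36:47-04:00</published>
</entry>"""

if __name__ == "__main__":
    print(entry_metadata(SAMPLE))
```

With these lists the fleetmon.com bookmark would still be sent to the index
with its title, link and pubdate, but its "crawl" flag would be false, so the
external page would never be requested.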


                
> Add proxy pac files to the RSS connector
> ----------------------------------------
>
>                 Key: CONNECTORS-598
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-598
>             Project: ManifoldCF
>          Issue Type: Improvement
>    Affects Versions: ManifoldCF 1.0.1, ManifoldCF 1.1
>            Reporter: David Morana
>             Fix For: ManifoldCF next
>
>
> I have a public RSS feed on an intranet that lists important bookmarks. The 
> list has many external links in it. So ManifoldCF would need to know when to 
> use the company's proxy to index the external links.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
