Stephane Gamard created CONNECTORS-735:
------------------------------------------

             Summary: Include crawling date as metadata in OutputConnector
                 Key: CONNECTORS-735
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-735
             Project: ManifoldCF
          Issue Type: New Feature
          Components: Framework core
    Affects Versions: ManifoldCF 1.2
            Reporter: Stephane Gamard


While dating documents is a nightmare (not all connectors get their dates in 
the same manner), it might be interesting to leverage the crawl itself to date 
some volatile media (such as the web). 

In the case of web crawling there are 3 dates that can certainly be inferred 
from the crawler's activity: 
- Date the page first appeared in the queue (loosely equivalent to a created 
date)
- Date the page was last checked by the crawler (might not reflect a version 
update; the content could still be exactly the same)
- Date of last update (since the URL stays in the queue, the page might have 
changed over time and the crawler might know about this). 
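
As a rough illustration of how the three dates could be tracked and exposed 
as metadata, here is a minimal sketch. Everything in it is an assumption made 
for illustration: the CrawlDates class, the recordFetch and toMetadata 
methods, the use of a content hash to detect changes, and the crawl_* field 
names are hypothetical and are not part of the ManifoldCF API.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for the three crawl-derived dates (not a ManifoldCF class).
final class CrawlDates {
    private final Instant firstQueued; // when the URL first entered the queue
    private Instant lastChecked;       // last time the crawler fetched the URL
    private Instant lastChanged;       // last fetch at which the content differed
    private String lastContentHash;    // assumed change detector: a content hash

    CrawlDates(Instant firstQueued) {
        this.firstQueued = firstQueued;
    }

    // Record one fetch. lastChecked always advances; lastChanged advances only
    // when the content hash differs from the previously seen one.
    void recordFetch(Instant when, String contentHash) {
        lastChecked = when;
        if (!contentHash.equals(lastContentHash)) {
            lastChanged = when;
            lastContentHash = contentHash;
        }
    }

    // Render the dates as string metadata (ISO-8601), using made-up field names.
    Map<String, String> toMetadata() {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("crawl_first_queued", firstQueued.toString());
        if (lastChecked != null) {
            metadata.put("crawl_last_checked", lastChecked.toString());
        }
        if (lastChanged != null) {
            metadata.put("crawl_last_changed", lastChanged.toString());
        }
        return metadata;
    }
}
```

With this sketch, a page that is fetched twice with identical content would 
report a later "last checked" date than "last changed" date, which is the 
distinction drawn in the list above.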

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
