Stephane Gamard created CONNECTORS-735:
------------------------------------------
Summary: Include crawling date as metadata in OutputConnector
Key: CONNECTORS-735
URL: https://issues.apache.org/jira/browse/CONNECTORS-735
Project: ManifoldCF
Issue Type: New Feature
Components: Framework core
Affects Versions: ManifoldCF 1.2
Reporter: Stephane Gamard
While dating documents is a nightmare (not all connectors obtain their dates in
the same way), it might be interesting to leverage the crawl itself to date
some volatile media (such as the web).
In the case of web crawling, there are three dates that can certainly be
inferred from the crawler's activity:
- Date the page first appeared in the queue (loosely equivalent to a
creation date)
- Date the page was last checked by the crawler (might not reflect a version
update; the content could still be exactly the same)
- Date of last update (since the URL stays in the queue, its content might have
changed over time, and the crawler might know about this).
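
To make the three dates concrete, here is a minimal sketch in plain Java of how they could be tracked per URL and handed to an output connector as metadata. Everything in it is an assumption made for illustration: the CrawlDates class, its methods, and the metadata field names are hypothetical and are not part of the ManifoldCF API.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for the three crawl-derived dates; not ManifoldCF code. */
final class CrawlDates {
    final Instant firstQueued; // when the URL first appeared in the queue
    final Instant lastChecked; // when the crawler last fetched the URL (null if never)
    final Instant lastUpdated; // when the crawler last saw changed content (null if never)

    private CrawlDates(Instant firstQueued, Instant lastChecked, Instant lastUpdated) {
        this.firstQueued = firstQueued;
        this.lastChecked = lastChecked;
        this.lastUpdated = lastUpdated;
    }

    /** A URL entering the queue for the first time; it has not been fetched yet. */
    static CrawlDates queued(Instant now) {
        return new CrawlDates(now, null, null);
    }

    /**
     * Records one crawler check. lastChecked always advances; lastUpdated
     * advances only on the first fetch or when the content differs.
     */
    CrawlDates afterCheck(Instant now, boolean contentChanged) {
        boolean updated = contentChanged || lastUpdated == null;
        return new CrawlDates(firstQueued, now, updated ? now : lastUpdated);
    }

    /** Renders the known dates as ISO-8601 strings; field names are illustrative. */
    Map<String, String> toMetadata() {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("crawl_first_queued", firstQueued.toString());
        if (lastChecked != null) {
            metadata.put("crawl_last_checked", lastChecked.toString());
        }
        if (lastUpdated != null) {
            metadata.put("crawl_last_updated", lastUpdated.toString());
        }
        return metadata;
    }
}
```

In this sketch a re-check that finds identical content moves only the last-checked date, which matches the distinction drawn in the list above. How "content changed" is detected (for example, by comparing version strings) is left to the connector.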