Hi John,
I'll try to shed some light on your questions below, since I have been
more involved in the development of the RSS Feeder and its integration
with KIM. Please find my comments inline.
One thing I have noticed is that the RSS Feeder creates an XML file
with a few more features than the Populater is set up to process by
default.
First of all, the Populater and the Feeder are two separate standalone
tools, each capable of loading documents into KIM. Both use KIM's
document repository, semantic annotation and corpora APIs to import and
annotate documents. That is where the similarities end, which means
that you can safely ignore the Populater configuration when working with
the Feeder and vice versa. It is useful, however, to keep both in mind
if you use both tools to add documents. In fact, in the past we have
used the Populater to import large sets of documents, gathered by the
RSS Feeder, into a KIM server.
Looks like the expected default document features are:
<TITLE> <SUBTITLE> <AUTHORS> <TIMESTAMP> <SUBJECT> <SOURCE> <URL> <ORIGIN>
The RSS Feeder creates an XML file for each content file that includes
the above features plus <FILE>, <MIMETYPE> and <CONTENT>.
I'm assuming that <FILE> and <MIMETYPE> are being overlooked by the
Populater, but I did want to know where in the RSS Feeder I could edit
which features get put into the XML files, so if I added additional
ones (such as <GUID>, <COPYRIGHT> or <LOCATION>) to be processed by
the Populater, I could also have the RSS Feeder generate them.
Currently, there is no way to add new features to the XML documents
created by the Feeder. The features you see are gathered from the RSS
Channel and the actual RSS item and are, unfortunately, hardcoded.
Still, the Populater has no role in deciding which features get
included in the KIM document; it is KIM's feature schema that filters
what gets in. Please have a look at the /Document features options/
section of this page, which explains how to configure the feature schema:
https://confluence.ontotext.com/display/KimDocs37EN/Configuring+the+Document+Repository
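To make the filtering idea concrete, here is a minimal sketch in Python.
It is not KIM code and does not use KIM's APIs; it only illustrates how a
schema acts as a whitelist over the features found in a Feeder-style XML
file. The <DOCUMENT> root element, the sample values and the contents of
SCHEMA_FEATURES are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical feature schema: the names come from this thread, but the
# set itself is an assumption, not KIM's actual configuration.
SCHEMA_FEATURES = {"TITLE", "SUBTITLE", "AUTHORS", "TIMESTAMP",
                   "SUBJECT", "SOURCE", "URL", "ORIGIN"}

# Hypothetical file layout: the <DOCUMENT> root element is assumed.
SAMPLE_XML = """<DOCUMENT>
  <TITLE>Example story</TITLE>
  <SOURCE>Example feed</SOURCE>
  <FILE>story-001.html</FILE>
  <MIMETYPE>text/html</MIMETYPE>
  <GUID>example-guid-1</GUID>
</DOCUMENT>"""


def filter_features(xml_text, schema):
    """Split the root's child elements into kept and dropped features."""
    root = ET.fromstring(xml_text)
    kept, dropped = {}, []
    for child in root:
        if child.tag in schema:
            kept[child.tag] = (child.text or "").strip()
        else:
            dropped.append(child.tag)
    return kept, dropped


kept, dropped = filter_features(SAMPLE_XML, SCHEMA_FEATURES)
print(kept)     # features the schema lets through
print(dropped)  # features the schema filters out
```

In this sketch, adding "GUID" to SCHEMA_FEATURES is all it takes for that
feature to be kept, which mirrors the point that the schema, not the
loading tool, decides what gets in.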
Also, an additional question on a related subject. We are
experimenting with pulling in news content to KIM and I was wondering
if there is a set strategy for pulling in and processing
updated/corrected versions of the same news story (from the same news
source). I believe there is a setting in populater.xml
(HUNT_DUPLICATES) that will ignore docs with the same <TITLE> to
prevent duplicates, but is there a more sophisticated way to configure
KIM so content with newer timestamp and same <GUID> replaces an
earlier version of the same content in KIM?
The short answer is: not as of now.
Now the longer one. As you may have guessed, the HUNT_DUPLICATES
parameter doesn't help here, because it belongs to the Populater and
you're working with the Feeder. Updating existing documents is a problem
that can be broken into two parts:
1. The Feeder has to query KIM to check whether the document already
   exists and, if so, handle it as an updated doc rather than a new doc
   in terms of KIM API usage.
2. The Document Repository in use on the KIM side has to implement
   document update, which means updating the actual document
   representation as well as all the semantic annotations and
   connections between resources (think of RDF).
The thing is that the Lucene Document Repository implements document
updates, but there is no handling of the related RDF. That is probably
why we have abandoned the update functionality in the Feeder, although
it is still there and could easily be enabled in a future release.
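For illustration only, the first part (deciding whether an incoming item
is new, an update, or stale) could be sketched like this in Python. This
is not the Feeder's implementation and uses no KIM API: the in-memory
`store` dictionary stands in for the document repository, and the `guid`
and `timestamp` keys are hypothetical names for the values you mentioned.
The second part, replacing the annotations and the related RDF, is not
covered here.

```python
from datetime import datetime


def upsert(store, doc):
    """Insert doc, or replace the stored version if doc is newer.

    `store` is a plain dict keyed by GUID (a stand-in for a real
    document repository). Returns "added", "updated" or "skipped".
    """
    current = store.get(doc["guid"])
    if current is None:
        store[doc["guid"]] = doc
        return "added"
    if doc["timestamp"] > current["timestamp"]:
        # Same GUID, newer timestamp: the new version wins.
        store[doc["guid"]] = doc
        return "updated"
    # Same GUID, same or older timestamp: keep what is stored.
    return "skipped"


store = {}
first = {"guid": "story-1",
         "timestamp": datetime(2013, 5, 7, 8, 0),
         "title": "Story"}
fixed = {"guid": "story-1",
         "timestamp": datetime(2013, 5, 7, 9, 30),
         "title": "Story (corrected)"}

print(upsert(store, first))  # added
print(upsert(store, fixed))  # updated
print(upsert(store, first))  # skipped
```

Keying on the GUID rather than the title means a corrected headline is
still recognised as the same story.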
Thanks for all your help,
John
Thanks for the good questions! Hope this helps!
Cheers,
Stefan
--
Stefan Enev <stefan.e...@ontotext.com>
Senior Software Engineer
Ontotext AD
On 5/7/13 7:52 AM, Philip Alexiev wrote:
Hi John,
Please provide some context. What features are you trying to add? What's the
purpose of those features? Are they specific to the different sources?
Phil
On May 2, 2013, at 6:26 PM, John Olszewski <jo...@53tech.com> wrote:
Hi Phil,
If I am adding a few new tags to my document features list, is there a way to
integrate those new tags into the RSS Feeder so they appear in the .xml files
of content coming in through RSS?
Let me know and thanks,
John
_______________________________________________
Kim-discussion mailing list
Kim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/kim-discussion