Hi John,

I'll try to shed some light on your questions below, since I have been more involved in the development of the RSS Feeder and its integration with KIM. Please find my comments inline.

One thing I have noticed is that the RSS Feeder creates an XML file with a few more features than the Populater is set up to process by default.
First of all, the Populater and the Feeder are two separate standalone tools, each capable of loading documents into KIM. They both use KIM's document repository, semantic annotation and corpora APIs to import and annotate documents. That is where the similarities end, which means that you can safely ignore the Populater configuration when working with the Feeder, and vice versa. It is useful, however, to keep both in mind if you use both tools to add documents. In fact, in the past we have used the Populater to import large sets of documents, gathered by the RSS Feeder, into a KIM server.

Looks like the expected default document features are:

<TITLE><SUBTITLE> <AUTHORS><TIMESTAMP><SUBJECT><SOURCE> <URL><ORIGIN>

The RSS Feeder creates an XML file for each content file, which includes the above features plus <FILE>, <MIMETYPE> and <CONTENT>.

I'm assuming that <FILE> and <MIMETYPE> are being overlooked by the Populater, but I did want to know where in the RSS Feeder I could edit which features get put into the XML files, so if I added additional ones (such as <GUID>, <COPYRIGHT> or <LOCATION>) to be processed by the Populater, I could also have the RSS Feeder generate them.
Currently, there is no way to add new features to the XML documents created by the Feeder. The features you see are gathered from the RSS channel and the actual RSS item and are, unfortunately, hardcoded. Still, the Populater has no role in deciding which features get included in the KIM document; it is KIM's feature schema that filters what gets in. Please have a look at the /Document features options/ section of this page, which explains how to configure the feature schema - https://confluence.ontotext.com/display/KimDocs37EN/Configuring+the+Document+Repository
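
To illustrate the filtering idea only, here is a minimal Python sketch. It is not KIM code and does not use KIM's configuration format: the ALLOWED_FEATURES set and the filter_features function are hypothetical names, and the sample values are made up. It only shows how a schema acting as a whitelist would drop features such as <FILE> and <MIMETYPE>, no matter which tool produced the XML.

```python
# Hypothetical illustration: not KIM's actual API or schema format.
# The allowed set mirrors the default feature names listed earlier in this thread.
ALLOWED_FEATURES = {
    "TITLE", "SUBTITLE", "AUTHORS", "TIMESTAMP",
    "SUBJECT", "SOURCE", "URL", "ORIGIN",
}


def filter_features(features, allowed=ALLOWED_FEATURES):
    """Keep only the features whose names appear in the allowed set."""
    return {name: value for name, value in features.items() if name in allowed}


# Made-up features, as they might be read from a Feeder-generated XML file.
incoming = {
    "TITLE": "Example story",
    "URL": "http://example.com/story",
    "FILE": "story.html",
    "MIMETYPE": "text/html",
}

stored = filter_features(incoming)
print(sorted(stored))
```

Under this assumption, a new feature such as <GUID> would only survive once its name is added to the allowed set, which is the role the feature schema plays on the KIM side.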

Also, an additional question on a related subject. We are experimenting with pulling news content into KIM, and I was wondering if there is a set strategy for pulling in and processing updated/corrected versions of the same news story (from the same news source). I believe there is a setting in populater.xml (HUNT_DUPLICATES) that will ignore docs with the same <TITLE> to prevent duplicates, but is there a more sophisticated way to configure KIM so that content with a newer timestamp and the same <GUID> replaces an earlier version of the same content in KIM?

The short answer is: as of now - no.

Now the longer one. As you may have guessed, this parameter doesn't help, because you're working with the Feeder. Updating existing documents is a problem that can be broken into two parts:

1. The Feeder querying KIM for the existence of an updated document and
   handling it as an updated doc rather than a new doc in terms of
   KIM's APIs usage
2. The Document Repository in use on the KIM side should implement
   document update, which means that it should update the actual
   document representation as well as all the semantic annotations and
   connections between resources (think of RDF).

The thing is that the Lucene Document Repository implements document updates, but there is no handling of the related RDF. That is probably why we abandoned the update functionality in the Feeder, although it is still there and could easily be enabled in a future release.
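
As a rough sketch of the first part only (deciding whether an incoming item is new, an update, or already known), the logic could look like the Python below. This is an assumption-laden illustration, not the Feeder's code: Doc, decide and apply_incoming are hypothetical names, and a plain dict stands in for the document repository. The second part, refreshing the semantic annotations and the related RDF, is deliberately not covered.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Doc:
    """Hypothetical stand-in for a document, keyed by its GUID."""
    guid: str
    timestamp: datetime
    content: str


def decide(store, incoming):
    """Return 'insert', 'update' or 'skip' for an incoming document."""
    current = store.get(incoming.guid)
    if current is None:
        return "insert"   # GUID not seen before
    if incoming.timestamp > current.timestamp:
        return "update"   # same GUID, newer timestamp
    return "skip"         # same GUID, not newer


def apply_incoming(store, incoming):
    """Apply the decision to the in-memory store and return the action taken."""
    action = decide(store, incoming)
    if action != "skip":
        # A real repository would also have to replace the document's
        # annotations and RDF here; this sketch only swaps the document.
        store[incoming.guid] = incoming
    return action


store = {}
first = Doc("guid-1", datetime(2013, 5, 1), "original story")
corrected = Doc("guid-1", datetime(2013, 5, 2), "corrected story")
for doc in (first, corrected, first):
    print(apply_incoming(store, doc))
```

Keying on the GUID rather than the <TITLE> means a corrected story with a changed headline would still be matched to its earlier version.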

Thanks for all your help,

John
Thanks for the good questions! Hope this helps!

Cheers,
Stefan

--
Stefan Enev<stefan.e...@ontotext.com>
Senior Software Engineer
Ontotext AD




On 5/7/13 7:52 AM, Philip Alexiev wrote:
Hi John,

Please provide some context. What features are you trying to add? What's the 
purpose of those features? Are they specific to the different sources?

Phil

On May 2, 2013, at 6:26 PM, John Olszewski<jo...@53tech.com>  wrote:

Hi Phil,

If I am adding a few new tags to my document features list, is there a way to 
integrate those new tags into the RSS Feeder so they appear in the .xml files 
of content coming in through RSS?

Let me know and thanks,

John


_______________________________________________
Kim-discussion mailing list
Kim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/kim-discussion
