Hi Alvaro,
    sorry for the delay. Please write to the kim-discussion mailing list in future; you will get a faster response there.
0. In your scenario, you could keep a registry of the URLs you have crawled and processed, outside of KIM, in the application that invokes the server.

If you want to crawl or process the same URL multiple times without duplication, you can extend that URL registry into a map from each URL to the ID of its document.

1. You could use the URL itself as the ID and set it as a document-level feature when a new document is processed (note that you should check first and only then create the document).
The check can be done with the getDocs or getDocsIds methods in IndexAPI (or IndexAndPersistAPI). They take a query string in Lucene syntax as a parameter (I assume you are using this type of index and persistence).
There you could specify that you are looking for documents containing this URL in the field URL. URLs are tricky, though, and I actually doubt this will work: the default interaction between KIM and Lucene is not fully under your control, so you cannot choose how the field is analyzed, tokenized or indexed. So this may fail.

2. Another option is to use the persistence ID of the document. You can obtain it with getLRPersistenceId() after you store the document, and put it in the map as URL vs. ID. Later you try to load the document by that ID from the PersistAPI (or IndexAndPersistAPI).
   If you manage to get it, it exists. You do the processing you want and then sync it back.
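
A rough sketch of options 0 to 2 above, in case it helps. It does not call the real KIM API: DocStore and InMemoryStore are hypothetical stand-ins for the store, load-by-ID and sync calls mentioned above, and UrlRegistry, escapeForLucene and urlQuery are names made up for illustration. The field name URL is the one assumed in option 1. The escaping is hand-rolled over Lucene's query-syntax characters; it only keeps the query parseable and does nothing about how the field is tokenized at index time.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: DocStore is a hypothetical stand-in, not the real KIM API. */
final class UrlDedupSketch {

    /** Hypothetical stand-in for the KIM calls named in this thread. */
    interface DocStore {
        long store(String url, String content);          // role of storeDoc + getLRPersistenceId()
        boolean exists(long persistenceId);              // role of loading the document by ID
        void sync(long persistenceId, String content);   // role of sync
    }

    /** In-memory fake so the sketch runs without a KIM server. */
    static final class InMemoryStore implements DocStore {
        private final Map<Long, String> docs = new HashMap<>();
        private long nextId = 1;

        public long store(String url, String content) {
            long id = nextId++;
            docs.put(id, content);
            return id;
        }
        public boolean exists(long persistenceId) {
            return docs.containsKey(persistenceId);
        }
        public void sync(long persistenceId, String content) {
            if (!docs.containsKey(persistenceId)) {
                throw new IllegalStateException("store the document before syncing it");
            }
            docs.put(persistenceId, content);
        }
        int size() { return docs.size(); }
        String content(long persistenceId) { return docs.get(persistenceId); }
    }

    /** Options 0 and 2: a URL -> persistence ID map kept in the calling application. */
    static final class UrlRegistry {
        private final Map<String, Long> idsByUrl = new HashMap<>();
        private final DocStore store;

        UrlRegistry(DocStore store) { this.store = store; }

        /** Stores the document the first time a URL is seen and syncs it afterwards. */
        synchronized long storeOrSync(String url, String content) {
            Long known = idsByUrl.get(url);
            if (known != null && store.exists(known)) {
                store.sync(known, content);
                return known;
            }
            long id = store.store(url, content);
            idsByUrl.put(url, id);
            return id;
        }
    }

    /** Option 1: backslash-escape Lucene query-syntax characters in a raw value. */
    static String escapeForLucene(String raw) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&/".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds a fielded query string for the (assumed) URL field. */
    static String urlQuery(String url) {
        return "URL:" + escapeForLucene(url);
    }
}
```

Calling storeOrSync twice with the same URL should leave one document in the store, holding the content from the second call. The map here lives in memory only, so a real application would need to persist it between runs. Lucene also has its own QueryParser.escape, which may be preferable to the hand-rolled helper if it is available to you.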

we hope this will work for you. let us know the results.
borislav
     

Alvaro Hernandez wrote:
This was my feedback, maybe you didn't receive it.
 
Thank you Borislav, I see your point.
 
My scenario is that I want to annotate a document.
 
I only have the url of the document. I don't know if it is in the repository.
 
So, I do this:
 
KIMDocument kdoc = apiCorpora.createDocument(url, null);
kdoc = apiSemAnn.execute(kdoc);
 
If I store the document:
apiIndexAndPersist.storeDoc(kdoc);
 
and the document is already in the repository, it would be twice in it.
I don't want this.
 
If I do this:
apiIndexAndPersist.sync(kdoc);
 
KIM tells me that I have to store the document before.
 
I want to know if it is in the repository, but I only have its URL,
How could I know, using the URL, if it is in the repository ??
 
Thank you again,
Alvaro


borislav popov <[EMAIL PROTECTED]> wrote:
Hi Alvaro,
    obviously there has been some type of miscommunication. i wrote to you and as far as i remember my last message is the one included in this thread.
I did not get any feedback from you on this. please read it again and see if some of my suggestions are worth it.
borislav

Alvaro Hernandez wrote:
Any advice or idea, Borislav ?
 
Thank you,
Alvaro


borislav popov <[EMAIL PROTECTED]> wrote:
hi Alvaro,
    if your scenario is the first one, there are two cases. In one, you still have the document object in use (i.e. you have done some kind of annotation and want to do something more), in which case you can use the sync method, provided you have stored the document.
In the other, you stored it before and have obtained it from the document store in order to apply some new processing. Having obtained it, you again have the document and you know it is stored.
I suspect that you are not obtaining the document from the document store, and this is why you are not sure if it exists. In that case it is always a new document, since the identification of the document in the doc store (a persist ID) is not present in the new one.
To get documents, one can use the QueryAPI, via some of the getDocuments methods.
borislav

Alvaro Hernandez wrote:
Thank you Borislav for your answer.
 
My scenario is the first one, but before I store a document, I need to know if it is in the repository.
 
I have to use the store method to store it the first time, and the sync method if the document already exists in the repository. So in order to know which method to use, I have to know if the document is in the repository.
 
My question is, how could I know if it is in the repository, I mean, using what method ??
 
Thank you again,
Alvaro

borislav popov <[EMAIL PROTECTED]> wrote:
Hi Alvaro,
    indeed it is boring to have duplications :)
The sync method is the right one to use. But as you have seen it requires a KIMDoc as a parameter.
The envisaged workflow is like this:
    a new document comes to the system
    you process it and store it
    then, if you want to re-process it, you find it in the repository, re-annotate something or change some doc-level features, and then you sync it.

If this is your scenario - then this is how it is supposed to work.
On the other hand, if you are passing the same corpus N times to the server you would get N versions of each document in it.

Tell us your scenario and we will try to come up with a suggestion.
all the best to you
borislav

Alvaro Hernandez wrote:
Hi everybody, I'm trying to have only one annotation about a document in the store.
 
I realized that when I store a document more than once (using the storeDoc method), KIM stores it every time I use this method.
 
However, I want only one occurrence of the document in the store.
 
I found the sync method, but first I need to know if the document was previously stored.
I didn't find how to do that using the API.
 
Could you please help me ??
 
Thank you,
Alvaro


_______________________________________________ NOTE: Please REPLY TO ALL to ensure that your reply reaches all members of this mailing list. KIM-discussion mailing list [email protected] http://ontotext.com/mailman/listinfo/kim-discussion_ontotext.com




