Hi Alvaro,
Sorry for the delay. Please write to the kim-discussion mailing
list to get a faster response.
0. In your scenario, you could keep a registry of the URLs you have
crawled and processed, and hold it outside of KIM, in the application
that invokes the server.
If you want to crawl or process the same URL multiple times without
duplication, you can extend that URL registry into a map from each URL
to the ID of its document.
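A registry of this kind might be sketched as follows. It is only an illustration that lives in the calling application: the class UrlRegistry and its methods are hypothetical names, not part of the KIM API, and the String value stands in for whatever document ID the application decides to keep.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical application-side registry; not part of the KIM API. */
final class UrlRegistry {
    // Maps each processed URL to the ID of the document created for it.
    private final Map<String, String> urlToDocId = new ConcurrentHashMap<>();

    /** Returns the document ID recorded for this URL, if the URL was processed before. */
    Optional<String> lookup(String url) {
        return Optional.ofNullable(urlToDocId.get(url));
    }

    /** Records the ID under the URL; returns false if the URL was already registered. */
    boolean register(String url, String docId) {
        return urlToDocId.putIfAbsent(url, docId) == null;
    }
}
```

The map here is held in memory only, so a real application would presumably need to save it somewhere to survive a restart.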
1. You could use the URL as the ID and set it as a document-level
feature when a new document is being processed (note that you
should first check and only then create the document).
The check in this case can be done via the getDocs or getDocsIds methods in
IndexAPI (or IndexAndPersistAPI). These take as a parameter a query
string in the Lucene syntax (I assume you are using this type of index
and persistence).
There you could specify that you are looking for documents containing
this URL in the field URL. URLs are tricky, though, and I actually doubt
that this will work: the default interaction between KIM and Lucene is not
fully in your control, so you cannot choose how the field is
analyzed, tokenized, or indexed. So this may fail.
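For illustration, the query string for such a check could be built roughly like this. The helper UrlQuery is hypothetical, and the field name URL is an assumption about how the document-level feature ends up in the index. The sketch only handles the query syntax; whether the analyzer splits the URL into tokens at indexing time is outside its control, which is exactly the caveat above.

```java
/** Hypothetical helper; the field name "URL" is an assumption about the index layout. */
final class UrlQuery {
    /** Builds a Lucene phrase query such as URL:"http://example.org/a". */
    static String forUrl(String url) {
        // Quoting the value keeps characters like ':' and '/' from being read
        // as query operators; backslash and double quote are escaped here.
        String escaped = url.replace("\\", "\\\\").replace("\"", "\\\"");
        return "URL:\"" + escaped + "\"";
    }
}
```

The resulting string is what would be passed as the query parameter to getDocs or getDocsIds; an empty result would then be taken to mean that the URL has not been stored yet.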
2. Another option is to use the persistence ID of the document. You can
obtain it with getLRPersistenceId() after you store the document. Put it
in the map of URL to ID. Later, try to load the document by that ID from
the PersistAPI (or IndexAndPersistAPI).
If you manage to get it, it exists. You do the processing
you want and then sync it back.
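The store-or-sync decision in option 2 can be sketched as below. Everything here is a stand-in: InMemoryStore only imitates a document store so that the example runs on its own, its store/load/sync methods are hypothetical and are not the KIM signatures, and the use of long for the persistence ID is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the document store; not the KIM PersistAPI. */
final class InMemoryStore {
    private final Map<Long, String> docs = new HashMap<>();
    private long nextId = 1;

    /** Stores a new copy on every call and returns its persistence ID. */
    long store(String content) {
        long id = nextId++;
        docs.put(id, content);
        return id;
    }

    /** Returns the stored content, or null if nothing is stored under this ID. */
    String load(long id) { return docs.get(id); }

    /** Overwrites an already stored document in place. */
    void sync(long id, String content) {
        if (!docs.containsKey(id)) throw new IllegalStateException("store the document first");
        docs.put(id, content);
    }

    int size() { return docs.size(); }
}

/** Chooses between store and sync using a URL-to-persistence-ID map. */
final class StoreOrSync {
    private final InMemoryStore store;
    private final Map<String, Long> urlToPersistenceId = new HashMap<>();

    StoreOrSync(InMemoryStore store) { this.store = store; }

    /** Stores on first sight of the URL, syncs afterwards; returns the persistence ID. */
    long save(String url, String annotatedContent) {
        Long id = urlToPersistenceId.get(url);
        if (id != null && store.load(id) != null) {
            store.sync(id, annotatedContent);   // already stored: update in place
            return id;
        }
        long newId = store.store(annotatedContent);  // first time: store and remember the ID
        urlToPersistenceId.put(url, newId);
        return newId;
    }
}
```

Saving the same URL twice through StoreOrSync leaves a single entry in the store, whereas calling store twice directly would leave two.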
We hope this will work for you. Let us know the results.
borislav
Alvaro Hernandez wrote:
This was my feedback, maybe you didn't receive it.
Thank you Borislav, I see your point.
My scenario is that I want to annotate a document.
I only have the url of the document. I don't know if it is in
the repository.
So, I do this:
KIMDocument kdoc = apiCorpora.createDocument(url, null);
kdoc = apiSemAnn.execute(kdoc);
If I store the document:
apiIndexAndPersist.storeDoc(kdoc);
and the document is already in the repository, it would be in it
twice.
I don't want this.
If I do this:
apiIndexAndPersist.sync(kdoc);
KIM tells me that I have to store the document first.
I want to know if it is in the repository, but I only have its
URL.
How could I know, using the URL, if it is in the repository?
Thank you again,
Alvaro
Hi Alvaro,
Obviously there has been some kind of miscommunication. I wrote to
you, and as far as I remember my last message is the one included in
this thread.
I did not get any feedback from you on it. Please read it again and
see whether any of my suggestions are useful.
borislav
Alvaro Hernandez wrote:
Any advice or ideas, Borislav?
Thank you,
Hi Alvaro,
If your scenario is the first one, then either you still have the
document object in use (i.e. you have done some kind of annotation and
want to do something more), in which case you can use the sync method,
provided you have stored the document.
Or you have stored it before and have
obtained it from the document store in order to apply some new
processing. Having obtained it, you again have the document and you
know it is stored.
I suspect that you are not obtaining the document from the document
store, and this is why you are not sure if it exists. In that case it is
always a new document, since the identification of the document in the
doc store (a persist ID) is not present in the new one.
To get documents, one can use the QueryAPI: some of the
getDocuments methods.
borislav
Alvaro Hernandez wrote:
Thank you Borislav for your answer.
My scenario is the first one, but before I store a
document, I need to know if it is in the repository.
I have to use the store method to store it the first
time, and the sync method if the document already exists in the
repository. So in order to know which method I have to use, I have to
know if the document is in the repository.
My question is, how could I know if it is in the
repository, I mean, using what method?
Thank you again,
Alvaro
Hi Alvaro,
indeed it is boring to have duplications :)
The sync method is the right one to use. But as you have seen, it
requires a KIMDocument as a parameter.
The envisaged workflow is like this:
1. A new document comes to the system.
2. You process it and store it.
3. Then, if you want to re-process it, you find it in the repository,
re-annotate something or change some doc-level features, and then you
sync it.
If this is your scenario - then this is how it is supposed to work.
On the other hand, if you are passing the same corpus N times to the
server you would get N versions of each document in it.
Tell us your scenario and we will try to come up with a suggestion.
all the best to you
borislav
Alvaro Hernandez wrote:
Hi everybody, I'm trying to have only one annotation
about a document in the store.
I realized that when I store a document more than
once (using the storeDoc method), KIM stores it every time I use this
method.
However, I want only one occurrence of the document
in the store.
I found the sync method, but first I need to know if
the document was previously stored.
I didn't find out how to do that using the API.
Could you please help me?
Thank you,
Alvaro
_______________________________________________ NOTE: Please REPLY TO ALL to ensure that your reply reaches all members of this mailing list. KIM-discussion mailing list [email protected] http://ontotext.com/mailman/listinfo/kim-discussion_ontotext.com