Hi Jay, Keith,

I really hope this is not too urgent. I will investigate the problem and will of course have an answer for you, but only after this week's series of meetings is finished. I am sorry for the delay, but I just can't spend time on this at the moment.

It is true, however, that an exported document that was populated with the Population Tool has an empty "Original markups" annotation set. Since the toXML() method exports the internal GATE document, this means the annotation set is empty in the GATE document itself. So I suppose Jay's guess about the way a document is created is the important thing.
I will clarify this and will be back with an answer asap.

Ilian

----- Original Message -----
From: "Jay Johnston" <[EMAIL PROTECTED]>
To: "Keith Suderman" <[EMAIL PROTECTED]>
Cc: <[email protected]>
Sent: 07 September 2006, Thursday 01:30
Subject: Re: [KIM-discussion] Formatting in Documents

> Unfortunately a KIMDocument doesn't expose all the functions of the
> underlying GATE Document, so while KIMDocument.toXML() is available
> (apparently a thin wrapper around GATE's DocumentImpl.toXml()),
> KIMDocument.toXML(Set annotationSet) isn't. To implement this using this
> solution I would have to access the Lucene data store manually, deserialize
> the document, and use this method to reconstruct the original document.
> Seems quite inconvenient and very resource-intensive for just getting the
> original document. I'd rather just read the original document from the
> filesystem (which is the solution I've implemented). If it was convenient
> and quick for me to do it through KIM I would, but it's not.
>
> Further, I don't think the original markup is being retained by KIM. I just
> tried your code and it doesn't return the markup as expected. In fact, when
> you call toXML() on a KIMDocument, the "Original markups" tags are empty:
> "<AnnotationSet Name="Original markups" ></AnnotationSet>" I'm guessing
> setting gate.DocumentImpl.setPreserveOriginalContent() to false (as KIM
> does) tells gate not to store this information (the Gate javadocs are a bit
> hazy on this).
>
> On 9/6/06, Keith Suderman <[EMAIL PROTECTED]> wrote:
>>
>> Hi Jay,
>>
>> What you need to do is call the document's toXml() method and pass in
>> the AnnotationSet that contains the annotations you want to
>> include. The original annotations will be in an annotation set named
>> "Original markups" so you will need to use something like:
>>
>> gate.AnnotationSet aSet = document.getAnnotations("Original markups");
>> if (aSet != null)
>> {
>>     String xml = document.toXml(aSet);
>>     ...
>> }
>>
>> This won't reproduce the input document exactly as GATE will stick
>> gateID attributes on each annotation.
>>
>> Keith
>>
>> At 03:26 AM 9/6/2006, borislav popov wrote:
>> >Hi Jay,
>> > There should be a way to preserve the original markup since we use
>> >the GATE document model underneath. However this is true for some of the
>> >methods for creation of KIM Document, and not all. We have to check
>> >which method of creation is used in the population tool and determine
>> >how the formatting can be preserved.
>> >Please be patient if we do not answer today, because it is a national
>> >holiday.
>> >b
>> >
>> >Johnston wrote:
>> > > When using the Population Tool and the KIM API, I see no way of
>> > > returning a version of the stored document with original markup. For
>> > > example, if the source documents are html or xml files,
>> > > KIMDocument.getContent() returns a plaintext version of the document
>> > > stripped of all tags. The KIMDocument.toXML() method returns an XML
>> > > file tagged with annotations and features, but not the original
>> > > markup. Is there some method I'm missing that will do this or do I
>> > > need to implement this feature myself?
>> > >
>> > > Thanks, Jay
>> > >
>> > > ------------------------------------------------------------------------
>> > >
>> > > _______________________________________________
>> > > NOTE: Please REPLY TO ALL to ensure that your reply reaches all
>> > > members of this mailing list.
>> > >
>> > > KIM-discussion mailing list
>> > > [email protected]
>> > > http://ontotext.com/mailman/listinfo/kim-discussion_ontotext.com
>> > >
>> > > ------------------------------------------------------------------------
>> > >
>> > > No virus found in this incoming message.
>> > > Checked by AVG Free Edition.
>> --------------------------------------------------
>> Research Associate
>> American National Corpus
>> [EMAIL PROTECTED]
>> http://americannationalcorpus.org
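
For reference, the filesystem workaround Jay mentions in the thread (reading the original file directly instead of going through KIM) could look roughly like the sketch below. It is only an illustration under stated assumptions: the class OriginalMarkupReader and its readOriginal method are hypothetical and not part of the KIM or GATE APIs, the caller is assumed to have recorded each source file's relative name when the documents were populated, and the files are assumed to be UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper; not part of the KIM or GATE APIs. */
final class OriginalMarkupReader {

    private OriginalMarkupReader() {}

    /**
     * Returns the untouched source text (tags included) of a file under sourceDir.
     * Assumption: relativeName was recorded by the caller at population time
     * and the file is UTF-8 encoded.
     */
    static String readOriginal(Path sourceDir, String relativeName) throws IOException {
        Path base = sourceDir.toAbsolutePath().normalize();
        Path file = base.resolve(relativeName).normalize();
        // Refuse names such as "../x" that would resolve outside the source directory.
        if (!file.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes source directory: " + relativeName);
        }
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: OriginalMarkupReader <sourceDir> <relativeName>");
            return;
        }
        System.out.print(readOriginal(Paths.get(args[0]), args[1]));
    }
}
```

This sidesteps the empty "Original markups" annotation set entirely, at the cost of having to keep the source files around and to track which file each stored document came from.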
