Mario Juric created UIMA-4464:
---------------------------------
Summary: AE string configuration parameters are trimmed in the CPE
and when XML serializing
Key: UIMA-4464
URL: https://issues.apache.org/jira/browse/UIMA-4464
Project: UIMA
Issue Type: Bug
Reporter: Mario Juric
These are my findings so far:
Using the new gapText parameter in UIMA Ruta HTMLConverter I noticed that the
string is trimmed in the pipeline aggregation process, e.g. “ . ” ends up as
“.” in the pipeline and when writing the pipeline to XML. I don’t think it has
anything to do with the HTMLConverter in particular. We use UIMAfit to
construct the aggregated analysis engine description but I don’t know where
this trimming exactly occurs. I was also able to run a small example pipeline
where the trim did not happen, which was a bit of a surprise.
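To make the symptom concrete, here is a minimal sketch of the check I have in mind when I say a value "is trimmed". TrimCheck and looksTrimmed are hypothetical names of my own and are not part of UIMA or uimaFIT. The sketch only compares the configured string with the observed one.

```java
// Hypothetical helper for illustration only; not part of UIMA or uimaFIT.
final class TrimCheck {

    /** True when 'observed' is 'configured' with its leading/trailing whitespace removed. */
    static boolean looksTrimmed(String configured, String observed) {
        return !configured.equals(observed) && configured.trim().equals(observed);
    }

    public static void main(String[] args) {
        String configured = " . "; // value set on the descriptor
        String observed = ".";     // value seen later in the pipeline

        System.out.println(looksTrimmed(configured, observed));   // prints true
        System.out.println(looksTrimmed(configured, configured)); // prints false
    }
}
```

The same comparison can be applied at any point where both the configured and the effective value are visible, for example in a debugger watch expression.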
The trimming is not a practical problem for me right now, but I felt it
might become important in some other case. I only noticed it when I added
extra spaces to improve the readability of my output. Initially I thought it was
the HTMLConverter, but when I inspected it I could see that the trimming had
happened somewhere before configuration parameter initialisation.
I then inspected the UIMAfit generated descriptor right after creation. The
value was not trimmed at that point. Later, during runtime initialisation and
without any XML serialization this time, the value was already trimmed inside
ConfigurationManagerImplBase::getConfigParameterValue right after the lookup
operation (I used the debugger to inspect the value). This is inside a UIMA core
component, so the trim occurs somewhere between descriptor creation and
AE initialisation. It seems this is not a UIMAfit issue after all.
I did a small example app where the HTMLAnnotator and HTMLConverter descriptors
were also aggregated before execution, but here the trimming did not materialise
at runtime, only in the serialised XML. Then it occurred to me that my
example used the SimplePipeline whereas our main application uses the CPE. I then
switched the main application to the SimplePipeline and the trimming was gone
there as well. It seems that trimming only happens inside the CPE and when XML
serialising the pipeline.
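As a sanity check on the XML side, the sketch below writes a value with surrounding spaces into an element and reads it back using only the JDK's DOM and Transformer classes. It deliberately does not use UIMA's own XML serializer or parser, and the class name, method name and the element name "string" are my own choices for illustration. It only shows that a plain XML round trip keeps such whitespace, so the loss is not inherent to XML itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustration only: plain JDK XML APIs, no UIMA classes involved.
final class XmlWhitespaceRoundTrip {

    /** Writes 'value' as the text of a single element, serialises it, and parses it back. */
    static String roundTrip(String value) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

            // Build <string>value</string> in memory.
            Document doc = builder.newDocument();
            Element element = doc.createElement("string");
            element.setTextContent(value);
            doc.appendChild(element);

            // Serialise with the default (non-indenting) identity transformer.
            StringWriter xml = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(xml));

            // Parse the serialised text and return the element's text content.
            Document parsed = builder.parse(new InputSource(new StringReader(xml.toString())));
            return parsed.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String gapText = " . ";
        System.out.println("[" + roundTrip(gapText) + "]"); // prints [ . ]
    }
}
```

If a value survives this round trip but not the descriptor serialisation, that points at the descriptor writing or reading code rather than at XML as a format.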