Thanks Chris,

Attached is an offending file, before escaping.
For the record, the Perl module HTML::Entities does provide an alternative
to escapeHTML that produces acceptable files.
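
For anyone trying to reproduce the single-quote discrepancy without Perl,
here is a minimal Python sketch of two common ways the "'" character gets
encoded. It is an illustration only, not the OODT or cas-metadata code; the
function names escape_numeric and escape_named are made up for this example,
and which of the two forms the ingest accepts is not established in this
thread.

```python
import html
from xml.sax.saxutils import escape


def escape_numeric(value: str) -> str:
    """Escape &, <, > and both quote characters.

    The single quote becomes the numeric character reference &#x27;.
    """
    return html.escape(value, quote=True)


def escape_named(value: str) -> str:
    """Escape using only the five predefined XML entities.

    The single quote becomes the named entity &apos;.
    """
    return escape(value, {'"': "&quot;", "'": "&apos;"})


# Hypothetical metadata value containing a single quote.
sample = "O'Brien & <sons>"
print(escape_numeric(sample))  # O&#x27;Brien &amp; &lt;sons&gt;
print(escape_named(sample))    # O&apos;Brien &amp; &lt;sons&gt;
```

Running both over the same metadata value and diffing the output should show
whether the single quote is the only character that differs.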

Thanks
K


> -----Original Message-----
> From: Chris Mattmann [mailto:[email protected]]
> Sent: Wednesday, October 08, 2014 11:38 AM
> To: [email protected]
> Subject: Re: How to ingest files when metadata contain non standard
> characters?
> 
> cas-metadata should handle this escaping/unescaping in its SerDe
> capabilities.
> 
> Kostas, can you provide the exact file that I can test on and upload it
> to JIRA?
> 
> ------------------------
> Chris Mattmann
> [email protected]
> 
> 
> 
> 
> -----Original Message-----
> From: Lewis John Mcgibbney <[email protected]>
> Reply-To: <[email protected]>
> Date: Thursday, October 9, 2014 at 2:59 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: How to ingest files when metadata contain non standard
> characters?
> 
> >Hi Kos,
> >Thanks for the reply.
> >
> >On Wed, Oct 8, 2014 at 5:16 PM, Konstantinos Mavrommatis <
> >[email protected]> wrote:
> >
> >> I escaped the characters using the CGI::escapeHTML function from the
> >> CGI perl module.
> >>
> >
> >Wow. I am surprised at this one. I wonder if this is a bug which results
> >in the discrepancy or if this is intentional behaviour!
> >
> >
> >>
> >> The difference between the two versions (mine escaped vs yours
> >>escaped) is in the encoding of the single quote "'" character, if I
> >>am not mistaken.
> >> I want to clarify this because your email came as plain ASCII (not
> >>HTML).
> >>
> >
> >Yes that is correct.
> >
> >
> >>
> >> I did try your command and it worked !!!
> >>
> >
> >OK grand.
> >
> >
> >>
> >> Now the question is how to do this encoding (your version) ☺
> >>
> >>
> >Is this the question? My thoughts would be that this should be
> >encapsulated within OODT somewhere and that it should not be necessary
> >to escape everything as you/we have been doing. This is extremely time
> >consuming and painful.
> >
> >I escaped everything here
> >http://www.freeformatter.com/html-escape.html
> >
> >and compared the strings here
> >http://text-compare.com/
> >
> >The latter resource will verify that it is the single quote that is the
> >offending char here.
> >Thanks
> >Lewis
> 

