Thanks Kostas. Can you upload the file somewhere and then post a link here? The mailing list strips attachments.
Cheers,
Chris

------------------------
Chris Mattmann
[email protected]

-----Original Message-----
From: Konstantinos Mavrommatis <[email protected]>
Reply-To: <[email protected]>
Date: Thursday, October 9, 2014 at 5:48 AM
To: "[email protected]" <[email protected]>
Subject: RE: How to ingest files when metadata contain non standard characters?

>Thanks Chris,
>
>attached is an offending file before escape.
>For the record, the Perl module HTML::Entities does provide an escapeHTML
>alternative that produces acceptable files.
>
>Thanks
>K
>
>
>> -----Original Message-----
>> From: Chris Mattmann [mailto:[email protected]]
>> Sent: Wednesday, October 08, 2014 11:38 AM
>> To: [email protected]
>> Subject: Re: How to ingest files when metadata contain non standard
>> characters?
>>
>> cas-metadata should handle this escaping/unescaping in its SerDe
>> capabilities.
>>
>> Kostas, can you provide the exact file that I can test on and upload it
>> to JIRA?
>>
>> ------------------------
>> Chris Mattmann
>> [email protected]
>>
>>
>> -----Original Message-----
>> From: Lewis John Mcgibbney <[email protected]>
>> Reply-To: <[email protected]>
>> Date: Thursday, October 9, 2014 at 2:59 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: How to ingest files when metadata contain non standard
>> characters?
>>
>> >Hi Kos,
>> >Thanks for reply
>> >
>> >On Wed, Oct 8, 2014 at 5:16 PM, Konstantinos Mavrommatis <
>> >[email protected]> wrote:
>> >
>> >> I escaped the characters using the CGI::escapeHTML function from the
>> >> CGI perl module.
>> >>
>> >
>> >Wow. I am surprised at this one. I wonder if this is a bug which results
>> >in the discrepancy or if this is intentional behaviour!
>> >
>> >>
>> >> The differences between the two versions (mine escaped vs yours
>> >> escaped) is in the encoding of the single quote "'" character, if I
>> >> am not mistaken.
>> >> I want to clarify this because your email came as simple ASCII (not
>> >> HTML)
>> >>
>> >
>> >Yes that is correct.
>> >
>> >>
>> >> I did try your command and it worked !!!
>> >>
>> >
>> >OK grand.
>> >
>> >>
>> >> Now the question is how to do this encoding (your version) ☺
>> >>
>> >
>> >Is this the question? My thoughts would be that this should be
>> >encapsulated within OODT somewhere and that it should not be necessary
>> >to escape everything as you/we have been doing. This is extremely time
>> >consuming and painful.
>> >
>> >I escaped everything here
>> >http://www.freeformatter.com/html-escape.html
>> >
>> >and compared the strings here
>> >http://text-compare.com/
>> >
>> >The latter resource will verify that it is the single quote that is the
>> >offending char here.
>> >Thanks
>> >Lewis
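
For anyone following the thread, here is a minimal sketch of how two escapers can agree on `<`, `>` and `&` yet differ on the single quote, which is the kind of discrepancy described above. It is written in Python purely for illustration; it does not reproduce the output of CGI::escapeHTML, HTML::Entities, or the online escaper mentioned in the thread, and it says nothing about how cas-metadata itself serializes values. The sample string and the helper names `escape_variants` and `first_difference` are hypothetical.

```python
import html
from xml.sax.saxutils import escape as xml_escape


def escape_variants(value):
    """Escape the same metadata value with two standard-library escapers."""
    return {
        # Escapes &, <, > and, with quote=True, both " and ' as well.
        "html.escape": html.escape(value, quote=True),
        # Escapes only &, < and > unless extra entities are passed in.
        "saxutils.escape": xml_escape(value),
    }


def first_difference(a, b):
    """Return the index of the first differing character, or -1 if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))


# Hypothetical metadata value containing a single quote and markup characters.
sample = "Crohn's disease <RNA-seq> & controls"
variants = escape_variants(sample)

for name, escaped in variants.items():
    print(f"{name:16s} {escaped}")

# Points at the position of the single quote in the escaped strings.
print(first_difference(variants["html.escape"], variants["saxutils.escape"]))
```

Comparing the two outputs character by character does the same job as the text-compare step above: the first mismatch falls on the single quote, while the other special characters are encoded identically.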
