Yes, proper escaping is an endless source of joy ;-) What HAPI version are
you using, btw?
I'll take a look at your code in the next few days if time permits.
When modifying existing functionality, we always face the problem of
backwards compatibility. So for the past one or two releases, we have
preferred adding ways to plug in custom strategies while keeping the
default, rather than changing existing behavior.
So far, Escape is unfortunately very static, but for 2.2 we can think about
making the escaping strategy pluggable just like other things in HAPI.
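
To make "pluggable" a bit more concrete, here is a rough sketch of the shape
I have in mind. All names (EscapingStrategy, Config, setEscapingStrategy) are
hypothetical and do not exist in HAPI today, and the stand-in default only
handles the field separator to keep it short:

```java
final class PluggableEscapingSketch {

    /** Hypothetical strategy interface; HAPI does not define this type. */
    interface EscapingStrategy {
        String escape(String text);
        String unescape(String text);
    }

    /** Stand-in for the current fixed behaviour (field separator only, for brevity). */
    static final EscapingStrategy DEFAULT = new EscapingStrategy() {
        @Override public String escape(String text) { return text.replace("|", "\\F\\"); }
        @Override public String unescape(String text) { return text.replace("\\F\\", "|"); }
    };

    /** Hypothetical holder; callers who never touch it keep the default. */
    static final class Config {
        private EscapingStrategy strategy = DEFAULT;
        EscapingStrategy getEscapingStrategy() { return strategy; }
        void setEscapingStrategy(EscapingStrategy strategy) { this.strategy = strategy; }
    }

    public static void main(String[] args) {
        Config config = new Config();
        // Default strategy: prints a\F\b
        System.out.println(config.getEscapingStrategy().escape("a|b"));

        // Plug in a custom strategy that leaves text untouched.
        config.setEscapingStrategy(new EscapingStrategy() {
            @Override public String escape(String text) { return text; }
            @Override public String unescape(String text) { return text; }
        });
        // Custom strategy: prints a|b
        System.out.println(config.getEscapingStrategy().escape("a|b"));
    }
}
```

Existing callers would keep the default and see no change in behavior; only
code that sets a strategy explicitly would get different escaping.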
Thoughts?
cheers
Christian
2013/9/4 Ian Vowles <ian_vow...@health.qld.gov.au>
> I have sent mails to the general list about this issue before, and the
> advice has helped me progress.
>
> Then along comes another system that has slightly different behaviour.
>
> In this particular case a system correctly escapes the HL7 delimiters
> EXCEPT the escape delimiter. This allows it to send field content like
> this (from an address):
>
> 1 \ 24 Smith \T\ Wesson Road
>
> I was hopeful that since the single escape on its own didn't form part of
> an escape sequence, it might be preserved through the parse. This is
> not the case. The lone backslash is consumed in the process and
> disappears. I don't know how valid an argument it is to say it should be
> preserved, but if it isn't, I can't subsequently escape it properly to
> send to a downstream system.
>
> Given that I had been dealing with HL7 for some time before I found HAPI,
> I had previously done some work on an encode / unencode routine. My own
> code couldn't cope with this one either.
>
> I decided it was time to be brave and dive into the HAPI code. Somewhere
> there had to be low-level encode/unencode routines.
> Up until I looked in the source, I had been creating a new ST object and
> using its parse and encode methods. Once I looked into the source I found
> the Escape class.
>
> This updated version of Escape does the following:
>
> Preserves escape characters that do not form part of an escape sequence
> Permits the exceptional escape sequence case of \X000d\ to work when the
> escape character has been changed to something other than \
> Adds extra HEX escaped code \X0D\ and \X0A\ because we see them here
> occasionally.
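
For anyone skimming the thread, the points above can be illustrated with a
stripped-down sketch. This is neither Ian's class (which follows below) nor
HAPI's Escape: the class name LenientUnescapeSketch is made up, and unlike
the full version it does not special-case formatting or highlighting
sequences such as \.br\ or \H\.

```java
final class LenientUnescapeSketch {

    /** encChars order: field, component, repetition, escape, sub-component. */
    static String unescape(String text, String encChars) {
        char esc = encChars.charAt(3);
        // Sequence letters for the five delimiters, in encChars order.
        String codes = "FSRET";
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c != esc) {
                out.append(c);
                i++;
                continue;
            }
            // Text between this escape character and the next one, if any.
            int close = text.indexOf(esc, i + 1);
            String body = close > 0 ? text.substring(i + 1, close) : null;
            if (body != null && body.length() == 1 && codes.indexOf(body.charAt(0)) >= 0) {
                // Delimiter sequence such as \T\: emit the delimiter itself.
                out.append(encChars.charAt(codes.indexOf(body.charAt(0))));
                i = close + 1;
            } else if ("X000d".equals(body) || "X0D".equals(body)) {
                out.append('\r');
                i = close + 1;
            } else if ("X0A".equals(body)) {
                out.append('\n');
                i = close + 1;
            } else {
                // Not a recognised sequence: keep the escape character as-is.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: 1 \ 24 Smith & Wesson Road
        System.out.println(unescape("1 \\ 24 Smith \\T\\ Wesson Road", "|^~\\&"));
    }
}
```

Because the hex sequences are matched using whatever escape character is
passed in, "a#X000d#b" with encoding characters "|^~#&" also comes back with
a carriage return between "a" and "b".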
>
> Test case code is also included at the bottom, including my now infamous
> "HATER" example :-). Test cases with lots of > < are there because we often
> do transforms between HL7 and XML, so we often look at these in additional
> test cases of the XML output produced.
>
> What are my chances of this being adopted?
>
> If not, how can I get my version to override the existing one?
>
> Thanks
> Ian
>
> ----------
> /**
> The contents of this file are subject to the Mozilla Public License
> Version 1.1
> (the "License"); you may not use this file except in compliance with the
> License.
> You may obtain a copy of the License at http://www.mozilla.org/MPL/
> Software distributed under the License is distributed on an "AS IS" basis,
> WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
> for the
> specific language governing rights and limitations under the License.
>
> The Original Code is "Escape.java". Description:
> "Handles "escaping" and "unescaping" of text according to the HL7 escape
> sequence rules
> defined in section 2.10 of the standard (version 2.4)"
>
> The Initial Developer of the Original Code is University Health Network.
> Copyright (C)
> 2001. All Rights Reserved.
>
> Contributor(s): Mark Lee (Skeva Technologies); Elmar Hinz
>
> Alternatively, the contents of this file may be used under the terms of
> the
> GNU General Public License (the "GPL"), in which case the provisions of
> the GPL are
> applicable instead of those above. If you wish to allow use of your
> version of this
> file only under the terms of the GPL and not to allow others to use your
> version
> of this file under the MPL, indicate your decision by deleting the
> provisions above
> and replace them with the notice and other provisions required by the GPL
> License.
> If you do not delete the provisions above, a recipient may use your
> version of
> this file under either the MPL or the GPL.
> */
> package ca.uhn.hl7v2.parser;
>
> import java.util.Collections;
> import java.util.LinkedHashMap;
> import java.util.Map;
>
> /**
> * Handles "escaping" and "unescaping" of text according to the HL7 escape
> * sequence rules defined in section 2.10 of the standard (version 2.4).
> * Currently, escape sequences for multiple character sets are
> * unsupported. The highlighting and locally defined escape sequences
> * are also unsupported.
> * The only hexadecimal escapes supported are X000d, X0D, X0A
> *
> * @author Bryan Tripp
> * @author Mark Lee (Skeva Technologies)
> * @author Elmar Hinz
> * @author Christian Ohr
> */
> public class Hl7Escape {
>
> /** Creates a new instance of Hl7Escape */
> public Hl7Escape() {
> }
>
> /**
> * @param text string to be escaped
> * @return the escaped string
> * <p>Defaults the escape characters to the conventional values |^~\&
> */
> public static String escape(String text) {
> return escape(text,"|^~\\&");
> }
>
> /**
> * @param text string to be escaped
> * @param encChars encoding characters to be used in the order
> * <br>Field, Component, Repetition, Escape, Sub-component
> * @return the escaped string
> */
> public static String escape(String text, String encChars) {
> EncLookup esc = getEscapeSequences(encChars);
> int textLength = text.length();
>
> StringBuilder result = new StringBuilder(textLength);
> for (int i = 0; i < textLength; i++) {
> boolean charReplaced = false;
> char c = text.charAt(i);
>
> FORENCCHARS:
> for (int j = 0; j < 6; j++) {
> if (text.charAt(i) == esc.characters[j]) {
>
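> // Formatting escape sequences such as \.br\ should be left alone.
> // Index 3 is the escape character (order: Field, Component,
> // Repetition, Escape, Sub-component).
> if (j == 3) {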
>
> if (i+1 < textLength) {
>
> // Check for \.br\
> char nextChar = text.charAt(i + 1);
> switch (nextChar) {
> case '.':
> case 'C':
> case 'M':
> case 'X':
> case 'Z':
> {
> int nextEscapeIndex = text.indexOf(esc.characters[j], i + 1);
> if (nextEscapeIndex > 0) {
> result.append(text.substring(i, nextEscapeIndex + 1));
> charReplaced = true;
> i = nextEscapeIndex;
> break FORENCCHARS;
> }
> break;
> }
> case 'H':
> case 'N':
> {
> if (i+2 < textLength && text.charAt(i+2) == esc.characters[3]) {
> int nextEscapeIndex = i + 2;
> if (nextEscapeIndex > 0) {
> result.append(text.substring(i, nextEscapeIndex + 1));
> charReplaced = true;
> i = nextEscapeIndex;
> break FORENCCHARS;
> }
> }
> break;
> }
> }
>
> }
>
> }
>
> result.append(esc.encodings[j]);
> charReplaced = true;
> break;
> }
> }
> if (!charReplaced) {
> result.append(c);
> }
> }
> return result.toString();
> }
>
> /**
> * @param text string to be unescaped
> * @return the unescaped string
> * <p>Defaults the escape characters to the conventional values |^~\&
> */
> public static String unescape(String text) {
> return unescape(text,"|^~\\&");
> }
>
> /**
> * @param text string to be unescaped
> * @param encChars encoding characters to be used in the order
> * <br>Field, Component, Repetition, Escape, Sub-component
> * @return the unescaped string
> */
> public static String unescape(String text, String encChars) {
>
> // If the escape char isn't found, we don't need to look for escape sequences
> char escapeChar = encChars.charAt(3);
> boolean foundEscapeChar = false;
> for (int i = 0; i < text.length(); i++) {
> if (text.charAt(i) == escapeChar) {
> foundEscapeChar = true;
> break;
> }
> }
> if (!foundEscapeChar) {
> return text;
> }
>
> int textLength = text.length();
> StringBuilder result = new StringBuilder(textLength + 20);
> EncLookup esc = getEscapeSequences(encChars);
> char escape = esc.characters[3];
> int encodingsCount = esc.characters.length;
> int i = 0;
> while (i < textLength) {
> char c = text.charAt(i);
> if (c != escape) {
> result.append(c);
> i++;
> } else {
> boolean foundEncoding = false;
>
> // Test against the standard encodings
> for (int j = 0; j < encodingsCount; j++) {
> String encoding = esc.encodings[j];
> int encodingLength = encoding.length();
> if ((i + encodingLength <= textLength) && text.substring(i, i +
> encodingLength)
> .equals(encoding)) {
> result.append(esc.characters[j]);
> i += encodingLength;
> foundEncoding = true;
> break;
> }
> }
>
> if (!foundEncoding) {
>
> // If we haven't found this, there is one more option. Escape
> // sequences of \.XXXXX\ are formatting codes. They should be left intact
> if (i + 1 < textLength) {
> char nextChar = text.charAt(i + 1);
> switch (nextChar) {
> case '.':
> case 'C':
> case 'M':
> case 'X':
> case 'Z':
> {
> int closingEscape = text.indexOf(escape, i + 1);
> if (closingEscape > 0) {
> String substring = text.substring(i, closingEscape + 1);
> result.append(substring);
> i += substring.length();
> } else {
> i++;
> }
> break;
> }
> case 'H':
> case 'N':
> {
> int closingEscape = text.indexOf(escape, i + 1);
> if (closingEscape == i + 2) {
> String substring = text.substring(i, closingEscape + 1);
> result.append(substring);
> i += substring.length();
> } else {
> i++;
> }
> break;
> }
> default:
> {
> // Preserve unescaped escape delimiter
>
> result.append(c);
> i++;
> }
> }
>
> } else {
> // Preserve unescaped escape delimiter
> result.append(c);
> i++;
> }
> }
>
>
> }
> }
> return result.toString();
> }
>
> /**
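> * Builds the lookup of special characters and their escape sequences
> * for the given encoding characters.
> * @param encChars encoding characters in the order
> * Field, Component, Repetition, Escape, Sub-component
> * @return the lookup pairing each special character with its escape sequence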
> */
> private static EncLookup getEscapeSequences(String encChars) {
> EncLookup escapeSequences = new EncLookup(encChars);
> return escapeSequences;
> }
>
>
>
>
> /**
> * A performance-optimized replacement for using when
> * mapping from HL7 special characters to their respective
> * encodings
> *
> * @author Christian Ohr
> */
> private static class EncLookup {
>
> char[] characters = new char[8];
> String[] encodings = new String[8];
>
> EncLookup(String ec) {
> characters[0] = ec.charAt(0);
> characters[1] = ec.charAt(1);
> characters[2] = ec.charAt(2);
> characters[3] = ec.charAt(3);
> characters[4] = ec.charAt(4);
> characters[5] = '\r';
> characters[6] = '\r';
> characters[7] = '\n';
> char escapeChar = ec.charAt(3);
> char[] codes = {'F', 'S', 'R', 'E', 'T'};
> for (int i = 0; i < codes.length; i++) {
> StringBuilder seq = new StringBuilder();
> seq.append(escapeChar);
> seq.append(codes[i]);
> seq.append(escapeChar);
> encodings[i] = seq.toString();
> }
> // encodings[5] = "\\X000d\\";
> encodings[5] = escapeChar + "X000d" + escapeChar;
> encodings[6] = escapeChar + "X0D" + escapeChar;
> encodings[7] = escapeChar + "X0A" + escapeChar;
> }
> }
> }
>
> -----
>
> Test case:
>
> package ca.uhn.hl7v2.parser;
>
> import org.junit.After;
> import org.junit.AfterClass;
> import org.junit.Before;
> import org.junit.BeforeClass;
> import org.junit.Test;
> import static org.junit.Assert.*;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> /**
> *
> * @author vowlesi
> */
> public class SingleBackslashV3Test {
>
> private static final Logger log =
> LoggerFactory.getLogger(SingleBackslashV3Test.class);
> private String encChars = "|^~\\&";
>
> public SingleBackslashV3Test() {
> }
>
> @BeforeClass
> public static void setUpClass() {
> }
>
> @AfterClass
> public static void tearDownClass() {
> }
>
> @Before
> public void setUp() {
> }
>
> @After
> public void tearDown() {
> }
>
> /**
> * Test of unescape method, of class Escape.
> */
> @Test
> public void testUnescapeSingleBackslash() {
> log.debug("unescape with single backslash");
> String text = "1 \\ 24 Smith \\T\\ Wesson Road";
> String expResult = "1 \\ 24 Smith & Wesson Road";
> String result = Hl7Escape.unescape(text);
> log.debug("Input : " + text);
> log.debug("Result : " + result);
> log.debug("Expected : " + expResult);
> assertEquals(expResult, result);
> text = "\\H\\A\\T\\E\\R\\\\N\\<<\\S\\>>\"\\E\\''\\F\\Special test "
> + "'\\XFFFFFFFFFFFFFFFFFFFF\\'";
> expResult = "\\H\\A&E~\\N\\<<^>>\"\\''|Special test "
> + "'\\XFFFFFFFFFFFFFFFFFFFF\\'";
> result = Hl7Escape.unescape(text);
> log.debug("Input : " + text);
> log.debug("Result : " + result);
> log.debug("Expected : " + expResult);
> assertEquals(expResult, result);
> text = "\\H\\A\\T\\E\\R\\\\N\\<<\\S\\>>\"\\E\\''\\F\\Special test "
> + "'\\X000d\\'";
> expResult = "\\H\\A&E~\\N\\<<^>>\"\\''|Special test '\r\'";
> result = Hl7Escape.unescape(text);
> log.debug("Input : " + text);
> log.debug("Result : " + result);
> log.debug("Expected : " + expResult);
> assertEquals(expResult, result);
> text = "\\\\\\\\\\\\\\\\\\\\";
> expResult = "\\\\\\\\\\\\\\\\\\\\";
> result = Hl7Escape.unescape(text);
> log.debug("Input : " + text);
> log.debug("Result : " + result);
> log.debug("Expected : " + expResult);
> assertEquals(expResult, result);
> text = "Ken\\n\\F\\edy";
> expResult = "Ken\\E\\n\\F\\edy";
> result = Hl7Escape.unescape(text);
> result = Hl7Escape.escape(result);
> log.debug("Input : " + text);
> log.debug("Result : " + result);
> log.debug("Expected : " + expResult);
> assertEquals(expResult, result);
> }
> }
>
>
>
> _______________________________________________
> Hl7api-devel mailing list
> Hl7api-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/hl7api-devel
>
>