[jira] Commented: (LANG-505) Rewrite StringEscapeUtils

Henri Yandell (JIRA) Sat, 06 Jun 2009 01:46:32 -0700

    [ 
https://issues.apache.org/jira/browse/LANG-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716850#action_12716850
 ]


Henri Yandell commented on LANG-505:
------------------------------------

Worked on this some more. Passing in CharSequences works well, with a 
codepoint version to make escapers easier to write (there is currently a char 
version too, but it may be pointless since codepoints can be handled much like 
chars). I dropped the idea of first querying whether an escape will happen and 
then making a second invocation to do the escape. It's not needed as yet.
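
To illustrate the codepoint version, here is a minimal sketch rather than the code in the patch. The names CodePointEscaperSketch, NonAsciiAsUnicodeSketch and CodePointDemo are hypothetical, and the return convention (1 codepoint consumed, or 0 if untouched) is an assumption based on the escape(input, index, out) call shown further down.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical base class: subclasses only ever see one codepoint at a time.
abstract class CodePointEscaperSketch {
    // Returns true if an escaped form of the codepoint was written to out.
    abstract boolean escape(int codepoint, Writer out) throws IOException;

    // CharSequence entry point: reports 1 codepoint consumed, or 0 if left alone.
    final int escape(CharSequence input, int index, Writer out) throws IOException {
        return escape(Character.codePointAt(input, index), out) ? 1 : 0;
    }
}

// Assumed behaviour: anything above 0x7f is written as four-hex-digit unicode escapes.
class NonAsciiAsUnicodeSketch extends CodePointEscaperSketch {
    boolean escape(int codepoint, Writer out) throws IOException {
        if (codepoint <= 0x7f) {
            return false;
        }
        // A supplementary codepoint becomes two escapes, one per surrogate.
        for (char c : Character.toChars(codepoint)) {
            out.write(String.format("\\u%04X", (int) c));
        }
        return true;
    }
}

class CodePointDemo {
    static String escapeNonAscii(String input) {
        CodePointEscaperSketch escaper = new NonAsciiAsUnicodeSketch();
        StringWriter out = new StringWriter();
        try {
            int i = 0;
            while (i < input.length()) {
                int codepoint = Character.codePointAt(input, i);
                if (escaper.escape(input, i, out) == 0) {
                    // Not escaped: copy the codepoint through unchanged.
                    out.write(Character.toChars(codepoint));
                }
                i += Character.charCount(codepoint);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

The subclass never touches indexes or surrogate pairs, which is the sense in which the codepoint version eases coding.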

escapeJava looks like:

{code:java}
    public static void escapeJava(String input, Writer out) throws IOException {
        AggregateEscaper escapers = new AggregateEscaper(
            new LookupTranslator(
                new String[][] {
                    {"\"", "\\\""},
                    {"\\", "\\\\"}
                }),
            new EscapeLowAsciiAsUnicode(),
            new EscapeNonAsciiAsUnicode()
        );

        escape(input, escapers, out);
    }
{code}

i.e. despite the API change, it is much the same as the above. I also went 
ahead and stopped Escaper from extending FilterWriter, which is why the out 
variable is no longer repeated everywhere.
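
As a rough sketch of how the pieces could compose, assuming an aggregate simply tries each delegate in order and stops at the first one that consumes something (the patch may do this differently). EscaperSketch, LookupSketch, AggregateSketch and AggregateDemo are hypothetical names, not the classes in the patch.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical interface: returns the number of codepoints consumed, 0 for none.
interface EscaperSketch {
    int escape(CharSequence input, int index, Writer out) throws IOException;
}

// Replaces any table key found at index with its paired value.
class LookupSketch implements EscaperSketch {
    private final String[][] table;

    LookupSketch(String[][] table) {
        this.table = table;
    }

    public int escape(CharSequence input, int index, Writer out) throws IOException {
        for (String[] pair : table) {
            int end = index + pair[0].length();
            if (end <= input.length() && pair[0].contentEquals(input.subSequence(index, end))) {
                out.write(pair[1]);
                return Character.codePointCount(pair[0], 0, pair[0].length());
            }
        }
        return 0;
    }
}

// Assumed semantics: the first delegate that consumes anything wins.
class AggregateSketch implements EscaperSketch {
    private final EscaperSketch[] delegates;

    AggregateSketch(EscaperSketch... delegates) {
        this.delegates = delegates;
    }

    public int escape(CharSequence input, int index, Writer out) throws IOException {
        for (EscaperSketch delegate : delegates) {
            int consumed = delegate.escape(input, index, out);
            if (consumed != 0) {
                return consumed;
            }
        }
        return 0;
    }
}

class AggregateDemo {
    static String escapeQuotesAndBackslashes(String input) {
        EscaperSketch escaper = new AggregateSketch(
            new LookupSketch(new String[][] { {"\"", "\\\""} }),
            new LookupSketch(new String[][] { {"\\", "\\\\"} }));
        StringWriter out = new StringWriter();
        try {
            int i = 0;
            while (i < input.length()) {
                int consumed = escaper.escape(input, i, out);
                if (consumed == 0) {
                    // Nothing matched: copy one codepoint through.
                    out.write(Character.toChars(Character.codePointAt(input, i)));
                    consumed = 1;
                }
                // Step over each consumed codepoint (1 or 2 chars apiece).
                for (int j = 0; j < consumed && i < input.length(); j++) {
                    i += Character.charCount(Character.codePointAt(input, i));
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Because the Writer is handed to each call rather than wrapped by every escaper, the delegates stay stateless and only the caller holds on to out.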

The core algorithm itself now looks like:

{code:java}
    public static void escape(CharSequence input, CharSequenceEscaper escaper, Writer out) throws IOException {
        if (out == null) {
            throw new IllegalArgumentException("The Writer must not be null");
        }
        if (input == null) {
            return;
        }
        if (escaper == null) {
            throw new IllegalArgumentException("The CharSequenceEscaper must not be null. " +
                                               "Use NullEscaper if you expected this to mean a no-operation");
        }
        // i is a char index into input, so bound it by the length in chars
        // rather than by the codepoint count
        int sz = input.length();
        int i = 0;
        while (i < sz) {

            // consumed is the number of codepoints consumed
            int consumed = escaper.escape(input, i, out);

            if (consumed == 0) {
                // nothing was escaped: copy one codepoint through unchanged
                int codepoint = Character.codePointAt(input, i);
                out.write(Character.toChars(codepoint));
                i += Character.charCount(codepoint);
            } else {
                // contract with escapers is that they have to understand codepoints,
                // so step over each consumed codepoint (1 or 2 chars apiece)
                for (int j = 0; j < consumed && i < sz; j++) {
                    i += Character.charCount(Character.codePointAt(input, i));
                }
            }
        }
    }
{code}

This should be able to implement unescape as well, since it is now a simple 
general-purpose text translation API that can handle n->m changes in 
characters, i.e. a plugin can choose to consume 3 characters and turn them 
into 5, or vice versa.
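
A minimal sketch of the n->m idea in the unescape direction, assuming the same "return the number of codepoints consumed" contract as the escape method above. The plugin consumes the 6 characters of a Java-style unicode escape and writes the 1 character they name. TranslatorSketch, UnicodeUnescaperSketch and TranslateSketch are hypothetical names for illustration only.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical interface: returns the number of codepoints consumed, 0 for none.
interface TranslatorSketch {
    int translate(CharSequence input, int index, Writer out) throws IOException;
}

// Consumes a backslash, a 'u' and four hex digits (6 in) and writes one char (1 out).
class UnicodeUnescaperSketch implements TranslatorSketch {
    public int translate(CharSequence input, int index, Writer out) throws IOException {
        if (index + 6 > input.length()
                || input.charAt(index) != '\\' || input.charAt(index + 1) != 'u') {
            return 0;
        }
        int value = 0;
        for (int k = index + 2; k < index + 6; k++) {
            int digit = Character.digit(input.charAt(k), 16);
            if (digit < 0) {
                return 0; // not four hex digits, so leave it for the caller to copy
            }
            value = (value << 4) | digit;
        }
        out.write((char) value);
        return 6;
    }
}

class TranslateSketch {
    // Driver loop: i is a char offset, advanced one codepoint at a time.
    static String translate(CharSequence input, TranslatorSketch translator) {
        StringWriter out = new StringWriter();
        try {
            int i = 0;
            while (i < input.length()) {
                int consumed = translator.translate(input, i, out);
                if (consumed == 0) {
                    int codepoint = Character.codePointAt(input, i);
                    out.write(Character.toChars(codepoint));
                    i += Character.charCount(codepoint);
                } else {
                    for (int j = 0; j < consumed && i < input.length(); j++) {
                        i += Character.charCount(Character.codePointAt(input, i));
                    }
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

The driver never needs to know how many characters were written, only how many codepoints of input were consumed, so the same loop serves escaping and unescaping.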

> Rewrite StringEscapeUtils
> -------------------------
>
>                 Key: LANG-505
>                 URL: https://issues.apache.org/jira/browse/LANG-505
>             Project: Commons Lang
>          Issue Type: Task
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>
> I think StringEscapeUtils needs a strong rewrite. For each escape method (and 
> unescape) there tend to be three or four types of escaping happening. So not 
> being able to define which set of three or four apply is a pain point (and 
> cause of bug reports due to different desired features).
> We should be offering basic functionality, but also allowing people to say 
> "escape(Escapers.BASIC_XML, Escapers.LOW_UNICODE, Escapers.HIGH_UNICODE)".
> Also should delete escapeSql; it's a bad one imo. Dangerous in that it will 
> lead people to not use PreparedStatement and given it only escapes ', it's 
> not much use. Especially as different dialects escape that in different ways.
> Opening this ticket for discussion. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
