Nice try Martin, but I'm not going to get sucked into a round of perl code golf with you. :-)

I wasn't sure why the absolute path stuff was in there; I just carried it over from Alexander's code. I'll let Alexander fix this along with your point about handling multiple files, if he wants to.

s'marks

P.S. Note that in the mail log,

http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-October/035538.html

the HTML entities in the s/a/b/g; expressions were evaluated, making the resulting script incorrect. Sigh.

On 9/30/15 11:54 PM, Martin Buchholz wrote:
Hi s'marks,
You probably don't need to absolutify paths.
And you can easily handle multiple args.

(just for fun!)
Checks for javadoc comment; handles popular html entities; handles multiple lines; handles both tt and code:

#!/bin/bash
find "$@" -name '*.java' | \
  xargs -r perl -p0777i -e \
'do {} while s~^ *\*.*\K<(tt|code)>((?:[^<>{}\&\@]|&(?:lt|gt|amp);)*)</\1>~$_=$2; s/&lt;/</g; s/&gt;/>/g; s/&amp;/&/g; "{\@code $_}"~mgie'
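
For readers who prefer Java, the same substitution might be sketched roughly as follows. This is only an approximation of the one-liner above, not code from the thread: the class and method names are hypothetical, the javadoc line prefix is captured and re-emitted instead of being skipped with \K, and the outer loop stands in for the `do {} while` so that several tags on one line are all converted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names are illustrative and not taken from the thread.
final class TtToCodeSketch {
    // Group 1: javadoc line prefix (" * ..."), group 2: tag name, group 3: tag body.
    // The body may contain only plain characters or the entities &lt; &gt; &amp;.
    private static final Pattern TAG = Pattern.compile(
        "^( *\\*.*?)<(tt|code)>((?:[^<>{}&@]|&(?:lt|gt|amp);)*)</\\2>",
        Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

    // Decode &amp; last so that "&amp;lt;" becomes "&lt;" rather than "<".
    static String decodeEntities(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    static String rewrite(String source) {
        String current = source;
        while (true) {
            Matcher m = TAG.matcher(current);
            StringBuffer out = new StringBuffer();
            boolean changed = false;
            while (m.find()) {
                String replacement =
                    m.group(1) + "{@code " + decodeEntities(m.group(3)) + "}";
                m.appendReplacement(out, Matcher.quoteReplacement(replacement));
                changed = true;
            }
            if (!changed) {
                return current; // fixed point reached: nothing left to convert
            }
            m.appendTail(out);
            current = out.toString();
        }
    }
}
```

Lines that do not start with the javadoc " *" prefix, and tag bodies containing nested markup, are left untouched by this sketch.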


On Wed, Sep 30, 2015 at 6:16 PM, Stuart Marks <stuart.ma...@oracle.com> wrote:

    Hi Alexander, Martin,

    The challenge of Perl file slurping and Emacs one-liners was too much to
    bear.

    This is Java, so one-liners are hardly possible. Still, there are a bunch
    of improvements that can be made to the Java version. (OK, and I'm showing
    off a bit.)

    Take a look at this:

    http://cr.openjdk.java.net/~smarks/misc/SimpleTagEditorSmarks1.java

    I haven't studied the output exhaustively, but it seems to do a reasonably
    good job for the common cases. I ran it over java.lang and I noticed a few
    cases where there is markup embedded within <code></code> text, which
    should be looked at more closely.

    I don't particularly care if you use my version, but there are some
    techniques that I'd strongly recommend that you consider using in any such
    tool. In particular:

     - Pattern.DOTALL to do multi-line matches
     - Pattern.CASE_INSENSITIVE
     - try-with-resources to ensure that files are closed properly
     - NIO instead of the old java.io APIs, particularly Files.walk() and
    streams
     - use Scanner to deal with input file buffering
     - Scanner's stream support (I recently added this to JDK 9)
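
A minimal sketch of how several of the techniques listed above might fit together. It is not taken from SimpleTagEditorSmarks1.java; the class and method names are hypothetical, and the pattern is deliberately naive (it does not check the tag body for nested markup or braces).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; not the code linked in this message.
final class WalkAndRewriteSketch {
    // DOTALL lets '.' cross line breaks; CASE_INSENSITIVE covers <CODE>, <Code>, etc.
    static final Pattern CODE_TAG = Pattern.compile(
        "<code>(.*?)</code>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Collects the *.java files under root using NIO.
    static List<Path> javaFiles(Path root) throws IOException {
        // Files.walk keeps directories open, so the stream is closed
        // with try-with-resources as soon as the list has been built.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".java"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    // Naive rewrite: every <code>...</code> pair becomes {@code ...}.
    static String rewrite(String text) {
        return CODE_TAG.matcher(text).replaceAll("{@code $1}");
    }

    // Reads the whole file, rewrites it, and writes it back only if it changed.
    static void rewriteInPlace(Path file) throws IOException {
        String original = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String updated = rewrite(original);
        if (!updated.equals(original)) {
            Files.write(file, updated.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

A driver would call javaFiles(root) and then rewriteInPlace on each result; the Scanner-based buffering mentioned above is left out of this sketch.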

    Enjoy,

    s'marks



    On 9/29/15 2:23 PM, Martin Buchholz wrote:

        Hi Alexander,

        your change looks good.  It's OK to have manual corrections for
        automated mega-changes like this, as long as they all revert changes.

        Random comments:

        Should you publish your specdiff?  I guess not - it would be empty!

                     while((s = br.readLine()) != null) {

        by matching only one line at a time, you lose the ability to make
        replacements that span lines.  Perlers like to "slurp" in the entire
        file as a single string.

                 s = s.replace( "<CODE>", tag1);
                 s = s.replace( "<Code>", tag1);
                 s = s.replace("</CODE>", tag2);
                 s = s.replace("</Code>", tag2);

        Why not use case-insensitive regex?
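
One way the four replace() calls could be collapsed, as a rough sketch: tag1 and tag2 are the replacement strings from the quoted code, while the class and method names are hypothetical. The method can be applied to a whole file read into a single string just as well as to one line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names are illustrative.
final class CaseInsensitiveTagSketch {
    // One case-insensitive pattern per tag covers <code>, <CODE>, <Code>, ...
    private static final Pattern OPEN =
        Pattern.compile("<code>", Pattern.CASE_INSENSITIVE);
    private static final Pattern CLOSE =
        Pattern.compile("</code>", Pattern.CASE_INSENSITIVE);

    // tag1 and tag2 stand in for the replacement strings used in the quoted loop.
    static String replaceTags(String text, String tag1, String tag2) {
        // quoteReplacement keeps any '$' or '\' in the tags from being
        // interpreted as group references.
        String opened = OPEN.matcher(text).replaceAll(Matcher.quoteReplacement(tag1));
        return CLOSE.matcher(opened).replaceAll(Matcher.quoteReplacement(tag2));
    }
}
```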

        Here's an emacs-lisp one-liner I've been known to use:

        (defun tt-code ()
           (interactive)
           (query-replace-regexp "<\\(tt\\|code\\)>\\([^&<>\\\\]+\\)</\\1>"
                                 "{@code \\2}"))

        With more work, one can automate transformation of embedded things
        like &lt;

        But of course, it's not even possible to transform ALL uses of <code> to
        {@code, if there was imaginative use of nested html tags.
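
Since not every use can be converted automatically, a tool could at least report the awkward ones for manual review. The following is a hedged sketch with hypothetical names; the choice of '<', '{' and '}' as warning signs is an assumption, not a rule stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and the review criteria are assumptions.
final class NestedMarkupSketch {
    private static final Pattern CODE_OR_TT = Pattern.compile(
        "<(tt|code)>(.*?)</\\1>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns the tag bodies that contain '<', '{' or '}' and so need a human look.
    static List<String> needsManualReview(String text) {
        List<String> flagged = new ArrayList<>();
        Matcher m = CODE_OR_TT.matcher(text);
        while (m.find()) {
            String body = m.group(2);
            if (body.indexOf('<') >= 0 || body.indexOf('{') >= 0 || body.indexOf('}') >= 0) {
                flagged.add(body);
            }
        }
        return flagged;
    }
}
```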


        On Tue, Sep 29, 2015 at 3:21 AM, Alexander Stepanov
        <alexander.v.stepa...@oracle.com> wrote:

            Updated: a few manual corrections were made (as @linkplain tags
            display nested {@code } literally):
            http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch
            Checked with specdiff (which of course does not cover
            documentation for internal packages); no unexpected diffs detected.

            Regards,
            Alexander


            On 9/27/2015 4:52 PM, Alexander Stepanov wrote:

                Hello Martin,

                Here is a simple app to replace <code></code> tags with the
                new-style {@code } one (which is definitely not as elegant as
                the Perl one-liners):
                http://cr.openjdk.java.net/~avstepan/tmp/codeTags/SimpleTagEditor.java

                Corresponding patch for jdk and replacement log (~62k of the
                tag changes):
                http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch
                http://cr.openjdk.java.net/~avstepan/tmp/codeTags/replace.log
                (sorry, I still have to check the correctness of the patch with
                specdiff, so this is rather a demo at the moment).

                I don't know whether these changes (cosmetic by nature) are
                desired for now. Moreover, some of them should probably go to
                other repos (e.g., awt, swing -> "client" instead of "dev").

                Regards,
                Alexander



                ----- Original message -----
                From: alexander.v.stepa...@oracle.com
                To: marti...@google.com
                Cc: core-libs-dev@openjdk.java.net
                Sent: Thursday, 24 September 2015, 16:06:56 GMT +03:00 Moscow,
                St. Petersburg, Volgograd
                Subject: Re: RFR [9] 8133651: replace some <tt> tags (obsolete in
                html5) in core-libs docs

                Hello Martin,

                Thank you for the review and for the notes!

                   > I'm biased of course, but I like the approach I took with
                   > blessed-modifier-order:
                   > - make the change completely automated
                   > - leave "human editing" for a separate change
                   > - publish the code used to make the automated change (in
                   > my case, typically a perl one-liner)

                Automated replacement has an obvious advantage: it is fast and
                massive. But there are some disadvantages at the same time
                (just IMHO).

                With a script it is quite easy to miss some not-so-trivial
                cases, e.g.:
                - removing unnecessary line breaks, like
                    * <tt>someCode
                    * </tt>
                (which would be better replaced with a single-line
                {@code someCode});
                - joining successive terms, like "<tt>ONE</tt>, <tt>TWO</tt>,
                <tt>THREE</tt>" -> "{@code ONE, TWO, THREE}";
                - errors like an extra or missing "&lt;" or "&gt;", e.g.
                "* <tt>Collection &lt;T></tt>" (there were a lot of them);
                - some cases where <tt></tt> should be replaced with
                <code></code>, not {@code } (e.g. because of unicode characters
                inside the code etc.);
                - extra tags inside <tt> or <code> which should be moved
                outside of {@code }, like <tt><i>someCode</i></tt> or
                <tt><b>someCode</b></tt>;
                - simple removal of needless tags, like "<tt>{@link ...}</tt>"
                -> "{@link ...}";
                - replacing HTML codes with symbols ('<', '>', '@', ...);
                - etc.;
                - plus some other formatting changes and fixes for misprints
                which would be omitted during the automated replacement (and
                wouldn't be done manually in the future because there is no
                motivation for repeated processing).
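
Some of the cases in the list above are regular enough that a script could handle them. As an illustration only, here is a rough sketch of the "needless tags around {@link ...}" case; the names are hypothetical and the pattern assumes the link is the entire body of the tag.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; names are illustrative.
final class UnwrapLinkSketch {
    // Matches <tt> or <code> whose entire body is a single {@link ...} tag.
    private static final Pattern WRAPPED_LINK = Pattern.compile(
        "<(tt|code)>\\s*(\\{@link\\s[^{}]*\\})\\s*</\\1>",
        Pattern.CASE_INSENSITIVE);

    // Drops the surrounding tag pair and keeps only the {@link ...} itself.
    static String unwrapLinks(String text) {
        return WRAPPED_LINK.matcher(text).replaceAll("$2");
    }
}
```

The other cases, such as joining successive terms or moving <i> and <b> outside of {@code }, depend more on context and are not attempted here.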

                So sometimes it may be difficult to say where the border lies
                between "trivial" and "human-editing" cases (and the portion of
                "non-trivial cases" is definitely not minor); moreover, even an
                automated replacement requires careful review before the webrev
                is published (as well as by reviewers, who probably wouldn't be
                happy to review hundreds of files at the same time) and
                iterative checks/corrections. specdiff is very useful for this
                task but also cannot fully cover the diffs (as some changes are
                situated in the internal com/... sun/... packages).

                Moreover, I'm sure that some reviewers would be annoyed by the
                fact that some (quite simple) changes were postponed because
                they are "not trivial enough to be fixed just now" (because
                they will suspect the changes would be postponed forever). So
                the patch creator would (probably) receive some advice during
                the review like "please also fix this and that" (which is
                normal, of course).

                So my preference was to make the changes package by package
                (in a reasonable number of files), not postponing part of the
                changes for the future (sorry for these boring, repetitive
                review requests). Please note that all of the above is *rather
                an explanation of my motivation than an objection* :) (and of
                course I used some text editor replace automation, which is
                surely not as advanced as Perl).

                   > It's probably correct, but I would have left it out of
                   > this change
                Yes, I see. Reverted (please update the web page):
                http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/index.html

                Thanks,
                Alexander

                P.S. The <tt> replacement job is mostly (I guess, ~80%)
                complete. But this approach should probably be used if some
                similar replacement task for, e.g., <code></code> tags is
                planned in the future (there are thousands of them).


                On 9/24/2015 6:10 AM, Martin Buchholz wrote:


                    On Sat, Sep 19, 2015 at 6:58 AM, Alexander Stepanov
                    <alexander.v.stepa...@oracle.com> wrote:

                          Hello,

                          Could you please review the following fix
                          http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/
                          http://cr.openjdk.java.net/~avstepan/8133651/jaxws.00/index.html
                          for
                          https://bugs.openjdk.java.net/browse/JDK-8133651

                          Just another portion of deprecated <tt> (and <xmp>)
                          tags replaced with {@code }. Some misprints were also
                          fixed.


                    I'm biased of course, but I like the approach I took with
                    blessed-modifier-order:
                    - make the change completely automated
                    - leave "human editing" for a separate change
                    - publish the code used to make the automated change (in
                    my case, typically a perl one-liner)


                          The following (expected) changes were detected by
                          specdiff:
                          - removed needless dashes in java.util.Locale,
                          - removed needless curly brace in
                          xml.bind.annotation.XmlElementRef


                    I would do a separate automated "removed needless dashes"
                    changeset.


                          Please let me know if the following changes are
                          desirable or not:
                          http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html




                    This is an actual change to the behavior of this code - the
                    maintainers of jconsole need to approve it. It's probably
                    correct, but I would have left it out of this change. If you
                    remove it, then I approve this change.




