[
https://issues.apache.org/jira/browse/YETUS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870107#comment-15870107
]
Peter Vary commented on YETUS-488:
----------------------------------
Thanks guys for the input!
My impression was, that I was removing the numbers only from the error message
text.
Could you please help me where I miss something?
We get the checkstyle output, remove the unchanged files from it, and after
that add context to it by adding the code line to the message:
{code:title=checkstyle_runner}
# file:linenum:(column:)error ====>
# file:linenum:code(:column):error
pushd "${BASEDIR}" >/dev/null
while read -r logline; do
file=$(echo "${logline}" | cut -f1 -d:)
linenum=$(echo "${logline}" | cut -f2 -d:)
text=$(echo "${logline}" | cut -f3- -d:)
codeline=$(head -n "+${linenum}" "${file}" | tail -1 )
echo "${file}:${linenum}:${codeline}:${text}" >> "${output}"
done < <(cat "${tmp}.1")
{code}
I think this creates the following change:
{code:title=From}
./beeline/src/java/org/apache/hive/beeline/BeeLine.java:1: warning: File length
is 2,372 lines (max allowed is 2,000).
./beeline/src/java/org/apache/hive/beeline/BeeLine.java:109: warning: Comment
matches to-do format 'TODO:'.
./beeline/src/java/org/apache/hive/beeline/BeeLine.java:124:39: warning: Name
'resourceBundle' must match pattern '^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$'.
{code}
{code:title=To}
./beeline/src/java/org/apache/hive/beeline/BeeLine.java:1:/**: warning: File
length is 2,372 lines (max allowed is 2,000).
./beeline/src/java/org/apache/hive/beeline/BeeLine.java:109: * TODO:: warning:
Comment matches to-do format 'TODO:'.
./beeline/src/java/org/apache/hive/beeline/BeeLine.java:124: private static
final ResourceBundle resourceBundle =:39: warning: Name 'resourceBundle' must
match pattern '^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$'.
{code}
Ad when calculating the diff we remove the filename, and the line number from
the files with the following code:
{code:title=checkstyle_calcdiffs - First step}
# first, strip filenames:line:
# this keeps column: in an attempt to increase
# accuracy in case of multiple, repeated errors
# since the column number shouldn't change
# if the line of code hasn't been touched
cut -f3- -d: "${orig}" > "${tmp}.branch"
cut -f3- -d: "${new}" > "${tmp}.patch"
{code}
This changes the files like this:
{code:title=From}
./beeline/src/java/org/apache/hive/beeline/BeeLine.java:1:/**: warning: File
length is 2,372 lines (max allowed is 2,000).
./beeline/src/java/org/apache/hive/beeline/BeeLine.java:109: * TODO:: warning:
Comment matches to-do format 'TODO:'.
./beeline/src/java/org/apache/hive/beeline/BeeLine.java:124: private static
final ResourceBundle resourceBundle =:39: warning: Name 'resourceBundle' must
match pattern '^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$'.
{code}
{code:title=To}
/**: warning: File length is 2,372 lines (max allowed is 2,000).
* TODO:: warning: Comment matches to-do format 'TODO:'.
private static final ResourceBundle resourceBundle =:39: warning: Name
'resourceBundle' must match pattern '^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$'.
{code}
Note, that after this change the line number is removed - only the position in
the file (line number) will correspond to the original checkstyle error.
And then calculate the changed line numbers with the following code:
{code:title=checkstyle_calcdiffs - Second step}
# compare the errors, generating a string of line
# numbers. Sorry portability: GNU diff makes this too easy
${DIFF} --unchanged-line-format="" \
--old-line-format="" \
--new-line-format="%dn " \
"${tmp}.branch" \
"${tmp}.patch" > "${tmp}.lined"
{code}
And get the original lines using the line numbers:
{code:title=checkstyle_calcdiffs - Third step}
# now, pull out those lines of the raw output
# shellcheck disable=SC2013
for j in $(cat "${tmp}.lined"); do
# shellcheck disable=SC2086
head -${j} "${new}" | tail -1
done
{code}
What I have proposed is removing the numbers from the stripped files like this:
{code:title=checkstyle_calcdiffs - First step - numbers removed}
cut -f3- -d: "${orig}" | sed -e "s/[0-9,]//g" > "${tmp}.branch" <-- sed
added here
cut -f3- -d: "${new}" | sed -e "s/[0-9,]//g" > "${tmp}.patch" <-- sed
added here
{code}
This changes the files like this:
{code:title=From}
./beeline/src/java/org/apache/hive/beeline/BeeLine.java:1:/**: warning: File
length is 2,372 lines (max allowed is 2,000).
./beeline/src/java/org/apache/hive/beeline/BeeLine.java:109: * TODO:: warning:
Comment matches to-do format 'TODO:'.
./beeline/src/java/org/apache/hive/beeline/BeeLine.java:124: private static
final ResourceBundle resourceBundle =:39: warning: Name 'resourceBundle' must
match pattern '^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$'.
{code}
{code:title=To}
/**: warning: File length is lines (max allowed is ).
* TODO:: warning: Comment matches to-do format 'TODO:'.
private static final ResourceBundle resourceBundle =:: warning: Name
'resourceBundle' must match pattern '^[A-Z][A-Z-]*(_[A-Z-]+)*$'.
{code}
This removes the numbers from the error messages, *and* the checkstyle column
number (which I was not aware of before your comments).
I think this patch will not mess with the original (and really complicated)
method of calculating the errors.
What do you think?
Thanks again to spending time to take a look at this stuff!
Peter
> Checkstyle reports new error if the file still longer than expected
> -------------------------------------------------------------------
>
> Key: YETUS-488
> URL: https://issues.apache.org/jira/browse/YETUS-488
> Project: Yetus
> Issue Type: Improvement
> Components: Test Patch
> Affects Versions: 0.4.0
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Minor
> Attachments: YETUS-488.00.patch
>
>
> Given a java source file which is originally longer than the checkstyle
> defined value then we get the following warning every time the length changes
> and we run the checkstyle plugin:
> {code}
> ./beeline/src/java/org/apache/hive/beeline/BeeLine.java:1: warning: File
> length is 2,373 lines (max allowed is 2,000).
> {code}
> The problem is, that the checkstyle output is changed:
> {code:title=From}
> ./beeline/src/java/org/apache/hive/beeline/BeeLine.java:1: warning: File
> length is 2,373 lines (max allowed is 2,000).
> {code}
> {code:title=To}
> ./beeline/src/java/org/apache/hive/beeline/BeeLine.java:1: warning: File
> length is 2,365 lines (max allowed is 2,000).
> {code}
> Since the {{checkstyle_calcdiffs}} does not find the new line in the original
> output, it marks it as a new error, and gives -1 to the result of the
> checkstyle plugin.
> This is true for every error message where there is a number in the text. Not
> exhaustive examples are below:
> - Line length
> - Row length
> - Function length
> - Number of attributes
> - Indentation
> I think we should be reluctant to give -1 without a reason so it would be
> good to remove these errors.
> I have yet to find the ideal solution for the issue:
> - An easy patch could be to remove the numbers from files before the diff
> (note: it does not effect the final diff-checkstyle-<MODULE>.txt output
> messages). This would mean that the warning will give -1 only the first time
> when the error text is occurred, and will not give -1 if the numbers are
> changed, like
> -- File become shorter, but still longer than expected
> -- File become even longer
> -- Indentation changed but still problematic
> - A more sophisticated patch could do this only for specific errors, or even
> check the value and give error when the line length grow.
> To be honest, I think the second solution would complicate the code too much,
> and make it dependent on the specific checkstyle messages, so I do not like
> it.
> I am even hesitant about the first one, because the change I introducing with
> it might have unwanted consequences.
> Do anyone has better ideas? I would be happy to implement them, if someone
> points me to the right direction :D
> Thanks,
> Peter
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)