[ 
https://issues.apache.org/jira/browse/YETUS-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868604#comment-15868604
 ] 

Allen Wittenauer commented on YETUS-488:
----------------------------------------

The more I think about how we used to calculate the diffs, the more the 
nightmares start coming back. Haha.  On the surface, it looked totally viable.  
But once we got it into practice, the weirdness started to kick in, especially 
on large patches.  

"False positives" (as they were reported) were a huge problem.  They weren't 
really false though... Yetus was reporting the wrong one!

Old:
error1
error1
error1

New:
error1
error1
error1
error1

If you diff these, you end up with:

{code}
$ sdiff /tmp/old /tmp/new
error1                                                          error1
error1                                                          error1
error1                                                          error1
                                                              > error1
{code}

Users would come back and say that they didn't touch that line of code. But 
with no other context, the difference engine had no idea which error1 is the 
new one.  Adding line numbers almost always gave enough context:

{code}
$ sdiff /tmp/old /tmp/new
error1:1                                                      <
error1:6                                                        error1:6
                                                              > error1:7
error1:29                                                       error1:29
                                                              > error1:56
{code}

Now we at least know that the 2nd error and the 3rd error from the base code 
weren't changed in some way.

As I mentioned, calcdiffs was added way back when so that modules could add 
their own difference engines. For a great majority of tools, column_calcdiffs 
has just enough context to solve the repeating error issue really well. (it's 
not perfect, but...)  But what if line numbers move?  A lot of the modules use 
the GITDIFFxyz files to determine if a line moved.  I don't think checkstyle 
does, and that's probably what really needs to get fixed.

I think the basic problem is that checkstyle's output (again, IIRC) is 
extremely inconsistent about how it reports errors.  We might need to bite the 
bullet and add more context into checkstyle's default output so that we can 
apply our fairly large toolset to make it more complete about what has actually 
changed.

FWIW, to piggyback off of what [~sacharya] said, tools like checkstyle 
generally end up with teams making a choice when dealing with dirty code.   One 
school of thought is "you touched the code, you fix all the legacy problems 
with it too."  Another thought is "fix what you can, we're going to be lenient 
on it, but we still want it reported." Out of the box, Yetus implements #1 for 
everything.  But you can configure it for #2 by using the --tests-filter 
option.  Using that will set failed checkstyle's to -0 and give it a different 
color in the report (an orange).

> Checkstyle reports new error if the file still longer than expected
> -------------------------------------------------------------------
>
>                 Key: YETUS-488
>                 URL: https://issues.apache.org/jira/browse/YETUS-488
>             Project: Yetus
>          Issue Type: Improvement
>          Components: Test Patch
>    Affects Versions: 0.4.0
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Minor
>         Attachments: YETUS-488.00.patch
>
>
> Given a java source file which is originally longer than the checkstyle 
> defined value then we get the following warning every time the length changes 
> and we run the checkstyle plugin:
> {code}
> ./beeline/src/java/org/apache/hive/beeline/BeeLine.java:1: warning: File 
> length is 2,373 lines (max allowed is 2,000).
> {code}
> The problem is, that the checkstyle output is changed:
> {code:title=From}
> ./beeline/src/java/org/apache/hive/beeline/BeeLine.java:1: warning: File 
> length is 2,373 lines (max allowed is 2,000).
> {code}
> {code:title=To}
> ./beeline/src/java/org/apache/hive/beeline/BeeLine.java:1: warning: File 
> length is 2,365 lines (max allowed is 2,000).
> {code}
> Since the {{checkstyle_calcdiffs}} does not find the new line in the original 
> output, it marks it as a new error, and gives -1 to the result of the 
> checkstyle plugin.
> This is true for every error message where there is a number in the text. Not 
> exhaustive examples are below:
> - Line length
> - Row length
> - Function length
> - Number of attributes
> - Indentation
> I think we should be reluctant to give -1 without a reason so it would be 
> good to remove these errors.
> I have yet to find the ideal solution for the issue:
> - An easy patch could be to remove the numbers from files before the diff 
> (note: it does not effect the final diff-checkstyle-<MODULE>.txt output 
> messages). This would mean that the warning will give -1 only the first time 
> when the error text is occurred, and will not give -1 if the numbers are 
> changed, like
> -- File become shorter, but still longer than expected
> -- File become even longer
> -- Indentation changed but still problematic
> - A more sophisticated patch could do this only for specific errors, or even 
> check the value and give error when the line length grow.
> To be honest, I think the second solution would complicate the code too much, 
> and make it dependent on the specific checkstyle messages, so I do not like 
> it.
> I am even hesitant about the first one, because the change I introducing with 
> it might have unwanted consequences.
> Do anyone has better ideas? I would be happy to implement them, if someone 
> points me to the right direction :D
> Thanks,
> Peter



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to