[
https://issues.apache.org/jira/browse/IO-670?focusedWorklogId=464108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-464108
]
ASF GitHub Bot logged work on IO-670:
-------------------------------------
Author: ASF GitHub Bot
Created on: 29/Jul/20 17:36
Start Date: 29/Jul/20 17:36
Worklog Time Spent: 10m
Work Description: sebbASF commented on a change in pull request #118:
URL: https://github.com/apache/commons-io/pull/118#discussion_r462250040
##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -882,15 +894,26 @@ public static boolean contentEqualsIgnoreEOL(final Reader
input1, final Reader i
nowPos2 += nowRead2;
}
}
+
if (readEnd1) {
+ // if input1 ended.
Review comment:
This comment is not necessary comment if readEnd1 is properly commented
##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -814,9 +814,21 @@ public static boolean contentEquals(final Reader input1,
final Reader input2)
}
private enum LastState {
- r,
- normal,
- newLine;
+ /**
+ * If last char is '\r'.
+ */
+ R,
+
+ /**
+ * If last char is not '\r' nor '\n'.
+ */
+ NORMAL,
+
+ /**
+ * If we just moved to a new line.
+ * It might sounds weird but after you see the codes you can know it.
+ */
+ NEW_LINE;
Review comment:
This could be NL to agree with CR above
##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -814,9 +814,21 @@ public static boolean contentEquals(final Reader input1,
final Reader input2)
}
private enum LastState {
- r,
- normal,
- newLine;
+ /**
+ * If last char is '\r'.
+ */
+ R,
+
+ /**
+ * If last char is not '\r' nor '\n'.
+ */
+ NORMAL,
+
+ /**
+ * If we just moved to a new line.
+ * It might sounds weird but after you see the codes you can know it.
Review comment:
This comment is not useful
##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -882,15 +894,26 @@ public static boolean contentEqualsIgnoreEOL(final Reader
input1, final Reader i
nowPos2 += nowRead2;
}
}
+
if (readEnd1) {
+ // if input1 ended.
if (readEnd2) {
+ // if both input1 and input2 ended here, we just return
true.
return true;
} else {
+ // if input2 not ended.
switch (lastState1) {
- case r:
- case newLine:
+ case R:
+ // if last state of input1 is "\r"
Review comment:
Unnecessary if the enum value has a sensible name and comment
##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -1147,31 +1201,43 @@ public static boolean contentEqualsIgnoreEOL(final
Reader input1, final Reader i
}
continue;
default:
- if (lastState1 == LastState.r) {
- lastState1 = LastState.newLine;
+ // if input2's next is normal.
+ // if input1's last is '\r', then it can become
"\r\n", then legal.
+ // otherwise illegal.
+ if (lastState1 == LastState.R) {
+ lastState1 = LastState.NEW_LINE;
nowCheck1++;
continue;
} else {
return false;
}
}
default:
+ // if input1's next is normal.
switch (charArray2[nowCheck2]) {
case '\n':
- if (lastState2 == LastState.r) {
- lastState2 = LastState.newLine;
+ // if input2's next is '\n'.
+ // if input2's last is '\r', then it can become
"\r\n", then legal.
+ // otherwise illegal.
+ if (lastState2 == LastState.R) {
+ lastState2 = LastState.NEW_LINE;
nowCheck2++;
continue;
} else {
return false;
}
case '\r':
+ // if input2's next is '\r'.
+ // illegal.
return false;
default:
+ // if input2's next is normal.
+ // if equal then legal.
+ // otherwise illegal.
if (charArray1[nowCheck1] !=
charArray2[nowCheck2]) {
return false;
}
- lastState1 = lastState2 = LastState.normal;
+ lastState1 = lastState2 = LastState.NORMAL;
nowCheck1++;
nowCheck2++;
continue;
Review comment:
Most of the comments above are unnecessary if the variables are well
named and commented.
What is missing is how the algorithm works, and what the various
combinations of state actually mean in terms of the algorithm.
I'm not clear why the code sometimes checks for '\r' and sometimes uses the
enum.
Why not use a variable containing the last character?
==
I suspect the code could be much simplified by using a suitable filter on
the inputs.
There are several examples in IO, and NET has FromNetASCIIInputStream which
deals with CRLF conversions
##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -882,15 +894,26 @@ public static boolean contentEqualsIgnoreEOL(final Reader
input1, final Reader i
nowPos2 += nowRead2;
}
}
+
Review comment:
To prevent accidental run-away, it would be safer to use >= to compare
incremented indexes
##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -814,9 +814,21 @@ public static boolean contentEquals(final Reader input1,
final Reader input2)
}
private enum LastState {
- r,
- normal,
- newLine;
+ /**
+ * If last char is '\r'.
+ */
+ R,
Review comment:
Why is this not called CR or CARRIAGE_RETURN?
##########
File path: src/main/java/org/apache/commons/io/IOUtils.java
##########
@@ -882,15 +894,26 @@ public static boolean contentEqualsIgnoreEOL(final Reader
input1, final Reader i
nowPos2 += nowRead2;
}
}
+
Review comment:
If the index starts at a value below the limit, and you increment it by
1 between each check, then I agree that the values must be equal at some point,
assuming single-threaded code.
However if there is a bug in the code that can cause the index to be
incremented twice, it may never match.
Comparison using >= avoids this potential error.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 464108)
Time Spent: 4h 20m (was: 4h 10m)
> IOUtils.contentEquals is of low performance. I will refine it.
> --------------------------------------------------------------
>
> Key: IO-670
> URL: https://issues.apache.org/jira/browse/IO-670
> Project: Commons IO
> Issue Type: Improvement
> Reporter: Jin Xu
> Priority: Critical
> Attachments: jmh-result.org.apache.json
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> [https://github.com/apache/commons-io/pull/118]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)