[
https://issues.apache.org/jira/browse/IO-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150124#comment-13150124
]
Sebb commented on IO-288:
-------------------------
Good to know that it's easy to unambiguously detect CR and LF.
There seem to be a lot of spurious files in the zip archive.
I'm not sure that getNewLineMatchByteCount() is as efficient as
BufferedReader.readLine() - it seems to process characters multiple times. It
could probably be improved by just checking current and previous chars. Also, I
don't think it's necessary to encode \n or \r - just use the appropriate
characters.
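As a rough sketch of the "current and previous char" idea (not the code from the attached patch): the class and method names below are hypothetical, and the sketch assumes an encoding such as UTF-8 or ISO-8859-1 in which the bytes 0x0D and 0x0A can only mean CR and LF. Each position is looked at once, plus at most one look-back.

```java
public final class NewlineScan {
    private NewlineScan() {
    }

    /**
     * Returns the length in bytes (0, 1 or 2) of the line separator whose
     * last byte is at index i. Hypothetical helper; assumes CR and LF are
     * single, unambiguous bytes in the file's encoding.
     */
    static int separatorLengthEndingAt(byte[] buf, int i) {
        byte current = buf[i];
        if (current == '\n') {
            // LF preceded by CR is one two-byte separator (CRLF).
            return (i > 0 && buf[i - 1] == '\r') ? 2 : 1;
        }
        if (current == '\r') {
            // A lone CR; a CR that belongs to a CRLF is reached via its LF
            // when scanning backwards.
            return 1;
        }
        return 0;
    }
}
```

Note that the literals '\n' and '\r' are used directly, without encoding them first.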
There are no tests for multi-block files where there may be lines spanning
blocks.
Indeed the CRLF pair may span blocks; I'm not convinced that the code handles
that correctly.
In order for getNewLineMatchByteCount() to detect all CRLF pairs, it generally
needs at least 2 characters to be present; this does not seem to be guaranteed.
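To illustrate the concern, here is a minimal sketch of one way to deal with a CRLF pair split across blocks. It is not the patch's approach: it works on an in-memory byte array rather than a RandomAccessFile, all names are hypothetical, and it assumes a UTF-8 compatible encoding. When a block starts with LF and earlier data exists, the decision is deferred and the LF is carried over to the preceding block, so the CR (if any) is seen together with it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class BlockwiseReverseLines {
    private BlockwiseReverseLines() {
    }

    /** Returns the lines of data in reverse order, reading blockSize bytes at a time from the end. */
    static List<String> readReversed(byte[] data, int blockSize) {
        List<String> lines = new ArrayList<>();
        byte[] carry = new byte[0];  // bytes of later blocks not yet emitted as a line
        boolean atFileEnd = true;    // true until the first line or terminator is consumed
        int end = data.length;
        while (end > 0) {
            int start = Math.max(0, end - blockSize);
            int blockLen = end - start;
            // Work buffer: the current block followed by the carry-over.
            byte[] buf = new byte[blockLen + carry.length];
            System.arraycopy(data, start, buf, 0, blockLen);
            System.arraycopy(carry, 0, buf, blockLen, carry.length);

            int lineEnd = buf.length; // exclusive end of the line being collected
            int i = buf.length - 1;
            while (i >= 0) {
                byte b = buf[i];
                if (b != '\n' && b != '\r') {
                    i--;
                    continue;
                }
                if (b == '\n' && i == 0 && start > 0) {
                    break; // a CR may sit at the end of the preceding block: defer
                }
                int sepLen = (b == '\n' && i > 0 && buf[i - 1] == '\r') ? 2 : 1;
                int sepStart = i - sepLen + 1;
                // A separator at the very end of the file only terminates the
                // last line; it does not introduce an extra empty line.
                if (!(atFileEnd && i + 1 == buf.length)) {
                    lines.add(new String(buf, i + 1, lineEnd - (i + 1), StandardCharsets.UTF_8));
                }
                atFileEnd = false;
                lineEnd = sepStart;
                i = sepStart - 1;
            }
            if (start == 0) {
                lines.add(new String(buf, 0, lineEnd, StandardCharsets.UTF_8)); // first line of the file
            } else {
                carry = Arrays.copyOfRange(buf, 0, lineEnd);
            }
            end = start;
        }
        return lines;
    }
}
```

The sketch rescans the carried-over bytes in each iteration, which is simple but not the most efficient choice; a real implementation would probably remember how far it had already scanned.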
Note: could use a smaller block size to make the test files smaller; probably
sensible to compare the results with a forward line reader. It would then be
simple to have a directory of various different test files - read the file
forward and store the lines; ensure that the reverse reader matches the
reversed lines.
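The comparison described above could look roughly like the following sketch. ReverseLineSource is a hypothetical stand-in for whatever wraps the reader under test (it is not part of the patch); the forward side uses only standard JDK calls.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class ReverseReaderCheck {
    private ReverseReaderCheck() {
    }

    /** Hypothetical adapter around the reverse reader being tested. */
    interface ReverseLineSource {
        List<String> readAllReversed(Path file, Charset cs) throws IOException;
    }

    /** Reads all lines forward with BufferedReader.readLine(). */
    static List<String> forwardLines(Path file, Charset cs) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, cs)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** True if the reverse source yields exactly the forward lines in reverse order. */
    static boolean matchesForwardReader(Path file, Charset cs, ReverseLineSource source)
            throws IOException {
        List<String> expected = forwardLines(file, cs);
        Collections.reverse(expected);
        return expected.equals(source.readAllReversed(file, cs));
    }
}
```

A test could then loop over every file in a directory of samples (different line endings, encodings, and sizes relative to the block size) and call matchesForwardReader for each.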
The field totalBlockCount needs to be a long, not an int.
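A small sketch of why the type matters, assuming the count is derived from the file length and the block size (how the patch actually computes it is not shown here, and the class and method names are hypothetical): with a small block size and a large file, the number of blocks no longer fits in an int.

```java
public final class BlockCount {
    private BlockCount() {
    }

    /** Number of blocks needed to cover fileLength bytes, computed in long arithmetic. */
    static long totalBlockCount(long fileLength, int blockSize) {
        // Ceiling division; blockSize is widened to long before the addition.
        return (fileLength + blockSize - 1) / blockSize;
    }
}
```

For example, a 5,000,000,000-byte file read with a block size of 1 needs 5,000,000,000 blocks, which is more than Integer.MAX_VALUE (2,147,483,647).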
Might simplify the code to use empty arrays rather than null.
> Supply a ReversedLinesFileReader
> ---------------------------------
>
> Key: IO-288
> URL: https://issues.apache.org/jira/browse/IO-288
> Project: Commons IO
> Issue Type: New Feature
> Components: Utilities
> Reporter: Georg Henzler
> Fix For: 2.2
>
> Attachments: ReversedLinesFileReader0.2.zip
>
>
> I needed to analyse a log file today and I was looking for a
> ReversedLinesFileReader: A class that behaves exactly like BufferedReader
> except that it goes from bottom to top when readLine() is called. I didn't
> find it in IOUtils and the internet didn't help a lot either, e.g.
> http://www.java2s.com/Tutorial/Java/0180__File/ReversingaFile.htm is fairly
> inefficient - the log files I'm analysing are huge and it is not a good idea
> to load the whole content into memory.
> So I ended up writing an implementation myself using little memory and the
> class RandomAccessFile - see attached file. It's used as follows:
> int blockSize = 4096; // only that much memory is needed, no matter how big the file is
> ReversedLinesFileReader reversedLinesFileReader =
>     new ReversedLinesFileReader(myFile, blockSize, "UTF-8"); // encoding is supported
> String line = null;
> while ((line = reversedLinesFileReader.readLine()) != null) {
>     ... // use the line
>     if (enoughLinesSeen) {
>         break;
>     }
> }
> reversedLinesFileReader.close();
> I believe this could be useful for other people as well!