DarrenJAN opened a new pull request, #502:
URL: https://github.com/apache/commons-csv/pull/502
Add support in Commons CSV for tracking byte positions during parsing.
Summary of Modifications
1. **Test Data Files**: Added new test data files and updated pom.xml to
exclude them from RAT checks, so that they are not reported as having unapproved licenses.
2. **CSVParser class**:
Constructor enhancements
a. Added an optional parameter, **String encoding**, which
specifies the encoding to use for the reader.
4. **CSVRecord class**
- private long characterByte: The start byte position of this record.
- Added a new constructor that supports tracking byte positions in the record class.
5. **ExtendedBufferedReader Class**:
- private long bytesRead: Tracks the number of bytes read so far.
- private long bytesReadMark: Stores the marked byte position.
- CharsetEncoder encoder: Encoder used to calculate byte size of characters.
- getCharBytes(int current): Calculates the number of bytes a character
occupies, based on UTF-8 encoding. Note: it currently supports only UTF-8 because of
the encoding algorithm used. Other encodings could be supported with additional work.
- reset() and mark() methods: Enhanced to prevent characters and bytes from
being consumed unintentionally.
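
For readers unfamiliar with the idea, here is a minimal sketch of counting UTF-8 bytes while reading characters. It is not the code in this PR: `ByteCountingReader` and `utf8Bytes` are hypothetical names used only for illustration, and the sketch assumes well-formed input (no unpaired surrogates).

```java
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical illustration (not the PR's ExtendedBufferedReader):
 * wraps a Reader and keeps a running total of the UTF-8 bytes that
 * the characters read so far would occupy.
 */
final class ByteCountingReader {

    private final Reader in;
    private long bytesRead; // UTF-8 bytes consumed so far

    ByteCountingReader(Reader in) {
        this.in = in;
    }

    /** Reads one UTF-16 code unit and adds its UTF-8 size to the total. */
    int read() throws IOException {
        int c = in.read();
        if (c != -1) {
            bytesRead += utf8Bytes((char) c);
        }
        return c;
    }

    long getBytesRead() {
        return bytesRead;
    }

    /**
     * UTF-8 size attributed to a single UTF-16 code unit.
     * Assumption: surrogates always come in valid pairs, so each half
     * is charged 2 of the 4 bytes the full code point needs.
     */
    static int utf8Bytes(char c) {
        if (c < 0x80) {
            return 1; // ASCII
        }
        if (c < 0x800) {
            return 2; // e.g. Latin-1 supplement, Greek, Cyrillic
        }
        if (Character.isSurrogate(c)) {
            return 2; // half of a 4-byte supplementary code point
        }
        return 3; // rest of the Basic Multilingual Plane
    }
}
```

Reading the three characters `a,é` through this sketch leaves `getBytesRead()` at 4, because `é` takes two bytes in UTF-8. A record's start position would then be the counter's value at the moment the record begins.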
Test result:
mvn
<img width="1510" alt="image"
src="https://github.com/user-attachments/assets/df0e8277-4a89-4aa0-b45a-c763eaa7e3e0">
<img width="1599" alt="image"
src="https://github.com/user-attachments/assets/8fccade6-e530-4645-9a44-238fbcb8d826">
All unit tests and the other build checks pass.