[PR] CSV-196-TrackBytePositions [commons-csv]

via GitHub Wed, 06 Nov 2024 17:47:33 -0800


DarrenJAN opened a new pull request, #502:
URL: https://github.com/apache/commons-csv/pull/502


   Add support in Commons CSV for tracking byte positions during parsing.
   
   Summary of Modifications
   
   1. **Test Data Files**: Added new test data files and updated pom.xml to 
exclude them from the RAT checks, so they are not flagged as unapproved licenses.
   2. **CSVParser class**:
    Constructor enhancements
   a. Added support for an optional parameter, **String encoding**, which 
specifies the encoding to use for the reader.
   4. **CSVRecord class**
   - private long characterByte: Start byte position of this record.
   - Added a new constructor that supports tracking byte positions in the record class.
   5. **ExtendedBufferedReader Class**:
   
   - private long bytesRead: Tracks the number of bytes read so far.
   - private long bytesReadMark: Stores the marked byte position.
   - CharsetEncoder encoder: Encoder used to calculate byte size of characters.
   - getCharBytes(int current): Calculates the number of bytes a character 
occupies in UTF-8. Note: only UTF-8 is supported for now because of the encoding 
algorithm used. Other encodings can be supported, but that needs more work.
   - reset() and mark() methods: Enhanced so that characters and bytes are not 
consumed unintentionally.
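
The byte-tracking idea in the ExtendedBufferedReader list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `ByteCountingReader` and its methods `utf8Length` and `getBytesRead` are hypothetical names, and the sketch assumes well-formed UTF-16 input (no unpaired surrogates) that was decoded from UTF-8.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical sketch: counts the UTF-8 bytes behind the chars read so far. */
class ByteCountingReader extends Reader {
    private final Reader in;
    private long bytesRead;      // bytes consumed so far
    private long bytesReadMark;  // byte count saved by mark()

    ByteCountingReader(Reader in) {
        this.in = in;
    }

    /** UTF-8 size contributed by one UTF-16 code unit. */
    static int utf8Length(int c) {
        if (c < 0x80) return 1;
        if (c < 0x800) return 2;
        // A surrogate pair encodes to 4 bytes, so each half counts as 2.
        if (Character.isSurrogate((char) c)) return 2;
        return 3;
    }

    @Override
    public int read() throws IOException {
        int c = in.read();
        if (c != -1) bytesRead += utf8Length(c);
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        for (int i = 0; i < n; i++) bytesRead += utf8Length(buf[off + i]);
        return n;
    }

    @Override
    public boolean markSupported() {
        return in.markSupported();
    }

    @Override
    public void mark(int readAheadLimit) throws IOException {
        in.mark(readAheadLimit);
        bytesReadMark = bytesRead; // remember the byte position with the mark
    }

    @Override
    public void reset() throws IOException {
        in.reset();
        bytesRead = bytesReadMark; // rewind the byte count together with the chars
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    long getBytesRead() {
        return bytesRead;
    }

    public static void main(String[] args) throws IOException {
        // Two records: "a,é" and "€,<emoji>", written with escapes to stay ASCII-safe.
        String csv = "a,\u00e9\n\u20ac,\uD83D\uDE00\n";
        try (ByteCountingReader r = new ByteCountingReader(new StringReader(csv))) {
            while (r.read() != -1) {
                // consume everything
            }
            System.out.println(r.getBytesRead());
        }
    }
}
```

Saving the counter in `mark()` and restoring it in `reset()` keeps the byte position consistent when a parser peeks ahead, which is the concern the last bullet describes. A charset-independent variant would probably need a `CharsetEncoder` instead of the fixed UTF-8 size table.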
   
   Test results:
   mvn 
   <img width="1510" alt="image" src="https://github.com/user-attachments/assets/df0e8277-4a89-4aa0-b45a-c763eaa7e3e0">
   <img width="1599" alt="image" src="https://github.com/user-attachments/assets/8fccade6-e530-4645-9a44-238fbcb8d826">
   The build passes the unit tests and the other checks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

