[ 
https://issues.apache.org/jira/browse/IO-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504882#comment-16504882
 ] 

Simon Spero commented on IO-577:
--------------------------------

A few comments:
1. APIs like java.util.stream and RxJava have filter methods that work in the 
opposite sense to the filter method introduced here: they select items that 
match the test, rather than excluding them. 

2. The documentation refers to "codepoints"; however, the read method in 
java.io.FilterReader returns UTF-16 code units (Java chars). This makes a 
difference for characters outside the BMP, which are represented in Java as 
surrogate pairs. The current implementation can't filter code points like 😭 
(U+1F62D) because it only sees the individual UTF-16 surrogates.  
Working with code points would potentially require interposing a pushback 
reader to handle the case where the input contains a code point that is 
encoded in more than one char and is not rejected. 

3. Commons IO is currently using Java 7. If the source level were to change to 
Java 8, the filter method could be replaced by an IntPredicate or 
Predicate<Integer> (passed in when the class is constructed). The current 
cases could be handled using a method reference or Predicate.isEqual. 
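
A rough sketch of how points 2 and 3 might fit together, assuming a Java 8 
source level. CodePointFilterReader and its filter helper are hypothetical 
names and are not part of the attached patch. The predicate keeps the patch's 
sense (true means "drop this code point"); a caller who prefers the 
Stream-style sense from point 1 could pass keep.negate() instead.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.IntPredicate;

/** Hypothetical sketch: drops every code point for which the predicate is true. */
class CodePointFilterReader extends FilterReader {
    private final PushbackReader source;
    private final IntPredicate skip;
    // Low surrogate of an accepted pair, still to be returned by read().
    private int pendingLow = -1;

    CodePointFilterReader(Reader in, IntPredicate skip) {
        this(new PushbackReader(in, 1), skip);
    }

    private CodePointFilterReader(PushbackReader in, IntPredicate skip) {
        super(in);
        this.source = in;
        this.skip = skip;
    }

    @Override
    public int read() throws IOException {
        if (pendingLow != -1) {
            int low = pendingLow;
            pendingLow = -1;
            return low;
        }
        while (true) {
            int c = source.read();
            if (c == -1) {
                return -1;
            }
            int cp = c;
            int low = -1;
            if (Character.isHighSurrogate((char) c)) {
                int next = source.read();
                if (next != -1 && Character.isLowSurrogate((char) next)) {
                    low = next;
                    cp = Character.toCodePoint((char) c, (char) next);
                } else if (next != -1) {
                    // Unpaired high surrogate: hand the next char back untouched.
                    source.unread(next);
                }
            }
            if (skip.test(cp)) {
                continue; // rejected: drop the whole code point
            }
            pendingLow = low; // accepted pair: emit the low surrogate next
            return c;
        }
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        // Simple (not optimal) bulk read built on the filtering read().
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }

    /** Convenience helper for trying the reader on a String. */
    static String filter(String text, IntPredicate skip) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new CodePointFilterReader(new StringReader(text), skip)) {
            for (int c; (c = r.read()) != -1; ) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

With something like this, the single-character case becomes a lambda such as 
cp -> cp == ',' and the set case becomes a method reference such as 
junkSet::contains on a Set<Integer>.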


> Add readers to filter out given characters: CharacterSetFilterReader and 
> CharacterFilterReader.
> -----------------------------------------------------------------------------------------------
>
>                 Key: IO-577
>                 URL: https://issues.apache.org/jira/browse/IO-577
>             Project: Commons IO
>          Issue Type: New Feature
>          Components: Filters
>            Reporter: Gary Gregory
>            Assignee: Gary Gregory
>            Priority: Major
>             Fix For: 2.7
>
>         Attachments: commons-io-577.patch
>
>
> Add readers to filter out given characters,  handy to remove known junk 
> characters from CSV files for example. Please see attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
