[ https://issues.apache.org/jira/browse/SOLR-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888460#action_12888460 ]

Robert Muir commented on SOLR-2003:
-----------------------------------

bq. If there's a way to tell that the file is in the "wrong" encoding, then +1 
to throwing an exception

Well technically, it's just a question of which action the decoder takes in 
the exceptional case of malformed input (e.g. an illegal byte sequence).
The default action, when you open a reader with a Charset, is to silently 
substitute a replacement character (U+FFFD), but you can change it to throw 
an exception instead.
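To make the REPLACE vs. REPORT difference concrete, here is a minimal sketch using only java.nio.charset; the class and method names (DecodeActions, decodeReplacing, decodeReporting) are made up for illustration. The byte 0xFF can never appear in valid UTF-8, so the lenient decoder turns it into U+FFFD while the strict one throws.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecodeActions {
    // Lenient: the same actions InputStreamReader(stream, charset) uses, so
    // malformed or unmappable input silently becomes U+FFFD.
    static String decodeReplacing(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected: with REPLACE the decoder substitutes instead of reporting.
            throw new IllegalStateException(e);
        }
    }

    // Strict: REPORT makes the decoder throw instead (a
    // MalformedInputException for an illegal byte sequence).
    static String decodeReporting(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }
}
```

For the bytes `'a', 0xFF, 'b'`, decodeReplacing returns "a\uFFFDb", and decodeReporting throws a MalformedInputException, which is the exception a user would see.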

So we can't detect every wrongly-encoded file, only the ones that are 
"obviously" wrong and cause the decoder to get angry.
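A rough illustration of why only "obviously" wrong files can be caught, assuming a hypothetical helper decodesCleanly that runs a strict (REPORT) decoder: an ISO-8859-1 file read as UTF-8 usually trips the decoder, but a UTF-8 file read as ISO-8859-1 never does, since every byte value is a valid ISO-8859-1 character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

final class StrictDecodeCheck {
    // Hypothetical helper: true if the bytes decode without error in the
    // given charset using REPORT actions, false if the decoder rejects them.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

For "café", the ISO-8859-1 bytes end in a lone 0xE9, which UTF-8 rejects, so the mistake is caught. The UTF-8 bytes (ending in C3 A9) decode "cleanly" as ISO-8859-1 into "cafÃ©", so that mistake goes unnoticed.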


> report errors for wrongly-encoded files in ResourceLoader.getLines()
> --------------------------------------------------------------------
>
>                 Key: SOLR-2003
>                 URL: https://issues.apache.org/jira/browse/SOLR-2003
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: SOLR-2003.patch
>
>
> ResourceLoader is used to load things like stopwords and synonyms files, but 
> it uses a plain 'Charset' argument for this.
> When you wrap an InputStream in an InputStreamReader with a Charset, you get:
> {code}
> CharsetDecoder decoder = charset.newDecoder()
>     .onMalformedInput(CodingErrorAction.REPLACE)
>     .onUnmappableCharacter(CodingErrorAction.REPLACE);
> {code}
> For cases like wrongly-encoded stopwords and synonyms files, I think it's 
> more helpful to use CodingErrorAction.REPORT than to silently substitute a 
> replacement char, so that the user gets an exception.
> See: 
> http://www.lucidimagination.com/search/document/1e50cb0992727fa1/foreign_characters_question
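Along the lines of the proposal above, a strict line reader could build its own CharsetDecoder with REPORT and hand it to InputStreamReader, which has a constructor taking a CharsetDecoder. This is only a sketch: StrictLineReader and readLines are hypothetical names, not Solr's ResourceLoader API or the code in the attached patch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

final class StrictLineReader {
    // Hypothetical helper: reads all lines, failing fast on wrongly-encoded input.
    static List<String> readLines(InputStream stream, Charset charset) throws IOException {
        // Build the decoder ourselves so bad bytes are REPORTed
        // (thrown as MalformedInputException) instead of REPLACEd.
        CharsetDecoder decoder = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        List<String> lines = new ArrayList<>();
        // try-with-resources also closes the underlying stream.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(stream, decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

MalformedInputException is a subclass of IOException, so callers that already handle I/O errors get the encoding error without any extra plumbing.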
