mmiklavc commented on issue #1341: METRON-614: Eliminate use of the default 
Charset
URL: https://github.com/apache/metron/pull/1341#issuecomment-510235543
 
 
   > > I think setting a default to UTF-8 in the parsers and documenting it 
would be the way to go. Provide a per-sensor config option, e.g. 
inputDataCharset
   > 
   > What is the status of this? I assume the suggestion here to provide 
additional configuration is a feature enhancement request and not something we 
would tackle on this PR. Is that right @mmiklavc?
   > 
   > What do we need to do to get this PR merged?
   
   I thought this could be handled pretty trivially here. What if we modified 
this ever so slightly to include a default charset for the parsers that can be 
overridden as needed? Here's an example.
   
   ```
   # MessageParser interface
   
   public interface MessageParser<T> extends Configurable {
   ...
       default Charset getReadCharset() {
           return StandardCharsets.UTF_8;
       }
   ...
   }
   
   // then we can do this for the default case
   
   # CSVParser
   
   public class CSVParser extends BasicParser {
   ...
     @Override
     public List<JSONObject> parse(byte[] rawMessage) {
       try {
         String msg = new String(rawMessage, getReadCharset());
   ...
   
   // And for implementations that might want something else
   
   public class CSVParser extends BasicParser {
   ...
     private Charset readCharset;
   
     @Override
     public void configure(Map<String, Object> parserConfig) {
       converter = new CSVConverter();
       converter.initialize(parserConfig);
       Object tsFormatObj = parserConfig.get(TIMESTAMP_FORMAT_CONF);
       if(tsFormatObj != null) {
         timestampFormat = new SimpleDateFormat(tsFormatObj.toString());
       }
       if (parserConfig.containsKey(READ_CHARSET)) {
         readCharset = Charset.forName((String) parserConfig.get(READ_CHARSET));
       }
     }
   
     @Override
     public Charset getReadCharset() {
       return readCharset != null ? readCharset : super.getReadCharset();
     }
   ...
   ```
   
   We could even push that config option into the BasicParser if we wanted. The 
upshot is that this seems like a really trivial change. Rather than opening 
another PR to address it, can we just add the option with our desired default 
now and be done with it?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to