buptljy commented on issue #6541: [FLINK-9964] [table] Add a CSV table format factory
URL: https://github.com/apache/flink/pull/6541#issuecomment-413285602
 
 
   @twalthr 
   I've replied to a few comments above and optimized some of the code according to your comments.
   I've finished:
   1. Null value configuration.
   2. Schema derivation.
   3. Some optimizations.
   
   About the encoding: the encoding of CSV data can only be one of the values of `com.fasterxml.jackson.core.JsonEncoding`, and the Jackson reader automatically detects the encoding according to the rules of [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt). So we don't need to set the encoding manually, and we can't allow users to choose encodings that JsonEncoding doesn't support, such as 'latin'.
   
   About the byte array: the byte array logic looks odd because of Jackson's internal behavior, which I explained in CsvRowSerializationSchema (line 159). We treat the byte array as a string to avoid unnecessary logic, because Jackson uses Base64 to handle byte arrays (CsvGenerator, line 691). This means users cannot pass in their original byte array; otherwise they would not get the original content back after serializing or deserializing (see the code below). Additionally, Jackson treats a byte array as a BinaryNode, so we cannot convert a byte array the way we convert other arrays.
   
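   ```java
   import com.fasterxml.jackson.databind.JsonNode;
   import com.fasterxml.jackson.dataformat.csv.CsvMapper;
   import com.fasterxml.jackson.dataformat.csv.CsvSchema;
   import java.util.Arrays;

   public class ByteArrayRoundTrip {
       public static void main(String[] args) throws Exception {
           byte[] origin = "123".getBytes();
           CsvSchema schema = CsvSchema.builder()
                   .addColumn("a", CsvSchema.ColumnType.STRING).build();
           CsvMapper cm = new CsvMapper();
           // Read the raw bytes as a single-column CSV row.
           JsonNode result = cm.readerFor(JsonNode.class).with(schema).readValue(origin);
           // Try to get the original bytes back from the parsed node.
           byte[] transformed = result.binaryValue();
           // Expected true, but this prints false.
           System.out.println(Arrays.equals(transformed, origin));
       }
   }
   ```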
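
   The Base64 behavior described above can also be seen without Jackson. The sketch below uses `java.util.Base64` as a stand-in for Jackson's own Base64 handling of byte arrays (an assumption: for this input, the standard alphabet with padding produces the same text); the class name `Base64FieldDemo` and its methods are hypothetical. It shows that the CSV field ends up holding the Base64 text rather than the original bytes, so the bytes only come back if the reader Base64-decodes the field.

   ```java
   import java.nio.charset.StandardCharsets;
   import java.util.Arrays;
   import java.util.Base64;

   // Hypothetical demo: java.util.Base64 stands in for Jackson's Base64 handling of byte[].
   public final class Base64FieldDemo {

       // What a Base64-encoding writer puts into the CSV field for a byte[] value.
       public static String encodeField(byte[] value) {
           return Base64.getEncoder().encodeToString(value);
       }

       // Reading the field back as plain text bytes (no Base64 decoding).
       public static byte[] readAsText(String field) {
           return field.getBytes(StandardCharsets.UTF_8);
       }

       // Reading the field back with Base64 decoding.
       public static byte[] readAsBase64(String field) {
           return Base64.getDecoder().decode(field);
       }

       public static void main(String[] args) {
           byte[] origin = "123".getBytes(StandardCharsets.UTF_8);
           String field = encodeField(origin);
           System.out.println(field);                                      // MTIz
           System.out.println(Arrays.equals(readAsText(field), origin));   // false
           System.out.println(Arrays.equals(readAsBase64(field), origin)); // true
       }
   }
   ```

   Under this assumption, raw content such as `123` placed directly in a byte-array field would be read through Base64 decoding rather than taken literally, which is the mismatch that treating byte arrays as strings avoids.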

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 


With regards,
Apache Git Services
