[ https://issues.apache.org/jira/browse/FLINK-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581431#comment-16581431 ]
ASF GitHub Bot commented on FLINK-9964:
---------------------------------------

buptljy commented on issue #6541: [FLINK-9964] [table] Add a CSV table format factory
URL: https://github.com/apache/flink/pull/6541#issuecomment-413285602

@twalthr I've replied to a few comments above and optimized some of the code according to your comments. I've finished:

1. Null value configuration.
2. Schema derivation.
3. Some optimizations.

About the encoding: the encoding of CSV data can only be one of the values of com.fasterxml.jackson.core.JsonEncoding, and the Jackson reader is able to detect the encoding automatically according to the rules of [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt). So we don't need to set the encoding manually, and we cannot allow users to choose encodings that JsonEncoding doesn't support, such as 'latin'.

About the byte array: the byte-array logic is awkward because of the Jackson internals that I explained in CsvRowSerializationSchema (line 159). We treat byte arrays as strings to avoid unnecessary logic, because Jackson encodes byte arrays as base64 (CsvGenerator, line 691). This means users cannot pass in their original byte arrays: they would not get the original content back after serializing and deserializing (see the code below). Additionally, a byte array is represented as a BinaryNode in Jackson, so we cannot convert it the way we convert other arrays.

```java
import java.util.Arrays;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

byte[] origin = "123".getBytes();
CsvSchema schema = CsvSchema.builder()
    .addColumn("a", CsvSchema.ColumnType.STRING)
    .build();
CsvMapper cm = new CsvMapper();
JsonNode result = cm.readerFor(JsonNode.class).with(schema).readValue(origin);
// binaryValue() does not recover the original bytes, because Jackson
// treats binary content as base64-encoded text.
byte[] transformed = result.binaryValue();
System.out.println(Arrays.equals(transformed, origin)); // expect true, actual false
```

> Add a CSV table format factory
> ------------------------------
>
>                 Key: FLINK-9964
>                 URL: https://issues.apache.org/jira/browse/FLINK-9964
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: Timo Walther
>            Assignee: buptljy
>            Priority: Major
>              Labels: pull-request-available
>
> We should add an RFC 4180 compliant CSV table format factory to read and write data into Kafka and other connectors. This requires a {{SerializationSchemaFactory}} and a {{DeserializationSchemaFactory}}. How we want to represent all data types and nested types is still up for discussion. For example, we could flatten and deflatten nested types as it is done [here|http://support.gnip.com/articles/json2csv.html] (a sketch of this idea follows below). We can also have a look at how tools such as the Avro-to-CSV tool perform the conversion.
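A minimal sketch of that flattening idea, assuming dot-separated column names in the style of the json2csv article linked above; the class and method names here are hypothetical, not part of the proposed format factory:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NestedFlattener {

    // Collapses nested maps into a single-level map whose dot-separated keys
    // can serve directly as CSV column names.
    static Map<String, Object> flatten(String prefix, Map<String, Object> nested) {
        Map<String, Object> flat = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : nested.entrySet()) {
            String key = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                flat.putAll(flatten(key, child));
            } else {
                flat.put(key, value);
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "x");
        user.put("city", "y");
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("user", user);
        // Prints: {id=1, user.name=x, user.city=y}
        System.out.println(flatten("", row));
    }
}
```

Deflattening would invert this mapping by splitting column names on the dots, though that breaks down if field names themselves contain dots.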