vagetablechicken commented on issue #5036: URL: https://github.com/apache/incubator-doris/issues/5036#issuecomment-755848621
I've hit a similar error. The cause is a charset mismatch between the two sides:

1. BE side: uses the UTF-8 charset to encode. https://github.com/apache/incubator-doris/blob/65d33cf43c837e56a2a36e78b358bfc0a9d1916b/be/src/util/arrow/row_batch.cpp#L80
2. spark-doris-connector side: uses the default charset. https://github.com/apache/incubator-doris/blob/65d33cf43c837e56a2a36e78b358bfc0a9d1916b/extension/spark-doris-connector/src/main/java/org/apache/doris/spark/serialization/RowBatch.java#L271

In my environment the default charset is US-ASCII, so Chinese characters come out garbled. It would be better to specify the charset `UTF_8` explicitly in `serialization/RowBatch`.
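To illustrate the mismatch, here is a minimal standalone sketch. It is not the connector's code; `CharsetDemo` and `decode` are hypothetical names used only for this example. It encodes a Chinese string as UTF-8, then decodes the bytes once as US-ASCII (standing in for an environment whose default charset is US-ASCII) and once as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode bytes with an explicitly chosen charset, so the result
    // does not depend on the JVM's default charset.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so the
        // source file itself is encoding-independent.
        String original = "\u4e2d\u6587";

        // The producer side encodes the string as UTF-8.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decoding as US-ASCII cannot represent the multi-byte sequences.
        String asAscii = decode(utf8Bytes, StandardCharsets.US_ASCII);
        // Decoding as UTF-8 round-trips the original string.
        String asUtf8 = decode(utf8Bytes, StandardCharsets.UTF_8);

        System.out.println("US-ASCII round-trip ok: " + asAscii.equals(original));
        System.out.println("UTF-8 round-trip ok: " + asUtf8.equals(original));
    }
}
```

Passing `StandardCharsets.UTF_8` at the decoding site, as suggested above, makes the result independent of how the JVM was started or how the host locale is configured.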
