Re: Clarification about string encoding in TsFile

Xiangdong Huang Mon, 11 May 2020 17:53:00 -0700

Hi Giorgio,

Thanks for the reminder about these details.


When we designed TsFile, we intended to use UTF-8.

I checked the source code: sometimes we explicitly specify "utf-8" (e.g., for
device names and measurement names), but sometimes we do not (e.g., for data
point values). Maybe we should always specify the encoding to avoid potential bugs.
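
As a minimal sketch of what "always specify the encoding" could look like:
the helper class and method names below (Utf8Strings, encode, decode) are
hypothetical and are not existing TsFile APIs. The only point is that every
conversion between String and bytes names the charset explicitly.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of TsFile: every conversion names UTF-8 explicitly.
final class Utf8Strings {
  private Utf8Strings() {}

  // Encode with an explicit charset instead of the platform default.
  static byte[] encode(String text) {
    return text.getBytes(StandardCharsets.UTF_8);
  }

  // Decode with the same explicit charset, so the round trip does not
  // depend on the machine that reads the bytes.
  static String decode(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```

With such a helper, Utf8Strings.decode(Utf8Strings.encode(s)) returns the
original string regardless of the value of file.encoding.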

Besides, since the default encoding on my computer is UTF-8, String.getBytes()
in Java also uses UTF-8 (and I think that is why there is no issue on my computer).

See the test:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import org.junit.Assert;
import org.junit.Test;

public class StringTest {
  @Test
  public void test() {
    // Prints "UTF-8" on my machine; the value depends on the platform default.
    System.out.println(System.getProperty("file.encoding"));
    String text = "Chinese中文";
    byte[] bytes1 = text.getBytes(StandardCharsets.UTF_8);
    byte[] bytes2 = text.getBytes(); // uses the platform default charset
    byte[] bytes3 = text.getBytes(StandardCharsets.UTF_16);
    // Equal only when the default charset is UTF-8.
    Assert.assertArrayEquals(bytes1, bytes2);
    // The UTF-8 and UTF-16 encodings differ, so these arrays must not be equal.
    Assert.assertFalse(Arrays.equals(bytes2, bytes3));
  }
}
```
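
To make the difference between the byte arrays in the test above more
concrete, here is a small sketch that compares the encoded lengths of the
same string. The class name CharsetSizes is only for illustration. It relies
on Java's "UTF-16" charset writing a 2-byte big-endian byte-order mark when
encoding.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: compares the encoded size of one string under two charsets.
final class CharsetSizes {
  private CharsetSizes() {}

  static int utf8Length(String text) {
    return text.getBytes(StandardCharsets.UTF_8).length;
  }

  static int utf16Length(String text) {
    return text.getBytes(StandardCharsets.UTF_16).length;
  }

  public static void main(String[] args) {
    String text = "Chinese\u4e2d\u6587"; // same string as in the test
    // 7 ASCII characters * 1 byte + 2 CJK characters * 3 bytes
    System.out.println(utf8Length(text)); // prints 13
    // 2-byte byte-order mark + 9 characters * 2 bytes
    System.out.println(utf16Length(text)); // prints 20
  }
}
```

So bytes written with one encoding and read with another will not
round-trip, which is why the encoding should be fixed explicitly.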

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Giorgio Zoppi <[email protected]> wrote on Tue, May 12, 2020 at 8:05 AM:

> Hello,
>
> thanks for the amazing work you are doing. I would like to know which
> string encoding TsFile uses: UTF-8, ASCII, or UTF-16.
>
> In Java the default is UTF-16; I just want confirmation.
>
> BR,
>
> Giorgio
>
>
