Hi All
I want to submit a proposal to support larger string types.
Background

There are currently two string types: CHAR and VARCHAR. CHAR stores
fixed-length strings and VARCHAR stores variable-length strings. The
maximum length of VARCHAR is 65533. This length meets most needs, but it
is not enough for scenarios that store larger strings in Doris, so we need
to add a new data type, String. String corresponds to BLOB or TEXT storage
in MySQL. The maximum length is 4GB, but we still don't recommend storing
strings larger than 64K in Doris.
Other system implementations

   -

   MySQL: MySQL uses BLOB or TEXT as the storage type for very long
   strings. MySQL can perform string operations on these types, but
   performance is not guaranteed. In actual storage, the data is stored
   in overflow pages. Depending on the version and storage engine, the
   first n characters are also stored in the data page for indexing.
   -

   Parquet/ORC: In both formats, large strings are stored directly in
   the data area. There is no special processing for them other than
   dictionary encoding.

Design

   -

   Add the String type, which represents a string of any length. To be
   compatible with MySQL, the maximum length is set to 4G-4, and 4 bytes
   are used to store the length of the string.
   -

   The data storage is similar to that of the VARCHAR type, except that
   the length identifier stored in front of the data is changed to 4 bytes.
   -

   Indexes are not supported for now. Zonemap indexes will be enabled
   once the zonemap length limit is ready.
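
To make the storage point concrete, here is a minimal Python sketch of a
value stored behind a 4-byte length identifier. It is only an illustration
and not the actual Doris storage code: the function names encode_string and
decode_string, the constant names, and the little-endian byte order are all
assumptions made for this example.

```python
import struct
from typing import Tuple

# 4G-4, the maximum length proposed in the design above.
MAX_STRING_LEN = 4 * 1024**3 - 4

# 4-byte unsigned length identifier. Little-endian is an assumption here.
_LEN_PREFIX = struct.Struct("<I")


def encode_string(value: bytes) -> bytes:
    """Return the value preceded by its length as a 4-byte unsigned integer."""
    if len(value) > MAX_STRING_LEN:
        raise ValueError("value exceeds the proposed String length limit")
    return _LEN_PREFIX.pack(len(value)) + value


def decode_string(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Read one length-prefixed value starting at offset.

    Returns the value and the offset of the next value in the buffer.
    """
    (length,) = _LEN_PREFIX.unpack_from(buf, offset)
    start = offset + _LEN_PREFIX.size
    end = start + length
    if end > len(buf):
        raise ValueError("buffer is shorter than the encoded length")
    return buf[start:end], end
```

With a 4-byte identifier, a value longer than the VARCHAR limit of 65533
bytes, for example encode_string(b"x" * 70000), can still be written and read
back, at the cost of a fixed 4 bytes of overhead per value.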
