eldenmoon opened a new issue, #16351: URL: https://github.com/apache/doris/issues/16351
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Description

# Background
A dynamic schema table is a special type of table whose schema changes as data is loaded. We have implemented this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema information from the original documents and infer the final type information. This special table reduces manual schema change operations, makes it easy to import semi-structured data, and extends its schema automatically.

# Design detail
## Type inference
A special column, `ColumnObject`, is introduced to Doris. It represents a dynamic column: an object with a dynamic set of subcolumns. Subcolumns are identified by their paths in the document and are stored in a trie-like structure. A subcolumn stores its values in several column parts and keeps track of the current common type of all parts. When an inserted field cannot be converted to the current common type, we add a new column part with a new type. After all values have been inserted, the subcolumn should be finalized for writing and other operations. As a batch of data is imported, we derive the common type of all parts and use it as the concrete column type.

For example, suppose we have some documents like the following:

After type inference, the trie-like structure will look like:

## Type conflict handling
The rule is simple: as mentioned above, type evolution follows the `Least Common Ancestors` rule.

If no common ancestor can be found, a type conflict is detected. We have two methods for handling such a conflict:
1. Abort this load and tell the user that we encountered a type conflict between some types
2. Cast all conflicting types to string

E.g.
if a path like `a.b.c` holds the `bigint` value 1234 in doc1 but the `array<bigint>` value [1234] in doc2, method 1 aborts the load, while method 2 converts both values to strings, `"1234"` and `"[1234]"`, so the final type of `a.b.c` is `string`.

## Schema Change

## Storage Engine

# Performance

### Use case

_No response_

### Related issues

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
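
To make the rule under "Type conflict handling" concrete, here is a minimal Python sketch of the least-common-ancestor lookup and the two conflict policies. It is only an illustration, not the Doris implementation: the `PARENT` type tree, the `least_common_type` function, the `TypeConflict` exception, and the `on_conflict` parameter are all hypothetical names, and the toy hierarchy is an assumption rather than Doris's real type lattice.

```python
# Hypothetical toy type tree (child -> parent); NOT Doris's actual hierarchy.
# A parent of None marks the root of a separate tree.
PARENT = {
    "tinyint": "int",
    "int": "bigint",
    "bigint": "double",
    "double": None,
    "array<bigint>": None,
    "string": None,
}


class TypeConflict(Exception):
    """Raised when two types share no ancestor and the policy is 'abort'."""


def ancestors(type_name):
    """Return type_name followed by its ancestors, nearest first."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = PARENT[type_name]
    return chain


def least_common_type(a, b, on_conflict="abort"):
    """Find the nearest type that both a and b can be widened to.

    on_conflict="abort"  -> method 1: raise TypeConflict
    on_conflict="string" -> method 2: fall back to string
    """
    reachable_from_a = set(ancestors(a))
    # Walk b's chain nearest-first; the first shared node is the
    # least common ancestor.
    for candidate in ancestors(b):
        if candidate in reachable_from_a:
            return candidate
    if on_conflict == "string":
        return "string"
    raise TypeConflict(f"type conflict between {a} and {b}")
```

With this sketch, `least_common_type("int", "bigint")` widens to `bigint`, while `least_common_type("bigint", "array<bigint>", on_conflict="string")` falls back to `string`, matching the `a.b.c` example; with the default policy the same call raises `TypeConflict`, which corresponds to aborting the load.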
