eldenmoon opened a new issue, #16351:
URL: https://github.com/apache/doris/issues/16351

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Description
   
   # Background
   A dynamic schema table is a special type of table whose schema changes 
during the loading procedure. We implemented this feature mainly for 
semi-structured data such as JSON: since JSON is self-describing, we can extract 
schema information from the original documents and infer the final type 
information. This special table reduces manual schema change operations, makes 
it easy to import semi-structured data, and extends its schema automatically.
   
   # Design detail
   ## Type inference
   A special column type, `ColumnObject`, is introduced to Doris. It represents 
a dynamic column: an object with a dynamic set of subcolumns. Subcolumns are 
identified by their paths in the document and are stored in a trie-like 
structure. Each subcolumn stores its values in several column parts and keeps 
track of the current common type of all parts. When an inserted field cannot be 
converted to the current common type, we add a new column part with a new type. 
After all values are inserted, the subcolumn must be finalized before writing 
and other operations. As each batch of data is imported, we derive the common 
type of all parts and use it as the concrete column type.
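
As a rough illustration of the idea above, the following Python sketch flattens 
JSON-like documents into paths, stores them in a trie, and keeps one common type 
per leaf. It is not the Doris implementation: the names `SubcolumnNode`, 
`infer_schema`, and `collect`, and the single widening rule (`bigint` to 
`double`), are assumptions made for this example only.

```python
class SubcolumnNode:
    """One node of the path trie; only leaf nodes carry a common type."""

    def __init__(self):
        self.children = {}
        self.common_type = None


def flatten(doc, prefix=()):
    """Yield (path, value) pairs, e.g. {"a": {"b": 1}} -> (("a", "b"), 1)."""
    for key, value in doc.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def leaf_type(value):
    """Map a Python scalar to a type name (hypothetical, scalars only)."""
    # bool is rejected on purpose: it is a subclass of int in Python.
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


def common_type(a, b):
    """Assumed widening rule: bigint and double merge into double."""
    if a == b:
        return a
    if {a, b} == {"bigint", "double"}:
        return "double"
    raise TypeError(f"no common type for {a} and {b}")


def infer_schema(docs):
    """Insert every leaf of every document and update its common type."""
    root = SubcolumnNode()
    for doc in docs:
        for path, value in flatten(doc):
            node = root
            for key in path:
                node = node.children.setdefault(key, SubcolumnNode())
            t = leaf_type(value)
            node.common_type = (
                t if node.common_type is None else common_type(node.common_type, t)
            )
    return root


def collect(node, prefix=()):
    """Return {dotted_path: common_type} for all leaves under node."""
    out = {}
    if node.common_type is not None:
        out[".".join(prefix)] = node.common_type
    for key, child in node.children.items():
        out.update(collect(child, prefix + (key,)))
    return out


docs = [{"a": 1, "b": {"c": 1.5}}, {"a": 2.5, "b": {"d": "x"}}]
schema = collect(infer_schema(docs))
```

Here `a` is first seen as `bigint` and later as `double`, so its common type is 
widened to `double`; `b.c` and `b.d` each appear once and keep their own type.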
   
   For example, suppose we have the following documents:
   
![image](https://user-images.githubusercontent.com/64513324/216220779-1e75c64a-35c3-45d1-8fd6-8a262c48dd00.png)
   
   After the type inference, the trie-like structure will be like:
   
![image](https://user-images.githubusercontent.com/64513324/216220724-c7b26954-6a89-4e43-afaa-d1980a438982.png)
   
   ## Type conflict handling
   The rule is simple: as mentioned above, type evolution follows the 
`Least Common Ancestor` rule.
   
![image](https://user-images.githubusercontent.com/64513324/216221354-0d9e9459-bc80-49cd-a80b-9df6168f6bf4.png)
   If no common ancestor can be found, a type conflict is detected. We have two 
ways to handle such a conflict:
   1. Abort the load and tell the user that a type conflict was encountered 
between the types involved.
   2. Cast all conflicting types to string.

   E.g., if the path `a.b.c` holds the `bigint` value 1234 in doc1 but the 
`array<bigint>` value [1234] in doc2, method 1 aborts the load, while method 2 
converts both values to strings, `"1234"` and `"[1234]"`, so the final type of 
`a.b.c` is `string`.
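
The rule and the two fallback methods can be sketched in Python as follows. The 
type tree in `PARENT` is a hypothetical example and does not reproduce the tree 
in the figure; `resolve`, `TypeConflict`, and the `cast_to_string` flag are 
likewise names invented for this sketch.

```python
# Hypothetical type tree: each type points to its parent (wider) type.
PARENT = {
    "tinyint": "int",
    "int": "bigint",
    "bigint": "double",
    "float": "double",
}


class TypeConflict(Exception):
    """Raised when two types share no ancestor and casting is disabled."""


def ancestors(t):
    """Return t followed by all of its ancestors, nearest first."""
    chain = [t]
    while t in PARENT:
        t = PARENT[t]
        chain.append(t)
    return chain


def least_common_ancestor(a, b):
    """Return the nearest type that is an ancestor of both a and b, or None."""
    seen = set(ancestors(a))
    for t in ancestors(b):
        if t in seen:
            return t
    return None


def resolve(a, b, cast_to_string=False):
    """Apply the LCA rule; on conflict either abort or fall back to string."""
    lca = least_common_ancestor(a, b)
    if lca is not None:
        return lca
    if cast_to_string:
        return "string"  # method 2: cast all conflicting types to string
    raise TypeConflict(f"type conflict between {a} and {b}")  # method 1
```

With this tree, `resolve("int", "float")` yields `double`, while 
`resolve("bigint", "array<bigint>")` raises `TypeConflict` unless 
`cast_to_string=True` is passed, in which case it yields `string`.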
   
   ## Schema Change
   
   ## Storage Engine
   # Performance
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

