luofeng1994 opened a new issue, #5098:
URL: https://github.com/apache/gravitino/issues/5098

   ### Version
   
   0.6.0
   
   ### Describe what's wrong
   
   When parsing metadata from a Doris database, fields with the data type 
array<varchar()> are incorrectly parsed as varchar() instead of retaining the 
array<> structure. This misclassification causes inaccuracies in the schema 
representation and can lead to issues in processing and query generation.
   
   Here is the data type in the Doris DDL, which is array<varchar(500)>:
   
![WeCom screenshot_cb318f60-f716-45d7-810d-e255b1ed61b9](https://github.com/user-attachments/assets/1aa32fe3-c601-4415-857b-c9b19b425dd1)
   
   Here is the data type as parsed by Gravitino, which is incorrectly reported as 
varchar(500):
   
![Xnip2024-10-11_10-21-23](https://github.com/user-attachments/assets/97f1e468-4fca-44e2-819a-e57cdcee64f2)
   
   
   ### Error message and/or stacktrace
   
   No error message.
   
   ### How to reproduce
   
   Version: 0.6.0-incubating
   Reproduce: create a table with an array column in Doris, then view it in 
Gravitino.
   
   ### Additional context
   
   Based on my investigation, the error comes from the deserialization of 
data types. During deserialization 
(org.apache.gravitino.json.fromPrimitiveTypeString), a string that does not 
match any PrimitiveType is tested against regular expressions to determine its 
Type. However, the regular expression for VARCHAR also matches the 
array<varchar()> data type, so the string is misclassified as VARCHAR. As a 
result, array<varchar(255)> is parsed as varchar(255).
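
The suspected mechanism can be illustrated with a minimal sketch. The pattern and method names below are hypothetical and are not copied from Gravitino's source; the sketch only assumes that the VARCHAR check is an unanchored substring search, which is one way such a misclassification could arise.

```java
import java.util.regex.Pattern;

class VarcharMatchSketch {
    // Hypothetical pattern for illustration only, not Gravitino's actual regex.
    static final Pattern VARCHAR = Pattern.compile("varchar\\(\\s*(\\d+)\\s*\\)");

    // Substring search: succeeds if varchar(n) occurs anywhere in the input.
    static boolean looseIsVarchar(String typeString) {
        return VARCHAR.matcher(typeString).find();
    }

    // Whole-string match: succeeds only if the entire input is varchar(n).
    static boolean strictIsVarchar(String typeString) {
        return VARCHAR.matcher(typeString).matches();
    }

    public static void main(String[] args) {
        String arrayType = "array<varchar(255)>";
        System.out.println(looseIsVarchar(arrayType));       // true: the array wrapper is ignored
        System.out.println(strictIsVarchar(arrayType));      // false
        System.out.println(strictIsVarchar("varchar(255)")); // true
    }
}
```

If the real check behaves like `looseIsVarchar`, the inner varchar(255) is found and the surrounding array<> is silently dropped, which matches the behavior reported above.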
   
   I believe the solution is to introduce a new Type.ArrayType data type with 
a corresponding regular expression that matches array<> data types. The 
array<> structure would then be identified during deserialization before any 
basic type such as VARCHAR is considered, so array<varchar(255)> would be 
parsed as an array.
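
A rough sketch of that idea follows. It is not the proposed patch: the class, the patterns, and the string-based result are all hypothetical stand-ins, and a real fix would return Gravitino Type objects rather than descriptions. The point is only the ordering: the array<> pattern is tried first against the whole string, and the element type is parsed recursively.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArrayTypeParseSketch {
    // Hypothetical patterns; both are applied with matches(), i.e. to the whole string.
    static final Pattern ARRAY =
        Pattern.compile("array<(.+)>", Pattern.CASE_INSENSITIVE);
    static final Pattern VARCHAR =
        Pattern.compile("varchar\\(\\s*(\\d+)\\s*\\)", Pattern.CASE_INSENSITIVE);

    // Returns a plain-text description of the parsed type (stand-in for a Type object).
    static String describe(String typeString) {
        String s = typeString.trim();

        // Check the array wrapper first so the element type cannot hide it.
        Matcher array = ARRAY.matcher(s);
        if (array.matches()) {
            return "ARRAY of " + describe(array.group(1));
        }

        Matcher varchar = VARCHAR.matcher(s);
        if (varchar.matches()) {
            return "VARCHAR length " + varchar.group(1);
        }

        // Anything else is left to other handling in this sketch.
        return "UNPARSED " + s;
    }

    public static void main(String[] args) {
        System.out.println(describe("array<varchar(255)>")); // ARRAY of VARCHAR length 255
        System.out.println(describe("varchar(500)"));        // VARCHAR length 500
    }
}
```

Because the element type is parsed recursively, nested declarations such as array<array<varchar(10)>> are handled by the same two rules.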
   
   
![image](https://github.com/user-attachments/assets/80da199f-184b-4467-96f9-d469fed09c58)
   


