[PR] Use Array[String] instead of defaulting to String [doris-spark-connector]

via GitHub Fri, 29 May 2026 18:26:58 -0700


addu390 opened a new pull request, #362:
URL: https://github.com/apache/doris-spark-connector/pull/362


   # Proposed changes
   
   Issue Number: close #341
   
   ## Problem Summary:
   
   Doris `ARRAY` columns are currently exposed to Spark as `StringType`, so SQL 
functions like `size(col)` fail with `DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE`.
   
   This PR adds an opt-in config `doris.read.array.native-type` (default 
`false`) that exposes `ARRAY` columns as `ArrayType(StringType)` instead. It 
covers both `thrift` and `arrow` read modes. Because the option is off by 
default, existing users keep the legacy JSON-string behavior.
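
To make the difference between the two modes concrete, here is a minimal Python model of the behavior described above. It is not the connector's code: `convert_array_cell` is a hypothetical helper, and the assumption that the legacy string is the JSON text of the array is taken only from the "legacy JSON-string behavior" wording in this description. Only the option name and its default come from the PR.

```python
import json
from typing import Dict, List, Optional, Union

# Option name and default ("false") are taken from this PR description.
OPTION_KEY = "doris.read.array.native-type"


def convert_array_cell(
    raw: str, options: Dict[str, str]
) -> Union[str, List[Optional[str]]]:
    """Hypothetical model of how one ARRAY cell is surfaced to the reader.

    Assumption: `raw` is the JSON text of the array, as in the legacy mode.
    """
    native = options.get(OPTION_KEY, "false").strip().lower() == "true"
    if not native:
        # Legacy mode: the whole array stays a single string value.
        return raw
    # Native mode: one entry per element, each rendered as a string because
    # element-type inference is out of scope for this PR.
    elements = json.loads(raw)
    return [
        None if e is None else e if isinstance(e, str) else json.dumps(e)
        for e in elements
    ]


legacy = convert_array_cell("[1, 2, 3]", {})
native = convert_array_cell("[1, 2, 3]", {OPTION_KEY: "true"})
```

In this model `legacy` is still the string `[1, 2, 3]`, which is why an array function such as `size(col)` has nothing to count, while `native` is a three-element list of strings. The elements stay strings even for `array<int>`, matching the `ArrayType(StringType)` mapping.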
   
   Element-type inference (e.g. `array<int>` → `IntegerType`) is intentionally 
out of scope and left as a follow-up.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: No (gated by a default-off config)
   2. Have unit tests been added: Yes (`SchemaConvertors`, `RowConvertors`, plus 
a parameterized IT for both read modes)
   3. Has documentation been added or modified: Yes (option description); 
upstream docs at `apache/doris` will be a follow-up
   4. Does it need to update dependencies: No
   5. Are there any changes that cannot be rolled back: No
   
   ## Further comments
   
   Built on top of the discussion in #341 and the design feedback on the stale 
#345. Happy to iterate on the option name or scope.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

