Csaba Ringhofer created IMPALA-11041:
----------------------------------------
Summary: Improve client support for returning complex types in select list
Key: IMPALA-11041
URL: https://issues.apache.org/jira/browse/IMPALA-11041
Project: IMPALA
Issue Type: New Feature
Components: Backend, Clients
Reporter: Csaba Ringhofer
Complex types are currently returned as strings, formatted as JSON. Unlike
other types that are returned as strings (e.g. date), the schema also reports
"STRING", so the client cannot tell what the real complex type is. The benefit
of this approach is that existing clients can fetch complex columns without any
modification (e.g. Impyla assumes that all columns are primitive:
https://github.com/cloudera/impyla/blob/3c7bcc8350f807126cdde313b0154f89c2bb5bdc/impala/hiveserver2.py#L1425
).
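To make the current behavior concrete, here is a minimal client-side sketch.
It assumes a hypothetical result set in which an array<int> column arrives as
JSON text and the schema reports STRING. The names (schema, row, decode_row)
are illustrative only and are not part of Impyla or any Impala API. Because
the schema does not identify the column as complex, the caller has to know out
of band which columns to parse.

```python
import json

# Hypothetical schema and row as a client might see them today:
# the complex column "arr" is reported as STRING and carries JSON text.
schema = [("id", "INT"), ("arr", "STRING")]
row = (1, "[1,2,3]")


def decode_row(schema, row, json_columns):
    """Parse the columns that the caller knows to be JSON-encoded complex types.

    The schema alone cannot tell a real string from a JSON-encoded array,
    so the set of column names to decode must be supplied by the caller.
    """
    return tuple(
        json.loads(value) if name in json_columns and value is not None else value
        for (name, _type), value in zip(schema, row)
    )


decoded = decode_row(schema, row, json_columns={"arr"})
```

With full type information in the schema, the json_columns argument would no
longer be needed.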
This could be improved in two ways:
- Returning the full complex type information, e.g. array<int>
- Returning the data without converting it to string: complex types could be
broken up into primitive columns, as Parquet and ORC do, and assembled on the
client side as needed. This could potentially make both the client and server
side much faster.
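The second idea can be sketched for the simplest case, a nullable array<int>
column. This is only an illustration under stated assumptions: it uses a plain
offsets-plus-values layout with a separate null flag per row, which is simpler
than the repetition/definition levels Parquet uses, and it is not a proposed
wire format. The function names are hypothetical.

```python
def flatten_int_arrays(rows):
    """Server side (sketch): split a list of int arrays into primitive columns.

    Returns (offsets, is_null, values), where row i occupies
    values[offsets[i]:offsets[i + 1]] and is_null[i] marks a NULL array.
    """
    offsets = [0]
    is_null = []
    values = []
    for arr in rows:
        is_null.append(arr is None)
        if arr is not None:
            values.extend(arr)
        # A NULL or empty array contributes no values, so the offset repeats.
        offsets.append(len(values))
    return offsets, is_null, values


def assemble_int_arrays(offsets, is_null, values):
    """Client side (sketch): rebuild the arrays from the primitive columns."""
    return [
        None if null else values[offsets[i]:offsets[i + 1]]
        for i, null in enumerate(is_null)
    ]


rows = [[1, 2, 3], [], None, [4]]
offsets, is_null, values = flatten_int_arrays(rows)
restored = assemble_int_arrays(offsets, is_null, values)
```

The three outputs are flat lists of primitives, so they could be sent with the
same machinery used for ordinary columns, and a client that only needs part of
the data could skip the assembly step.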