Csaba Ringhofer created IMPALA-9575:
---------------------------------------
Summary: Add basic BINARY support
Key: IMPALA-9575
URL: https://issues.apache.org/jira/browse/IMPALA-9575
Project: IMPALA
Issue Type: Sub-task
Components: Backend, Frontend
Reporter: Csaba Ringhofer
An initial testable implementation of BINARY would contain the following:
- DDL support for BINARY, e.g. create table
- read support for text files (values stored with base64 encoding)
- basic client support (hs2, beeswax)
- cast from/to STRING
- basic operators (=, <, >), all of which should work the same way as for STRING
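
To make the text-file and operator items concrete, here is a minimal Python
sketch (not Impala code). It assumes each BINARY field in a text file is one
standard base64 string, and that comparisons are plain bytewise comparisons of
the decoded values. The function names and sample rows are hypothetical.

```python
import base64


def decode_binary_field(field: str) -> bytes:
    """Decode one base64-encoded text-file field into raw bytes.

    Assumption: the field uses the standard base64 alphabet with padding.
    validate=True makes invalid characters raise instead of being skipped.
    """
    return base64.b64decode(field, validate=True)


def binary_less_than(a: bytes, b: bytes) -> bool:
    """Compare two values bytewise (lexicographic on unsigned byte values)."""
    return a < b


# Hypothetical sample fields as they might appear in a text file.
rows = ["AAEC", "/w==", "aGVsbG8="]
values = [decode_binary_field(r) for r in rows]

for v in sorted(values):
    print(v)
```

Note that the decoded values need not be valid UTF-8 (the second row decodes
to the single byte 0xFF), which is why the comparison in this sketch works on
raw bytes rather than on decoded text.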
Optional in the first step:
- write support for text file
- joins on BINARY columns
- aggregates on BINARY columns
Hive also allows binary columns for partitioning, but it seems buggy and I
would prefer to avoid it in Impala.
The last time a new type (DATE) was added to Impala, it was a massive change:
https://gerrit.cloudera.org/#/c/12481/
I hope that BINARY will be much simpler, as:
- It should be handled by the backend exactly the same way as STRING, which
could mean that the backend work will be minimal (only the file readers/writers
have to differentiate between them). This is different in Hive, where STRING is
treated as UTF-8 and BINARY is not.
- The frontend should also treat it similarly to STRING, just with far fewer
capabilities, e.g. no casts to types other than STRING, and it shouldn't be
accepted by UDFs that expect STRING.
- As BINARY supports very few features, tests also need to cover far fewer
cases.
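
The frontend restrictions described above could be modelled roughly as
follows. This is a toy Python sketch, not Impala's actual analyzer: the `Type`
enum, the cast table, and both helper functions are hypothetical, and the
STRING/INT casts are included only for contrast.

```python
from enum import Enum


class Type(Enum):
    STRING = "STRING"
    BINARY = "BINARY"
    INT = "INT"


# Hypothetical cast table: BINARY may only be cast to and from STRING.
ALLOWED_CASTS = {
    (Type.BINARY, Type.STRING),
    (Type.STRING, Type.BINARY),
    (Type.STRING, Type.INT),
    (Type.INT, Type.STRING),
}


def can_cast(src: Type, dst: Type) -> bool:
    """Return True if an explicit cast from src to dst is allowed."""
    return src == dst or (src, dst) in ALLOWED_CASTS


def accepts_argument(param_type: Type, arg_type: Type) -> bool:
    """Return True if a function parameter accepts the argument type.

    Simplification: this sketch requires an exact match, so a BINARY
    argument is never implicitly accepted where STRING is expected.
    """
    return param_type == arg_type


print(can_cast(Type.BINARY, Type.STRING))
print(can_cast(Type.BINARY, Type.INT))
print(accepts_argument(Type.STRING, Type.BINARY))
```

In this model a query would have to cast BINARY to STRING explicitly before
passing it to a function that expects STRING.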