David Chen created TAJO-809:
-------------------------------
Summary: Language extension for non-scalar types
Key: TAJO-809
URL: https://issues.apache.org/jira/browse/TAJO-809
Project: Tajo
Issue Type: New Feature
Reporter: David Chen
This ticket tracks two pieces of work: defining the syntax for nested schemas,
maps, arrays, and unions, and adding that syntax to the parser. Initially, we
can add stubs for the parser endpoints, which will then be fleshed out when
support for each data type is actually implemented (see the other subtasks of
TAJO-710).
I have an idea of a possible DDL syntax for these types, and I would like to
get your feedback on it. I considered just using Hive's syntax but I felt that
it was not the best syntax for these types.
Instead of calling nested records "structs" as Hive does, I simply call them
records and use the same syntax that is used for declaring the top-level
record fields:
{code}
create table record_example (
  nested_field record (
    field1 int,
    field2 double),
  two_levels_nested record (
    inner_nested record (
      field3 string,
      field4 int),
    field5 int)
) using parquet;
{code}
For arrays, maps, and unions, I am using a syntax inspired by Scala's syntax
for generics:
{code}
create table array_example (
  int_array array[int],
  record_array array[record (
    field1 int,
    field2 string)]
) using avro;

create table map_example (
  string_to_int map[string, int],
  int_to_record map[int, record (
    field1 string,
    field2 int)]
) using avro;

create table union_example (
  integers union[bit, smallint, integer, bigint]
) using parquet;
{code}
Of course, we may change the syntax when we implement these data types, but
for now I think we should define an initial language. Once the initial syntax
has stabilized, I will write a formal grammar for it.
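To make the proposed type syntax concrete before a formal grammar exists, here
is a minimal recursive-descent sketch in Python. It is only an illustration,
not Tajo's parser: the names Scalar, Record, Generic, TypeParser, and
parse_type are hypothetical, and it assumes that array takes exactly one type
parameter, map takes exactly two, and union takes one or more. It parses only
a column type, not a full create table statement.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Hypothetical AST nodes for illustration; none of these names come from Tajo.
@dataclass
class Scalar:
    name: str

@dataclass
class Record:
    fields: List[Tuple[str, "Type"]]

@dataclass
class Generic:
    kind: str              # "array", "map", or "union"
    params: List["Type"]

Type = Union[Scalar, Record, Generic]

TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[\[\](),])")
ARITY = {"array": 1, "map": 2}  # assumed arities; union accepts one or more

def tokenize(text: str) -> List[str]:
    """Split the input into identifiers and punctuation tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class TypeParser:
    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token: str) -> None:
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def ident(self) -> str:
        tok = self.peek()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise SyntaxError(f"expected an identifier, got {tok!r}")
        self.pos += 1
        return tok

    def parse_type(self) -> Type:
        name = self.ident().lower()
        if name == "record":
            return Record(self.parse_fields())
        if name in ("array", "map", "union"):
            # Generic types use square brackets: kind[type, type, ...]
            self.expect("[")
            params = [self.parse_type()]
            while self.peek() == ",":
                self.expect(",")
                params.append(self.parse_type())
            self.expect("]")
            if name in ARITY and len(params) != ARITY[name]:
                raise SyntaxError(f"{name} takes {ARITY[name]} type parameter(s)")
            return Generic(name, params)
        return Scalar(name)

    def parse_fields(self) -> List[Tuple[str, Type]]:
        # Record fields reuse the top-level form: ( name type, name type, ... )
        self.expect("(")
        fields = [(self.ident(), self.parse_type())]
        while self.peek() == ",":
            self.expect(",")
            fields.append((self.ident(), self.parse_type()))
        self.expect(")")
        return fields

def parse_type(text: str) -> Type:
    """Parse one column type and reject any trailing tokens."""
    parser = TypeParser(text)
    result = parser.parse_type()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return result
```

For example, parse_type("map[int, record (field1 string, field2 int)]")
returns a Generic node of kind "map" whose second parameter is a Record with
two fields. In this sketch a comma must be followed by another field or type,
so a trailing comma before a closing parenthesis or bracket is rejected.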