Re: Hive SQL extension

Stamatis Zampetakis Thu, 22 Oct 2020 02:46:00 -0700

Hi Peter,

I am nowhere near being an expert but just wanted to share my thoughts.


If I understand correctly, you would like some syntactic sugar in Hive to
support partitioning as per Iceberg. I cannot tell whether that is really
useful, but from my point of view it does not seem a very good idea to
introduce another layer of parsing before the actual parser (I don't know
if there is one already). For instance, how are you going to handle syntax
errors in the sugared part, and what should the end user see in that case?

No matter how it is added, if you give the user the possibility to write
such queries, it becomes part of the Hive syntax and as such a job of the
parser.
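
To make the concern concrete, here is a minimal sketch (in Python, purely
illustrative) of what a layer in front of the parser would have to do with
the "partition by spec(...)" clause from your example. Everything in it is
an assumption on my side: the function name, the exception class, the
error message format and the list of accepted transforms are hypothetical,
and none of it is Hive or Iceberg API. It only strips the clause and
collects the fields; it does not try to build the table property value.

```python
import re

# Hypothetical pre-parse step; nothing here is Hive or Iceberg API.
# Matches the sugared clause up to the first closing parenthesis.
SPEC_CLAUSE = re.compile(
    r"partition\s+by\s+spec\s*\((.*?)\)", re.IGNORECASE | re.DOTALL
)

# Assumption: only the transforms that appear in the example are accepted.
KNOWN_TRANSFORMS = {"identity", "hour", "day"}


class SugarSyntaxError(Exception):
    """Raised by the pre-parse step, before the real parser ever runs."""


def extract_partition_spec(query):
    """Return (query_without_clause, [(column, transform), ...])."""
    match = SPEC_CLAUSE.search(query)
    if match is None:
        # No sugar present: hand the query to the real parser unchanged.
        return query, []

    body = match.group(1)
    fields = []
    for item in body.split(","):
        parts = item.split()
        if len(parts) != 2 or parts[1].lower() not in KNOWN_TRANSFORMS:
            # The extra layer has to invent its own error reporting here,
            # separate from whatever the real parser would print.
            offset = match.start(1) + body.find(item)
            raise SugarSyntaxError(
                f"bad partition field {item.strip()!r} at offset {offset}"
            )
        fields.append((parts[0], parts[1].lower()))

    # Remove the clause so the rest can go through the normal parser.
    stripped = query[:match.start()] + query[match.end():]
    return stripped, fields
```

Even in this toy version the layer needs its own grammar (which
transforms are valid, how fields are separated) and its own error
messages, which is exactly the work a parser normally does.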

Best,
Stamatis


On Thu, Oct 22, 2020 at 9:49 AM Peter Vary <pv...@cloudera.com> wrote:

> Hi Hive experts,
>
> I would like to extend the Hive SQL language to provide a way to create
> Iceberg partitioned tables like this:
>
> create table iceberg_test(
>         level string,
>         event_time timestamp,
>         message string,
>         register_time date,
>         telephone array <string>
>     )
>     partition by spec(
>         level identity,
>         event_time identity,
>         event_time hour,
>         register_time day
>     )
>     stored as iceberg;
>
>
> The problem is that this syntax is very specific to Iceberg, and I think
> it is not a good idea to change the Hive syntax globally to accommodate a
> specific use-case.
> The following CREATE TABLE statement could achieve the same thing:
>
> create table iceberg_test(
>         level string,
>         event_time timestamp,
>         message string,
>         register_time date,
>         telephone array <string>
>     )
>     STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
>     TBLPROPERTIES ('iceberg.mr.table.partition.spec'='...');
>
>
> I am looking for a way to rewrite the original query (which is not
> syntactically correct Hive) into a new, syntactically correct one.
>
> I was checking the hooks as a possible solution, but I have found that:
>
>    - HiveDriverRunHook.preDriverRun can get the original (syntactically
>    incorrect) query, but I have found no way to rewrite it into a
>    syntactically correct one (the query appears to be read-only there)
>    - HiveSemanticAnalyzerHook can rewrite the AST, but it needs a
>    syntactically correct query to start with
>
>
> Any other ideas on how to achieve the goals above? Either with hooks, or
> in any other way?
>
> Thanks,
> Peter
>
