Fang-Yu Rao created IMPALA-14768:
------------------------------------
Summary: Add operation type to the lineage graph
Key: IMPALA-14768
URL: https://issues.apache.org/jira/browse/IMPALA-14768
Project: IMPALA
Issue Type: Task
Components: Frontend
Reporter: Fang-Yu Rao
Assignee: Fang-Yu Rao
Currently, a lineage event log produced by Impala does not include the
operation type of the statement.
{code}
{"queryText":"create table test_db_01.test_tbl_01 (id int)","queryId":"b44da06a10682ce9:286bd74300000000","hash":"7debad31b299d7cccdf78a67968eb39d","user":"[email protected]","timestamp":1771622004,"endTime":1771622005,"edges":[],"vertices":[]}
{code}
However, some lineage event processing tools, e.g., Atlas, require this piece
of information. To derive it, tools like
https://github.com/apache/atlas/blob/master/addons/impala-bridge/src/main/java/org/apache/atlas/impala/hook/ImpalaLineageHook.java
rely on the regular expressions in
https://github.com/apache/atlas/blob/14246fe/addons/impala-bridge/src/main/java/org/apache/atlas/impala/hook/ImpalaOperationParser.java#L30-L65
to classify the logged lineage event. But such regular expressions cannot
determine the operation type in all cases. One such example is when the SQL
statement contains a one-line comment.
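To illustrate the failure mode, here is a minimal sketch of a regex-based
classifier. The patterns, the classify() helper, and the operation names are
hypothetical and simplified for illustration; they are not copied from
ImpalaOperationParser, whose actual expressions may differ.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# These are NOT the expressions used by Atlas's ImpalaOperationParser.
PATTERNS = [
    ("CREATETABLE", re.compile(r"^\s*create\s+table\b", re.IGNORECASE)),
    ("INSERT", re.compile(r"^\s*insert\s+(into|overwrite)\b", re.IGNORECASE)),
]


def classify(query_text: str) -> str:
    """Return the first operation type whose pattern matches, else UNKNOWN."""
    for operation, pattern in PATTERNS:
        if pattern.search(query_text):
            return operation
    return "UNKNOWN"


# The same statement, without and with a leading one-line comment.
plain = "create table test_db_01.test_tbl_01 (id int)"
commented = "-- nightly setup\ncreate table test_db_01.test_tbl_01 (id int)"

print(classify(plain))      # the pattern matches at the start of the text
print(classify(commented))  # the comment precedes the keyword, so no match
```

In this sketch the second call falls through to UNKNOWN because the patterns
are anchored at the start of the query text. Stripping comments before matching
would help here, but it is fragile as well, since "--" can also appear inside a
string literal.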
One solution to the aforementioned issue is to make sure the query text of a
lineage event is a valid SQL statement (IMPALA-14741).
An alternative is for Impala to add a field to its lineage graph to
indicate the operation type. Once Impala is able to log the operation type in a
lineage event,