[DISCUSS] Introduce a new workflow based on flinksql

leo65535 Wed, 20 Oct 2021 21:49:22 -0700

Hi dev,

Based on the previous discussion[1] and our attempts to use InLong in production, we think the Sort module cannot satisfy our needs:

1. More sources and sinks are required in production, such as Kafka, HBase, and Greenplum.
2. Workflow isolation: each workflow runs as an independent YARN/K8s application.
3. Lightweight ETL data processing, such as filtering out nulls.
4. Support for dimension table lookups in several cases.
5. Support for customized UDFs.

Points 1 and 2 are especially important for us.
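
To make points 3 to 5 more concrete, here is a minimal Flink SQL sketch of what such a workflow could look like. It assumes Flink 1.11+ option keys, the built-in `datagen` connector as a stand-in source (so the sketch needs no external source), a dimension table read through the JDBC connector, and a UDF implemented as a Java class. The table names `orders`, `dim_customer`, and `print_sink`, the function `mask_name`, its class `com.example.udf.MaskName`, and all connection values are hypothetical placeholders, not part of this proposal.

```sql
-- Hypothetical stand-in source; proc_time is the processing-time attribute
-- required by the lookup join below.
CREATE TABLE orders (
  customerId INT,
  oStatus INT,
  nStatus INT,
  proc_time AS PROCTIME()
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5',
  'fields.customerId.min' = '1',
  'fields.customerId.max' = '100'
);

-- Point 4: hypothetical dimension table served through the JDBC connector.
CREATE TABLE dim_customer (
  customerId INT,
  customerName STRING,
  PRIMARY KEY (customerId) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/demo',  -- placeholder
  'table-name' = 'customer'                    -- placeholder
);

-- Point 5: register a user-defined scalar function (hypothetical Java class).
CREATE TEMPORARY FUNCTION mask_name AS 'com.example.udf.MaskName';

CREATE TABLE print_sink (
  customerId INT,
  customerName STRING,
  nStatus INT
) WITH (
  'connector' = 'print'
);

-- Point 3: drop rows whose nStatus is null.
-- Point 4: enrich each remaining row from the dimension table as of its processing time.
INSERT INTO print_sink
SELECT o.customerId, mask_name(c.customerName), o.nStatus
FROM orders AS o
JOIN dim_customer FOR SYSTEM_TIME AS OF o.proc_time AS c
  ON o.customerId = c.customerId
WHERE o.nStatus IS NOT NULL;
```

Because this is an inner lookup join, rows without a matching `dim_customer` entry are dropped; a `LEFT JOIN` would keep them with a null `customerName`.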

To implement the new workflow, we need to use the Flink Table API. It helps us handle table schemas, field data types, and higher-level SQL semantics, and it also supports integrating the catalogs of multiple data sources and sinks.
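
As an illustration of the catalog integration, here is a sketch in Flink SQL; with the Table API, each statement could equally be submitted through `TableEnvironment#executeSql`. It assumes a Hive Metastore reachable through the configuration in `/etc/hive/conf`, the Hive connector on the classpath, and a `kafka_source` table like the one in the demo registered in Flink's built-in `default_catalog`. The catalog name `hive_catalog`, the database `ods`, and the table `customer_status` are hypothetical.

```sql
-- Register an external catalog; its databases and tables become visible to Flink SQL.
CREATE CATALOG hive_catalog WITH (
  'type' = 'hive',
  'hive-conf-dir' = '/etc/hive/conf'
);

-- Tables from different catalogs can be combined in one statement
-- by using fully qualified names: catalog.database.table.
INSERT INTO hive_catalog.ods.customer_status
SELECT customerId, oStatus, nStatus
FROM default_catalog.default_database.kafka_source;
```

Writing to Hive from a streaming job typically needs additional sink and partition-commit options, which are omitted from this sketch.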

Note: the new workflow will not be compatible with the original one.

Here is a demo of the Flink SQL workflow:

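```
-- Source: read JSON records from Kafka (legacy 'connector.type' option keys)
CREATE TABLE kafka_source (
  customerId int,
  oStatus int,
  nStatus int
) with (
  'connector.type' = 'kafka',
  ...
  'connector.startup-mode' = 'earliest-offset',
  'format.type' = 'json'
);

-- Sink: despite its name, fs_source is the table written by the INSERT below
CREATE TABLE fs_source (
  customerId int,
  oStatus int,
  nStatus int
) with (
  'connector.type' = 'filesystem',
  ...
  'connector.path' = 'hdfs:///data/2021/06/01/xx.txt',
  'format.type' = 'json'
);

-- Lightweight ETL: forward only rows whose oStatus is non-zero
INSERT INTO fs_source
SELECT * FROM kafka_source
WHERE oStatus <> 0;
```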
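
The demo uses the legacy `connector.type` / `format.type` option keys. Since Flink 1.11 a new set of connector option keys is available and the legacy ones are deprecated. A sketch of the same workflow with the new keys follows. The topic, broker address, group id, and output directory are hypothetical placeholders, and `fs_sink` is simply `fs_source` renamed to reflect its role as the sink.

```sql
CREATE TABLE kafka_source (
  customerId INT,
  oStatus INT,
  nStatus INT
) WITH (
  'connector' = 'kafka',
  'topic' = 'customer_status',                        -- placeholder
  'properties.bootstrap.servers' = 'localhost:9092',  -- placeholder
  'properties.group.id' = 'inlong-demo',              -- placeholder
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

CREATE TABLE fs_sink (
  customerId INT,
  oStatus INT,
  nStatus INT
) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs:///data/2021/06/01/',                -- placeholder output directory
  'format' = 'json'
);

-- Same filter as the demo: keep only rows whose oStatus is non-zero.
INSERT INTO fs_sink
SELECT * FROM kafka_source
WHERE oStatus <> 0;
```

In streaming mode the filesystem sink only finalizes its output files on checkpoints, so checkpointing has to be enabled for the files to become visible.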

Looking forward to your ideas, thanks.

Best,
Leo65535

[1] https://lists.apache.org/thread.html/rf1a87cfa946d82e167392ede97583ec0a2bcdaeec97995dea6d4a86c%40%3Cdev.inlong.apache.org%3E