+1
It's a great idea, and I fully agree with it. However, we should implement this
new workflow as a separate module or plugin.


On 2021/10/21 04:48:52, leo65535  <[email protected]> wrote: 
> hi dev,
> 
> 
> 
> 
> Based on the previous discussion [1] and our attempts to use InLong in production, we think
> the sort module cannot satisfy our needs:
> 
> 1. More sources and sinks are required in production, such as Kafka, HBase, and
> Greenplum.
> 
> 2. Workflows need to be isolated, with each workflow running as an independent
> YARN/K8s application.
> 
> 3. Lightweight ETL data processing, such as filtering out nulls.
> 
> 4. Support for dimension table lookup in several cases.
> 
> 5. Support for customized UDFs.
> 
> Points 1 and 2 are especially important for us.
> 
> 
> 
> 
> To implement the new workflow, we need to use the Flink Table API. It will help
> us handle the table schema, field data types, and higher-level SQL semantics, and
> it also supports integrating catalogs for multiple data sources and sinks.
> 
> 
> 
> 
> Note: the new workflow is not compatible with the original one.
> 
> 
> 
> 
> Here is a Flink SQL workflow demo:
> 
> ```
> CREATE TABLE kafka_source (
>   customerId int,
>   oStatus int,
>   nStatus int
> ) WITH (
>   'connector.type' = 'kafka',
>   ...
>   'connector.startup-mode' = 'earliest-offset',
>   'format.type' = 'json'
> );
> 
> CREATE TABLE fs_source (
>   customerId int,
>   oStatus int,
>   nStatus int
> ) WITH (
>   'connector.type' = 'filesystem',
>   ...
>   'path' = 'hdfs:///data/2021/06/01/xx.txt',
>   'format.type' = 'json'
> );
> 
> INSERT INTO fs_source
> SELECT * FROM kafka_source
> WHERE oStatus != 0;
> ```
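> 
> As a rough sketch of how points 3 to 5 could look in the same style (untested,
> for illustration only): the query below filters out null statuses, enriches rows
> through a lookup join against a dimension table, and calls a UDF. The names
> customer_dim, enriched_sink, mask_id, and com.example.udf.MaskId are hypothetical,
> and kafka_source is assumed to declare an extra processing-time column,
> proc_time AS PROCTIME().
> 
> ```
> -- Point 5: register a customized UDF (hypothetical class name).
> CREATE TEMPORARY FUNCTION mask_id AS 'com.example.udf.MaskId';
> 
> -- Assumption: customer_dim (customerId int, customerName string) is declared
> -- on a connector that supports lookups, and enriched_sink has matching columns.
> INSERT INTO enriched_sink
> SELECT
>   mask_id(s.customerId) AS customerId,
>   s.oStatus,
>   s.nStatus,
>   d.customerName
> FROM kafka_source AS s
> -- Point 4: dimension table lookup at the row's processing time.
> JOIN customer_dim FOR SYSTEM_TIME AS OF s.proc_time AS d
>   ON s.customerId = d.customerId
> -- Point 3: lightweight ETL, dropping rows with null statuses.
> WHERE s.oStatus IS NOT NULL
>   AND s.nStatus IS NOT NULL;
> ```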
> 
> 
> 
> 
> Looking forward to your ideas, thanks.
> 
> 
> 
> 
> Best,
> 
> Leo65535
> 
> 
> 
> 
> [1] 
> https://lists.apache.org/thread.html/rf1a87cfa946d82e167392ede97583ec0a2bcdaeec97995dea6d4a86c%40%3Cdev.inlong.apache.org%3E
