+1. It's a great idea and I totally agree with it, but we should use a separate module or plugin to implement this new workflow.

On 2021/10/21 04:48:52, leo65535 <[email protected]> wrote:
> hi dev,
>
> Based on the previous discussion[1] and our attempts to use InLong in
> production, we think the sort module cannot satisfy our needs:
>
> 1. More sources and sinks are required in production, like Kafka, HBase,
>    Greenplum.
> 2. Workflows are isolated; each workflow is an independent YARN/K8s
>    application.
> 3. Lightweight ETL data processing, like filtering nulls.
> 4. Support dimension table lookup in several cases.
> 5. Support customized UDFs.
>
> Points 1 and 2 are especially important for us.
>
> To implement the new workflow, we need to use the Flink Table API. It will
> help us handle the table schema, field data types, and higher-level SQL
> semantics; it also supports the integration of multiple data source/sink
> catalogs.
>
> Note: the new workflow is not compatible with the original one.
>
> Here is the Flink SQL workflow demo:
>
> ```
> CREATE TABLE kafka_source (
>   customerId int,
>   oStatus int,
>   nStatus int
> ) with (
>   'connector.type' = 'kafka',
>   ...
>   'connector.startup-mode' = 'earliest-offset',
>   'format.type' = 'json'
> );
>
> CREATE TABLE fs_source (
>   customerId int,
>   oStatus int,
>   nStatus int
> ) with (
>   'connector.type' = 'filesystem',
>   ...
>   'path' = 'hdfs:///data/2021/06/01/xx.txt',
>   'format.type' = 'json'
> );
>
> INSERT INTO fs_source
> SELECT * FROM kafka_source
> WHERE oStatus != 0;
> ```
>
> Looking forward to your ideas, thanks.
>
> Best,
> Leo65535
>
> [1] https://lists.apache.org/thread.html/rf1a87cfa946d82e167392ede97583ec0a2bcdaeec97995dea6d4a86c%40%3Cdev.inlong.apache.org%3E
