liunaijie commented on issue #7740:
URL: https://github.com/apache/seatunnel/issues/7740#issuecomment-2373387374

> > > > Hi, I updated the DynamicCompile documentation recently: https://github.com/apache/seatunnel/pull/7730/files. You can take a look.
> > >
> > > That is to say, my code is incorrect and cannot be written this way; it only supports simple transformations and does not support aggregation or deduplication operations.
> >
> > Hi, based on my understanding, DynamicCompile does not support filtering data. @jackyyyyyssss please correct me if I am wrong.
> > If you want to filter out the duplicated data, you can do it like this:
> >
> > 1. Use the `DynamicCompile` transform to add a flag column.
> > 2. Use the `Sql` transform to filter out the rows whose flag marks them as duplicated.
>
> I tried using the DynamicCompile transform as you suggested, intending to add a new field to each row as a marker for duplicated data, but this could not be achieved, because I cannot define a common collection to store the existing row data. Here is my code:
>
> ```java
> // Shared across calls: holds the composite key of every row seen so far.
> private Set<String> uniqueRows = new HashSet<>();
>
> public Column[] getInlineOutputColumns(CatalogTable inputCatalogTable) {
>     ArrayList<Column> columns = new ArrayList<Column>();
>     PhysicalColumn destColumn =
>             PhysicalColumn.of(
>                     "duplicate",
>                     BasicType.STRING_TYPE,
>                     10,
>                     true,
>                     "",
>                     "");
>     return new Column[] {destColumn};
> }
>
> public Object[] getInlineOutputFieldValues(SeaTunnelRowAccessor inputRow) {
>     Object field0 = inputRow.getField(0);
>     Object field1 = inputRow.getField(1);
>     Object field2 = inputRow.getField(2);
>
>     String compositeKey = field0 + "#" + field1 + "#" + field2;
>     // Set.add returns false when the key is already present.
>     boolean isNew = uniqueRows.add(compositeKey);
>     Object[] fieldValues = new Object[1];
>     if (!isNew) {
>         fieldValues[0] = "duplicate";
>     } else {
>         fieldValues[0] = "no";
>     }
>     return fieldValues;
> }
> ```

What do you mean by this?
> Because I cannot define a common collection to store the existing row data.

You have already defined `Set<String> uniqueRows` to store it. The code looks good to me. Is there anything wrong with it?
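
To check the flag logic on its own, here is a minimal standalone sketch. It is not SeaTunnel code: `DuplicateFlagger` and `flag` are made-up names that only mirror what `getInlineOutputFieldValues` does with `uniqueRows`. It assumes each parallel instance of a transform keeps its own copy of the field, which is why the second half uses two separate objects.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper, not part of SeaTunnel: mirrors the flag logic
// of the quoted getInlineOutputFieldValues so it can run standalone.
class DuplicateFlagger {
    // One set per instance, like the field inside one transform instance.
    private final Set<String> uniqueRows = new HashSet<>();

    // Returns "duplicate" if this field combination was seen before, else "no".
    String flag(Object... fields) {
        StringJoiner key = new StringJoiner("#");
        for (Object field : fields) {
            key.add(String.valueOf(field));
        }
        // Set.add returns false when the key is already present.
        return uniqueRows.add(key.toString()) ? "no" : "duplicate";
    }

    public static void main(String[] args) {
        // One instance: the repeated row is flagged on its second appearance.
        DuplicateFlagger single = new DuplicateFlagger();
        System.out.println(single.flag(1, "alice", 30));
        System.out.println(single.flag(1, "alice", 30));

        // Two instances, each receiving one copy of the same row
        // (assumed to resemble two parallel subtasks): neither sees a repeat.
        DuplicateFlagger taskA = new DuplicateFlagger();
        DuplicateFlagger taskB = new DuplicateFlagger();
        System.out.println(taskA.flag(1, "alice", 30));
        System.out.println(taskB.flag(1, "alice", 30));
    }
}
```

With a single instance the second identical row comes back as `duplicate`; with two instances both copies come back as `no`, because each object only knows about the rows it has seen itself.
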
When you add this transform, the output data should have one more column, named `duplicate`. You also need to set this transform's `parallelism` to 1, so that a single `Set` is used to check all rows for duplicates.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
