Re: [I] [Bug] [Transform V2] DynamicCompile compilation error in version 2.3.7 [seatunnel]

via GitHub Wed, 25 Sep 2024 01:16:45 -0700


liunaijie commented on issue #7740:
URL: https://github.com/apache/seatunnel/issues/7740#issuecomment-2373387374


   > > > > Hi, I updated the DynamicCompile documentation recently (https://github.com/apache/seatunnel/pull/7730/files); you can take a look.
   > > > 
   > > > 
   > > > That is to say, my code is incorrect and cannot be written this way; it only supports simple transformations and does not support aggregation or deduplication operations.
   > > 
   > > 
   > > Hi, based on my understanding, DynamicCompile does not support filtering data. @jackyyyyyssss please correct me if I am wrong.
   > > If you want to filter out the duplicated data, you can do it like this:
   > > 
   > > 1. Use the `DynamicCompile` transform to add a flag column.
   > > 2. Use the `Sql` transform to filter out the rows whose flag marks them as duplicated.
   > 
   > I tried using the DynamicCompile transform method you mentioned, intending to add a new field to each row of data as an identifier for data duplication, but this could not be achieved. Because I cannot define a common collection to store the existing row data. Here is my code:
   > 
   > ```java
   >     // Composite keys of the rows seen so far by this transform instance.
   >     private Set<String> uniqueRows = new HashSet<>();
   > 
   >     public Column[] getInlineOutputColumns(CatalogTable inputCatalogTable) {
   >         // One extra output column that carries the duplicate flag.
   >         PhysicalColumn destColumn =
   >                 PhysicalColumn.of(
   >                         "duplicate",
   >                         BasicType.STRING_TYPE,
   >                         10,
   >                         true,
   >                         "",
   >                         "");
   >         return new Column[] {destColumn};
   >     }
   > 
   >     public Object[] getInlineOutputFieldValues(SeaTunnelRowAccessor inputRow) {
   >         Object field0 = inputRow.getField(0);
   >         Object field1 = inputRow.getField(1);
   >         Object field2 = inputRow.getField(2);
   > 
   >         // Set.add returns false when the key is already present.
   >         String compositeKey = field0 + "#" + field1 + "#" + field2;
   >         boolean isNew = uniqueRows.add(compositeKey);
   >         Object[] fieldValues = new Object[1];
   >         if (!isNew) {
   >             fieldValues[0] = "duplicate";
   >         } else {
   >             fieldValues[0] = "no";
   >         }
   >         return fieldValues;
   >     }
   > ```
   
   
   What do you mean by this?
   > Because I cannot define a common collection to store the existing row data.

   You have already defined `Set<String> uniqueRows` to store it. The code looks good to me; is anything going wrong?
   
   When you add this transform, the output data should have one more column, named `duplicate`.  
   You also need to set this transform's `parallelism` to 1, so that a single `Set` sees every row when checking for duplicates.
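
To illustrate why the `parallelism` setting matters, here is a minimal, self-contained sketch in plain Java. It does not use any SeaTunnel classes: `Flagger` is a hypothetical stand-in for one transform instance holding its own `Set`, and the round-robin distribution of rows across instances is only an assumption for the demo, not a statement about how SeaTunnel actually routes rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateFlagSketch {

    /** Hypothetical stand-in for one transform instance; not a SeaTunnel class. */
    static class Flagger {
        // Each instance keeps its own set of composite keys.
        private final Set<String> uniqueRows = new HashSet<>();

        String flag(Object[] fields) {
            String compositeKey = fields[0] + "#" + fields[1] + "#" + fields[2];
            // Set.add returns false when the key was already present.
            return uniqueRows.add(compositeKey) ? "no" : "duplicate";
        }
    }

    /** Flags rows using `parallelism` independent instances (round-robin is an assumption). */
    static List<String> flagAll(List<Object[]> rows, int parallelism) {
        Flagger[] instances = new Flagger[parallelism];
        for (int i = 0; i < parallelism; i++) {
            instances[i] = new Flagger();
        }
        List<String> flags = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            flags.add(instances[i % parallelism].flag(rows.get(i)));
        }
        return flags;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
                new Object[] {1, "a", "x"},
                new Object[] {1, "a", "x"},
                new Object[] {2, "b", "y"},
                new Object[] {1, "a", "x"});

        // One instance sees every row, so both repeats are caught.
        System.out.println(flagAll(rows, 1)); // [no, duplicate, no, duplicate]
        // Two instances each hold a separate set, so the first repeat is missed.
        System.out.println(flagAll(rows, 2)); // [no, no, no, duplicate]
    }
}
```

With a single instance, every repeated key is flagged. With two instances, a repeat that lands on a different instance than the first occurrence is reported as `no`, which is the situation that setting `parallelism` to 1 avoids.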
   
   
   
   


