aokolnychyi commented on pull request #1466:
URL: https://github.com/apache/iceberg/pull/1466#issuecomment-693738355
Having a UDF that accepts columns from two relations does not eliminate the
cross join, since Spark cannot extract equi-join keys from such a condition.
I guess we have two options:
- keep the join by file name and replace the `contains` condition with another
UDF that ignores the authority
- replace the existing UDF that produces file names with a UDF that produces
a scheme and a relative path, and then use DataFrame operations.
That way, we will have only one UDF.

With the second option, the join condition could look like this:
```
// Relative paths must be equal.
Column pathCond =
    actualFileDF.col("relative_path").equalTo(validDataFileDF.col("relative_path"));
// Schemes must be equal, unless the valid data file location has no scheme.
Column schemeEquality =
    actualFileDF.col("scheme").equalTo(validDataFileDF.col("scheme"));
Column schemeCond =
    validDataFileDF.col("scheme").isNull().or(schemeEquality);
Column joinCond = pathCond.and(schemeCond);
```
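To make the second option more concrete, here is a rough, Spark-free sketch of what the single UDF could compute per location and how the join condition above would then decide a match. The names `LocationMatchSketch`, `Location`, `parse`, and `matches` are hypothetical, and `java.net.URI` is only a stand-in for whatever path parsing the real UDF would use; it rejects locations with unescaped characters such as spaces.

```java
import java.net.URI;

class LocationMatchSketch {

  // Hypothetical holder for the two columns the single UDF would produce.
  static final class Location {
    final String scheme;        // null when the location has no scheme
    final String relativePath;  // path with the scheme and authority dropped

    Location(String scheme, String relativePath) {
      this.scheme = scheme;
      this.relativePath = relativePath;
    }
  }

  // Splits a location into a scheme and a path, ignoring the authority.
  // Assumption: java.net.URI stands in for the real path-parsing logic.
  static Location parse(String location) {
    URI uri = URI.create(location);
    return new Location(uri.getScheme(), uri.getPath());
  }

  // Mirrors joinCond: paths must be equal, and schemes must be equal
  // unless the valid data file location has no scheme at all.
  static boolean matches(Location actual, Location valid) {
    boolean pathCond = actual.relativePath.equals(valid.relativePath);
    boolean schemeCond = valid.scheme == null || valid.scheme.equals(actual.scheme);
    return pathCond && schemeCond;
  }

  public static void main(String[] args) {
    Location actual = parse("hdfs://nn1:8020/warehouse/t/data/f.parquet");
    Location sameFileOtherAuthority = parse("hdfs://nn2/warehouse/t/data/f.parquet");
    Location noScheme = parse("/warehouse/t/data/f.parquet");
    Location otherScheme = parse("s3://bucket/warehouse/t/data/f.parquet");

    System.out.println(matches(actual, sameFileOtherAuthority));
    System.out.println(matches(actual, noScheme));
    System.out.println(matches(actual, otherScheme));
  }
}
```

Because both parts of the condition are plain equality checks on columns from each side, Spark can treat `relative_path` as an equi-join key, which a UDF over both relations would not allow.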