rdblue commented on code in PR #4902:
URL: https://github.com/apache/iceberg/pull/4902#discussion_r907977550
##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java:
##########
@@ -140,20 +139,34 @@ private RewriteDataFiles checkAndApplyOptions(InternalRow args, RewriteDataFiles
return action.options(options);
}
- private RewriteDataFiles checkAndApplyStrategy(RewriteDataFiles action, String strategy, SortOrder sortOrder) {
+ private RewriteDataFiles checkAndApplyStrategy(
+ RewriteDataFiles action,
+ String strategy,
+ String sortOrderString,
+ Table table) {
+ Pattern zOrderPattern = Pattern.compile("zorder\\s*\\(.*\\)", Pattern.CASE_INSENSITIVE);
+ boolean isZOrder = sortOrderString != null && zOrderPattern.matcher(sortOrderString).matches();
// caller of this function ensures that between strategy and sortOrder, at least one of them is not null.
if (strategy == null || strategy.equalsIgnoreCase("sort")) {
- return action.sort(sortOrder);
+ if (isZOrder) {
+ String columns = sortOrderString.substring(
Review Comment:
Why not add rules to the SQL parser so that we can parse these strings? This
follows the same pattern as transforms in the SQL parser, so we can easily
produce a real parser rule:
```antlr
fieldAndTransformList
: '(' fields+=transform (',' fields+=transform)* ')'
;
```
Then you could create a call for that in the parser and convert. I think
that would be better.
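To illustrate the kind of strictness a grammar rule gives, here is a minimal
hand-written sketch of a parser for a rule shaped like the one above. It is not
the Iceberg SQL parser and does not use ANTLR: the class `FieldListParser`, the
method `parseZOrder`, and the restriction to plain column names (no transforms)
are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated parser; names are illustrative only.
class FieldListParser {
  private final String input;
  private int pos = 0;

  private FieldListParser(String input) {
    this.input = input;
  }

  /** Parses "zorder(a, b, ...)" and rejects anything left over afterwards. */
  public static List<String> parseZOrder(String text) {
    FieldListParser parser = new FieldListParser(text);
    parser.skipWhitespace();
    parser.expectKeyword("zorder");
    List<String> fields = parser.fieldList();
    parser.skipWhitespace();
    if (parser.pos < text.length()) {
      throw parser.error("unexpected '" + text.charAt(parser.pos) + "'");
    }
    return fields;
  }

  // fieldList : '(' field (',' field)* ')'   (fields simplified to identifiers)
  private List<String> fieldList() {
    List<String> fields = new ArrayList<>();
    expect('(');
    fields.add(identifier());
    skipWhitespace();
    while (pos < input.length() && input.charAt(pos) == ',') {
      pos++; // consume ','
      fields.add(identifier());
      skipWhitespace();
    }
    expect(')');
    return fields;
  }

  private String identifier() {
    skipWhitespace();
    int start = pos;
    while (pos < input.length()
        && (Character.isLetterOrDigit(input.charAt(pos)) || input.charAt(pos) == '_')) {
      pos++;
    }
    if (start == pos) {
      throw error("expected a column name");
    }
    return input.substring(start, pos);
  }

  private void expectKeyword(String keyword) {
    // Case-insensitive match, mirroring Pattern.CASE_INSENSITIVE in the diff.
    if (!input.regionMatches(true, pos, keyword, 0, keyword.length())) {
      throw error("expected '" + keyword + "'");
    }
    pos += keyword.length();
  }

  private void expect(char c) {
    skipWhitespace();
    if (pos >= input.length() || input.charAt(pos) != c) {
      throw error("expected '" + c + "'");
    }
    pos++;
  }

  private void skipWhitespace() {
    while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
      pos++;
    }
  }

  private IllegalArgumentException error(String message) {
    return new IllegalArgumentException(message + " at position " + pos + " in: " + input);
  }

  public static void main(String[] args) {
    System.out.println(parseZOrder("zorder(col1, col2)"));
    try {
      parseZOrder("zorder(col1, col2), col3)");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because this sketch only accepts a single `zorder(...)` call, it rejects the
malformed input at the first character after the closing parenthesis. A full
sort order grammar that allows further fields after the call would report the
error later, but in either case the failure points at the offending character
rather than at a column lookup.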
The problem with custom parsing is that it can produce strange results
that don't correspond to what a real parser would do. For example,
`zorder(col1, col2), col3)` matches the pattern and makes it through to the
column names. It then produces an error about not being able to find `col2)`
instead of complaining that there's an unexpected char after col3.
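A small sketch that reproduces the behavior with the same regex as in the diff.
The extraction step is an assumption, since the `substring(` call is cut off in
the hunk above: here it takes the text between the first `(` and the last `)`
and splits on commas. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative only: mimics the regex check from the diff, not the PR's actual code.
class ZOrderRegexDemo {
  private static final Pattern ZORDER =
      Pattern.compile("zorder\\s*\\(.*\\)", Pattern.CASE_INSENSITIVE);

  public static boolean looksLikeZOrder(String sortOrderString) {
    return sortOrderString != null && ZORDER.matcher(sortOrderString).matches();
  }

  // Assumed extraction: everything between the first '(' and the last ')'.
  public static String[] extractColumns(String sortOrderString) {
    String inner = sortOrderString.substring(
        sortOrderString.indexOf('(') + 1, sortOrderString.lastIndexOf(')'));
    String[] columns = inner.split(",");
    for (int i = 0; i < columns.length; i++) {
      columns[i] = columns[i].trim();
    }
    return columns;
  }

  public static void main(String[] args) {
    String malformed = "zorder(col1, col2), col3)";
    // The greedy ".*" swallows the stray ")" so the whole string matches.
    System.out.println(looksLikeZOrder(malformed));
    // The second "column" carries the stray ")" into the column lookup.
    System.out.println(Arrays.toString(extractColumns(malformed)));
  }
}
```

Under these assumptions the malformed string passes the check and yields the
names `col1`, `col2)`, and `col3`, so the failure surfaces later as a missing
column rather than as a syntax error.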
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]