FelixYBW commented on code in PR #5099: URL: https://github.com/apache/incubator-gluten/pull/5099#discussion_r1568270599
##########
docs/velox-backend-limitations.md:
##########
@@ -25,9 +25,15 @@ Velox BloomFilter's serialization format is different from Spark's. BloomFilter
 #### Case Sensitive mode
 Gluten only supports spark default case-insensitive mode. If case-sensitive mode is enabled, user may get incorrect result.
 
-#### Lookaround pattern for regexp functions
-In velox, lookaround (lookahead/lookbehind) pattern is not supported in RE2-based implementations for Spark functions,
-such as `rlike`, `regexp_extract`, etc.
+#### Regexp functions
+In Velox, regexp functions (`rlike`, `regexp_extract`, etc.) are implemented based on RE2, while in Spark they are based on `java.util.regex`.
+* Lookaround (lookahead/lookbehind) pattern is not supported in RE2.
+* When matching white space with pattern "\\s", RE2 doesn't treat "\v" (or "\x0b") as white space, but `java.util.regex` does.
+
+There are a few unknown incompatible cases. If user cannot tolerate the incompatibility risk, please enable the below configuration property.
+```
+spark.gluten.sql.fallbackRegexpExpressions
+```

Review Comment:
@codyschierbeck Is this the list of unsupported patterns in RE2?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
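For reference, the two incompatibilities named in the diff can be checked on the Spark side with plain `java.util.regex`. The sketch below is illustrative only: the class and method names are hypothetical, and it does not call RE2 or Velox. The RE2 side is only emulated with the character class `[\t\n\f\r ]`, which is how the RE2 syntax documentation defines `\s`.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Gluten, Velox, or Spark.
public class RegexCompatDemo {

    // java.util.regex defines \s as [ \t\n\x0B\f\r], so a vertical tab matches.
    static boolean javaWhitespaceMatchesVerticalTab() {
        return Pattern.compile("\\s").matcher("\u000B").find();
    }

    // Emulation of RE2's \s, assumed here to be [\t\n\f\r ] (no vertical tab).
    static boolean re2StyleWhitespaceMatchesVerticalTab() {
        return Pattern.compile("[\\t\\n\\f\\r ]").matcher("\u000B").find();
    }

    // Lookahead: find "foo" only when it is immediately followed by "bar".
    // java.util.regex accepts this pattern; per the diff, RE2 does not support it.
    static boolean lookaheadFinds(String input) {
        return Pattern.compile("foo(?=bar)").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("java \\s matches \\v:      " + javaWhitespaceMatchesVerticalTab());
        System.out.println("RE2-style \\s matches \\v: " + re2StyleWhitespaceMatchesVerticalTab());
        System.out.println("lookahead on foobar:     " + lookaheadFinds("foobar"));
        System.out.println("lookahead on foobaz:     " + lookaheadFinds("foobaz"));
    }
}
```

Queries that depend on either behavior are the ones the fallback property in the diff is aimed at. Assuming it is a boolean option, it would presumably be set with something like `--conf spark.gluten.sql.fallbackRegexpExpressions=true`.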
