On Thu, Jan 28, 2021 at 3:40 PM Sean Owen <sro...@gmail.com> wrote:

> It isn't that regexp_extract_all (for example) is useless outside SQL,
> just, where do you draw the line? Supporting 10s of random SQL functions
> across 3 other languages has a cost, which has to be weighed against
> benefit, which we can never measure well except anecdotally: one or two
> people say "I want this" in a sea of hundreds of thousands of users.
+1 to this, but I will add that Jira and Stack Overflow activity can sometimes give good signals about API gaps that are frustrating users. If there is an SO question with 30K views about how to do something that should have been easier, then that's an important signal about the API.

> For this specific case, I think there is a fine argument
> that regexp_extract_all should be added simply for consistency
> with regexp_extract. I can also see the argument that regexp_extract was a
> step too far, but, what's public is now a public API.

I think in this case a few references to where and how people are having to work around the lack of a direct regexp_extract_all function could help guide the decision. But that itself means we are making these decisions on a case-by-case basis.

From a user perspective, it's definitely conceptually simpler to have SQL functions be consistent and available across all APIs.

Perhaps if we had a way to lower the maintenance burden of keeping functions in sync across SQL/Scala/Python/R, it would be easier for everyone to agree to just include all the functions across the board all the time.

Would, for example, some sort of automatic testing mechanism for SQL functions help here? I mean something that uses a common function testing specification to automatically test the SQL, Scala, Python, and R functions, without requiring maintainers to write tests for each language's version of each function. Would that address the maintenance burden?
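
To make the testing idea a bit more concrete, here is a rough sketch of what I have in mind. It is not an existing Spark facility: the spec format, the names `SPEC`, `RUNNERS`, and `run_spec`, and the stand-in implementation built on Python's `re` module are all hypothetical. In a real harness each runner would dispatch to the actual SQL, Scala, Python, or R binding; here they are plain Python callables so the sketch runs on its own.

```python
import re

# Hypothetical language-neutral spec: each case names a function, its
# arguments, and the expected result. In practice this might live in a
# shared JSON or YAML file; that choice is an assumption.
SPEC = [
    {
        "function": "regexp_extract_all",
        "args": ["100-200, 300-400", r"(\d+)-(\d+)", 1],
        "expected": ["100", "300"],
    },
    {
        "function": "regexp_extract_all",
        "args": ["no digits here", r"(\d+)", 1],
        "expected": [],
    },
]


def regexp_extract_all_standin(text, pattern, group):
    """Stand-in for a real binding: collect one group from every match."""
    return [m.group(group) for m in re.finditer(pattern, text)]


# One entry per API surface. The empty "r" entry illustrates a binding
# that has not been added yet. All entries are placeholders.
RUNNERS = {
    "sql": {"regexp_extract_all": regexp_extract_all_standin},
    "python": {"regexp_extract_all": regexp_extract_all_standin},
    "r": {},
}


def run_spec(spec, runners):
    """Run every spec case against every API and return failure messages."""
    failures = []
    for api, functions in runners.items():
        for case in spec:
            impl = functions.get(case["function"])
            if impl is None:
                failures.append(f"{api}: {case['function']} is missing")
                continue
            actual = impl(*case["args"])
            if actual != case["expected"]:
                failures.append(
                    f"{api}: {case['function']}{tuple(case['args'])} "
                    f"returned {actual!r}, expected {case['expected']!r}"
                )
    return failures


if __name__ == "__main__":
    for line in run_spec(SPEC, RUNNERS):
        print(line)
```

The point of the design is that a maintainer adds a function's cases to the spec once, and a missing or inconsistent binding in any language shows up as a failure without anyone writing a per-language test.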