AlenkaF commented on issue #39149: URL: https://github.com/apache/arrow/issues/39149#issuecomment-1849519311
Thank you for opening an issue, @rohanjain101!

I have to agree that the docs are not clear about the `max_replacements` keyword. After a bit of searching in the codebase, it is clear that only positive values are supported, plus `-1`, which means unlimited replacements, as the [comment in the pandas issue](https://github.com/pandas-dev/pandas/issues/56404#issuecomment-1847679746) suggests.

In Python, `max_replacements` can be supplied as a keyword and is added to the `ReplaceSubstringOptions` class https://github.com/apache/arrow/blob/92e56ba8906f40996cc81bc09fca10c4d53b32fa/python/pyarrow/_compute.pyx#L1177, which unfortunately has no additional information: https://arrow.apache.org/docs/python/generated/pyarrow.compute.ReplaceSubstringOptions.html#pyarrow.compute.ReplaceSubstringOptions

There is a bit of additional information in the C++ docs (see the notes of the [String transforms](https://arrow.apache.org/docs/dev/cpp/compute.html#string-transforms) table):

> If [ReplaceSubstringOptions::max_replacements](https://arrow.apache.org/docs/dev/cpp/api/compute.html#_CPPv4N5arrow7compute23ReplaceSubstringOptions16max_replacementsE) != -1, it determines the maximum number of replacements made, **counting from the left**.

This suggests that the method only accepts positive `max_replacements`, since it only counts from the left.

If we look at the code of `ReplaceString` in C++, we see that `max_replacements` is decremented until it is `== 0` https://github.com/apache/arrow/blob/92e56ba8906f40996cc81bc09fca10c4d53b32fa/cpp/src/arrow/compute/kernels/scalar_string_ascii.cc#L2008-L2026, which leads to the hang reported in this issue.

_I think the check for negative values should be added on the C++ side together with additional info in the docs._

This would be a good first issue if anybody is interested!
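To make the intended semantics concrete, here is a minimal pure-Python sketch of a countdown-style replacement loop with the kind of up-front check proposed above. It is only an illustration under stated assumptions: `replace_substring_model` is a hypothetical helper, not Arrow's implementation or API, and the choice to raise `ValueError` for values below `-1` (and for an empty pattern) is an assumption about what such a check could look like.

```python
def replace_substring_model(text, pattern, replacement, max_replacements=-1):
    """Hypothetical model: replace `pattern` with `replacement`, counting from the left.

    max_replacements == -1 means unlimited; any other negative value is
    rejected up front (assumed behaviour, mirroring the proposed check).
    """
    if max_replacements < -1:
        raise ValueError(
            f"max_replacements must be -1 (unlimited) or non-negative, "
            f"got {max_replacements}"
        )
    if not pattern:
        # Assumption for this sketch: an empty pattern is not meaningful here.
        raise ValueError("pattern must be non-empty")

    pieces = []
    i = 0
    remaining = max_replacements
    # The counter is decremented after each replacement and the loop stops
    # when it reaches 0; with -1 it never reaches 0, so every match is replaced.
    while i < len(text) and remaining != 0:
        pos = text.find(pattern, i)
        if pos == -1:
            break
        pieces.append(text[i:pos])   # text before the match
        pieces.append(replacement)   # the replacement itself
        i = pos + len(pattern)       # skip past the matched pattern
        remaining -= 1
    pieces.append(text[i:])          # trailing part left untouched
    return "".join(pieces)


print(replace_substring_model("aaaa", "a", "b", 2))   # limited, from the left
print(replace_substring_model("aaaa", "a", "b", -1))  # unlimited
```

Rejecting the value before entering the loop keeps the loop itself unchanged and makes the accepted range (`-1` or a non-negative count) explicit to the caller.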
