AlenkaF commented on issue #39149:
URL: https://github.com/apache/arrow/issues/39149#issuecomment-1849519311

   Thank you for opening an issue @rohanjain101 !
   
   I have to agree that the docs are not clear about the `max_replacements` keyword. After a bit of searching in the codebase, it is clear that only positive values are supported and that a negative value (`-1`) means unlimited replacements, as the [comment in the pandas issue](https://github.com/pandas-dev/pandas/issues/56404#issuecomment-1847679746) suggests.
   
   In Python, `max_replacements` can be supplied as a keyword argument and is passed on to the `ReplaceSubstringOptions` class:
   
https://github.com/apache/arrow/blob/92e56ba8906f40996cc81bc09fca10c4d53b32fa/python/pyarrow/_compute.pyx#L1177
   which, unfortunately, has no additional documentation:
   
https://arrow.apache.org/docs/python/generated/pyarrow.compute.ReplaceSubstringOptions.html#pyarrow.compute.ReplaceSubstringOptions
   
   There is a bit more information in the C++ docs (see the notes of the [String transforms](https://arrow.apache.org/docs/dev/cpp/compute.html#string-transforms) table):
   > If 
[ReplaceSubstringOptions::max_replacements](https://arrow.apache.org/docs/dev/cpp/api/compute.html#_CPPv4N5arrow7compute23ReplaceSubstringOptions16max_replacementsE)
 != -1, it determines the maximum number of replacements made, **counting from 
the left**.
   
   which suggests that the function only accepts positive `max_replacements` values, since it only counts from the left.
   
   And if we look at the C++ code of `ReplaceString`, we see that `max_replacements` is decremented until it reaches `0`:
   
https://github.com/apache/arrow/blob/92e56ba8906f40996cc81bc09fca10c4d53b32fa/cpp/src/arrow/compute/kernels/scalar_string_ascii.cc#L2008-L2026
   leading to the hang reported in this issue.
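To make the "decremented until it reaches `0`" exit condition concrete, here is a simplified pure-Python model of such a loop. It assumes the loop runs while the remaining count is `!= 0` and decrements it after each replacement; `model_replace_loop` is a hypothetical illustration, not a transcription of the C++ kernel, and it makes no claim about the exact code path behind the reported hang. It only shows that a negative starting value never satisfies the exit condition, so the loop can only end when the pattern runs out.

```python
def model_replace_loop(text, pattern, replacement, max_replacements):
    """Simplified model (assumed loop shape, not the actual Arrow kernel).

    Scans `text` left to right, replacing `pattern` while the remaining
    count is != 0. Returns the result and the final counter value.
    """
    if not pattern:
        raise ValueError("this model only handles non-empty patterns")
    pieces = []
    i = 0
    remaining = max_replacements
    # Exit condition modeled on "decremented until it reaches 0".
    while i < len(text) and remaining != 0:
        pos = text.find(pattern, i)
        if pos == -1:  # no more matches: the only other way out of the loop
            break
        pieces.append(text[i:pos])
        pieces.append(replacement)
        i = pos + len(pattern)
        remaining -= 1  # a negative start moves further away from 0
    pieces.append(text[i:])
    return "".join(pieces), remaining


print(model_replace_loop("banana", "a", "x", 2))   # ('bxnxna', 0)
print(model_replace_loop("banana", "a", "x", -1))  # ('bxnxnx', -4)
print(model_replace_loop("banana", "a", "x", -5))  # ('bxnxnx', -8)
```

With a limit of `2` the counter hits `0` and the loop exits early; with `-1` or `-5` the counter only moves away from `0`, so both behave like "unlimited" in this model.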
   
   _I think a check for negative values should be added on the C++ side, together with additional information in the docs._
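A sketch of the kind of check proposed here, written in Python for brevity even though the real check would live on the C++ side. `validate_max_replacements` is a hypothetical helper, not an existing Arrow function; it assumes `-1` stays the sentinel for "unlimited" and rejects anything below it.

```python
def validate_max_replacements(max_replacements):
    """Hypothetical check for the `max_replacements` option.

    Returns the value to pass on, mapping None to -1 (unlimited),
    and raises ValueError for negative values other than -1.
    """
    if max_replacements is None:
        return -1  # assumed: None means "no limit", like -1
    if not isinstance(max_replacements, int):
        raise TypeError("max_replacements must be an int or None")
    if max_replacements < -1:
        raise ValueError(
            f"max_replacements must be -1 (unlimited) or >= 0, got {max_replacements}"
        )
    return max_replacements


print(validate_max_replacements(None))  # -1
print(validate_max_replacements(3))     # 3
try:
    validate_max_replacements(-2)
except ValueError as exc:
    print(exc)  # max_replacements must be -1 (unlimited) or >= 0, got -2
```

Raising an error, rather than silently treating every negative value as `-1`, surfaces invalid input immediately; whether `0` (no replacements) should stay valid is a separate decision.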
   
   This would be a good first issue if anybody is interested! 


-- 
This is an automated message from the Apache Git Service.