metalshanked opened a new issue, #47228:
URL: https://github.com/apache/arrow/issues/47228

   ### Describe the enhancement requested
   
   Problem: A bug report in the pandas repository, 
[#61940](https://github.com/pandas-dev/pandas/issues/61940), reveals a 
usability gap: common pathlib.Path operations, such as joining paths with the / 
operator, fail when applied to a pandas Series that uses the string[pyarrow] 
data type. Python's standard path-manipulation library does not interoperate 
with Arrow-backed strings, which forces users into slow, manual workarounds.
   
   Proposed Solution: The fix belongs in PyArrow itself, since it provides the 
backend for pandas' Arrow-backed dtypes. The work would consist of implementing 
compute kernels and Python bindings in PyArrow that support element-wise, 
path-like operations on StringArray objects, for example new compute functions 
(e.g., pyarrow.compute.path_join) designed for Arrow's columnar format. Once 
available in PyArrow, the functionality could be exposed through pandas, 
allowing path manipulation directly on DataFrame columns.
   
   Impact: This enhancement would close the gap between a high-performance 
data layer (Arrow) and a standard Python library (pathlib). Working with file 
paths stored in Arrow-backed, memory-efficient DataFrames would no longer 
require converting to Python objects, improving both performance and developer 
ergonomics.
   
   
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
