numbworks commented on issue #39846: URL: https://github.com/apache/arrow/issues/39846#issuecomment-2294864651
> you have a different problem, you don't have cmake installed. Building Pyarrow from source needs a matching version of Arrow C++ available as well.

@assignUser Hey Jacob, thank you for your answer, but it seems from the thread that your solution doesn't work. Do you have a Dockerfile example that demonstrates that your proposal works?

I also read the following answer from you in another thread:

> Please see https://github.com/apache/arrow/issues/18036, we don't publish musl wheels at the moment.

Are there plans to change this? In case you are not aware of it, **PyArrow will be a mandatory dependency for Pandas starting with Pandas v3.0.0**. Please read more here: [https://pandas.pydata.org/pdeps/0010-required-pyarrow-dependency.html](https://pandas.pydata.org/pdeps/0010-required-pyarrow-dependency.html).

One of the official Python images on Docker Hub is based on Alpine Linux, which is also the most resource-efficient of them. The lack of PyArrow wheels for Alpine means that, starting with Pandas 3.0.0 (perhaps six months from now), thousands of Python developers and data scientists won't be able to do their work in a containerized environment.

The only alternative at the moment is to use the Debian-based image on Python's Docker Hub, which is 15x more resource hungry than Alpine:

```
# Debian-based (glibc) image, used here instead of Alpine
FROM python:3.12.5-bullseye
RUN pip install --upgrade pip \
    && pip install numpy==1.26.3 \
    && pip install pyarrow==15.0.0 \
    && pip install openpyxl==3.1.0 \
    && pip install pandas==2.2.0
```

I hope you can discuss this matter within the team and assign the right priority to it.

Thank you.
