[ https://issues.apache.org/jira/browse/ARROW-11390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272271#comment-17272271 ]
Lance Dacey commented on ARROW-11390:
-------------------------------------
Actually, turbodbc would have been installed before pyarrow: pyarrow 3.0 was
not on conda-forge yet, so I moved pyarrow down to the pip section. Do I need
to reverse this installation order?
{code:bash}
&& /opt/conda/bin/conda install -c conda-forge -yq \
    pandas \
    numpy \
    pyodbc \
    pybind11 \
    turbodbc \
    azure-storage-blob \
    azure-storage-common \
    xlrd \
    openpyxl \
    mysql-connector-python \
    zeep \
    xmltodict \
    dask \
    dask-labextension \
    pymssql=2.1 \
    sqlalchemy-redshift \
    python-snappy \
    seaborn \
    python-gitlab \
    pyxlsb \
    humanfriendly \
    jupyterlab \
    notebook=6.1.4 \
    pip \
&& /opt/conda/bin/pip install --no-cache-dir --upgrade pip \
    smartsheet-python-sdk \
    duo-client \
    adlfs \
    pyarrow \
    "apache-airflow[postgres,redis,celery,crypto,ssh,password]==$AIRFLOW_VERSION" \
{code}
I have not been able to get turbodbc to work with pip, which is why I am using
conda right now. I just tried again with the CFLAGS argument
"-D_GLIBCXX_USE_CXX11_ABI=0", but had no luck. I will keep trying and may raise
an issue on the turbodbc project.
Let me know if there is a proper way to install these libraries (ideally with
plain pip, since my base image is from Airflow, which does not use conda by
default).
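
Since the environment above mixes conda and pip installs, it may help to record which installer provided each package before changing the order. The following is a minimal sketch, assuming Python 3.8+ (for importlib.metadata) and assuming each installer wrote the optional INSTALLER metadata file, which is not guaranteed; the helper name describe is hypothetical.

```python
from importlib import metadata


def describe(package):
    """Return (name, version, installer) for an installed distribution.

    version and installer are None when the distribution is not found;
    installer is also None when no INSTALLER metadata file is present.
    """
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return (package, None, None)
    # INSTALLER is an optional metadata file (typically "pip" or "conda").
    installer = dist.read_text("INSTALLER")
    return (package, dist.version, installer.strip() if installer else None)


if __name__ == "__main__":
    # Package names taken from the environment listed in the issue.
    for name in ("pyarrow", "turbodbc", "numpy", "pandas"):
        _, version, installer = describe(name)
        print(f"{name:10s} version={version} installer={installer}")
```

The output shows at a glance whether pyarrow and turbodbc came from different installers, which is useful to include when raising the issue on the turbodbc project.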
> [Python] pyarrow 3.0 issues with turbodbc
> -----------------------------------------
>
> Key: ARROW-11390
> URL: https://issues.apache.org/jira/browse/ARROW-11390
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 3.0.0
> Environment: pyarrow 3.0.0
> fsspec 0.8.4
> adlfs v0.5.9
> pandas 1.2.1
> numpy 1.19.5
> turbodbc 4.1.1
> Reporter: Lance Dacey
> Priority: Major
> Labels: python, turbodbc
>
> I think this is more of a turbodbc issue, but perhaps someone here has an
> idea of what changed in pyarrow 3.0 that could cause it.
> {code:python}
> cursor = connection.cursor()
> cursor.execute("select top 10 * from dbo.tickets")
> table = cursor.fetchallarrow()
> {code}
> I am able to run table.num_rows, and it prints 10.
> If I run table.to_pandas() or table.schema, or try to write the table to a
> dataset, my kernel dies with no explanation. After I reverted to pyarrow 2.0,
> the same code works again.
> [https://github.com/blue-yonder/turbodbc/issues/289]
>
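
When a kernel dies with no explanation, running the failing statements in a separate interpreter with faulthandler enabled can at least surface an exit code and, on a crash, a Python-level traceback. This is a minimal sketch, assuming Python 3.7+; the helper name run_isolated is hypothetical, and the snippet passed to it below is only a placeholder for the real turbodbc calls.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=120):
    """Run a Python snippet in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative return code means the
    child was terminated by a signal (for example a segmentation fault).
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Placeholder snippet: replace with the connection setup and the
    # fetchallarrow() / to_pandas() calls from the issue.
    code, err = run_isolated("import pyarrow as pa; print(pa.__version__)")
    print("exit code:", code)
    if err:
        print(err)
```

Because the crash happens in the child process, the notebook kernel survives and the captured stderr can be attached to the bug report.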