[ https://issues.apache.org/jira/browse/ARROW-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172115#comment-17172115 ]
Alex Emelynov commented on ARROW-9662:
--------------------------------------
It looks like we have some data corruption: this file contains a `null` row that nevertheless holds an array of values, which is wrong, and the old pyarrow version fails when trying to read this column.
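A segfault takes down the whole interpreter, so one way to find which column triggers it is to read each column in its own child process. This is a minimal sketch, assuming a POSIX system (where a child killed by a signal reports a negative return code, -11 for SIGSEGV) and that `read_feather` accepts a one-element `columns` list, as in the working example from the report. The `probe_columns` and `crashed_columns` helpers and the `script` parameter are hypothetical, not part of pyarrow.

```python
import subprocess
import sys

# Child-process script: read a single column from a Feather file.
# With "python -c", sys.argv[0] is "-c", so sys.argv[1] is the file path
# and sys.argv[2] is the column name.
READ_ONE_COLUMN = (
    "import sys\n"
    "from pyarrow import feather\n"
    "feather.read_feather(sys.argv[1], columns=[sys.argv[2]])\n"
)


def probe_columns(path, columns, script=READ_ONE_COLUMN, timeout=120):
    """Read each column in a fresh interpreter and return {column: returncode}.

    A segfault only kills the child, so the loop keeps going. On POSIX a
    negative return code means the child was killed by that signal
    (-11 is SIGSEGV); 1 means an ordinary uncaught Python exception.
    """
    results = {}
    for column in columns:
        proc = subprocess.run(
            [sys.executable, "-c", script, path, column],
            capture_output=True,  # keep child tracebacks out of our output
            timeout=timeout,
        )
        results[column] = proc.returncode
    return results


def crashed_columns(results):
    """Columns whose reader process was killed by a signal (e.g. SIGSEGV)."""
    return [column for column, code in results.items() if code < 0]
```

For example, `crashed_columns(probe_columns("2020-07-30_BF43CA09E5404F9_actuals.feather.zstd", names))` with `names` being the file's column names would list the columns that crash the reader, while columns that fail with a normal Python error show up with return code 1 instead.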
> Python feather reader segfaults on some file without explicit columns
> ---------------------------------------------------------------------
>
> Key: ARROW-9662
> URL: https://issues.apache.org/jira/browse/ARROW-9662
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Alex
> Priority: Major
> Attachments: 2020-07-30_BF43CA09E5404F9_actuals.feather.zstd
>
>
> This code fails:
> {{from pyarrow import feather}}
> {{feather.read_feather('2020-07-30_BF43CA09E5404F9_actuals.feather.zstd', columns=None, use_threads=bool(True))}}
>
> >>> from pyarrow import feather
> >>> feather.read_feather('2020-07-30_BF43CA09E5404F9_actuals.feather.zstd', columns=None, use_threads=bool(True))
> [1] 37494 segmentation fault python
> and this does not:
> {{from pyarrow import feather}}
> {{feather.read_feather('2020-07-30_BF43CA09E5404F9_actuals.feather.zstd', columns=['prediction_id', 'class'], use_threads=bool(True))}}
>
> env:
> MacOS Catalina 10.15.5
> $ python -V
> Python 3.7.0
> $ pip list
> Package Version Location
> ------------------------------------ ------------- ------------------------------------
> adal 1.2.3
> aioredis 1.3.1
> amqp 2.5.2
> apipkg 1.5
> appdirs 1.4.3
> applicationinsights 0.11.9
> async-timeout 3.0.1
> asyncio 3.4.3
> asyncio-redis 0.15.1
> attrs 19.3.0
> azure-common 1.1.25
> azure-core 1.4.0
> azure-graphrbac 0.61.1
> azure-identity 1.2.0
> azure-mgmt-authorization 0.60.0
> azure-mgmt-containerregistry 2.8.0
> azure-mgmt-keyvault 2.2.0
> azure-mgmt-resource 8.0.1
> azure-mgmt-storage 9.0.0
> azureml-automl-core 1.3.0
> azureml-automl-runtime 1.3.0
> azureml-core 1.3.0.post2
> azureml-dataprep 1.4.3
> azureml-dataprep-native 14.1.0
> azureml-defaults 1.3.0
> azureml-explain-model 1.3.0
> azureml-interpret 1.3.0
> azureml-model-management-sdk 1.0.1b6.post1
> azureml-pipeline 1.3.0
> azureml-pipeline-core 1.3.0
> azureml-pipeline-steps 1.3.0
> azureml-sdk 1.3.0
> azureml-telemetry 1.3.0
> azureml-train 1.3.0
> azureml-train-automl 1.3.0
> azureml-train-automl-client 1.3.0
> azureml-train-automl-runtime 1.3.0
> azureml-train-core 1.3.0.post1
> azureml-train-restclients-hyperdrive 1.3.0
> backports.tempfile 1.0
> backports.weakref 1.0.post1
> beautifulsoup4 4.8.2
> billiard 3.6.3.0
> bleach 3.1.4
> boto 2.49.0
> boto3 1.12.34
> botocore 1.15.34
> cachetools 4.0.0
> celery 4.4.0
> certifi 2019.11.28
> cffi 1.14.0
> chardet 3.0.4
> click 7.1.1
> cloudpickle 1.4.1
> configparser 3.7.4
> contextlib2 0.6.0.post1
> coverage 5.0.4
> cryptography 2.9.2
> Cython 0.29.17
> dill 0.3.1.1
> distlib 0.3.0
> distro 1.5.0
> docker 4.2.0
> docutils 0.15.2
> dotnetcore2 2.1.14
> entrypoints 0.3
> execnet 1.7.1
> fastapi 0.53.2
> feather-format 0.4.1
> filelock 3.0.12
> fire 0.3.1
> flake8 3.7.9
> Flask 1.0.3
> freezegun 0.3.15
> fsspec 0.7.1
> fusepy 3.0.1
> gensim 3.8.3
> gevent 1.4.0
> google-api-core 1.16.0
> google-auth 1.12.0
> google-cloud-automl 0.10.0
> google-cloud-core 1.3.0
> google-cloud-storage 1.26.0
> google-resumable-media 0.5.0
> googleapis-common-protos 1.51.0
> greenlet 0.4.15
> grpcio 1.27.2
> gunicorn 19.9.0
> h11 0.9.0
> hiredis 1.0.1
> HLL 1.3.1
> httptools 0.1.1
> idna 2.9
> importlib-metadata 1.6.0
> interpret-community 0.9.2
> interpret-core 0.1.20
> isodate 0.6.0
> itsdangerous 1.1.0
> jeepney 0.4.3
> Jinja2 2.11.2
> jmespath 0.9.5
> joblib 0.14.1
> json-logging-py 0.2
> JsonForm 0.0.2
> jsonpickle 1.3
> jsonschema 3.2.0
> JsonSir 0.0.2
> keras2onnx 1.6.1
> keyring 21.2.0
> kombu 4.6.8
> liac-arff 2.4.0
> lightgbm 2.3.0
> lxml 4.5.0
> mangum 0.9.0
> MarkupSafe 1.1.1
> mccabe 0.6.1
> mock 4.0.2
> more-itertools 8.2.0
> msal 1.2.0
> msal-extensions 0.1.3
> msrest 0.6.13
> msrestazure 0.6.3
> multidict 4.7.5
> ndg-httpsclient 0.5.1
> nimbusml 1.7.0
> numpy 1.16.2
> oauthlib 3.1.0
> onnx 1.6.0
> onnxconverter-common 1.6.0
> onnxmltools 1.4.1
> packaging 20.3
> pandas 0.23.4
> pathspec 0.8.0
> patsy 0.5.1
> pip 10.0.1
> pkginfo 1.5.0.1
> pluggy 0.13.1
> pmdarima 1.1.1
> portalocker 1.7.0
> protobuf 3.11.3
> psutil 5.7.0
> py 1.8.1
> py-cpuinfo 5.0.0
> pyarrow 0.17.1
> pyasn1 0.4.8
> pyasn1-modules 0.2.8
> pycodestyle 2.5.0
> pycparser 2.20
> pydantic 1.4
> pyflakes 2.1.1
> Pygments 2.6.1
> PyJWT 1.7.1
> pyOpenSSL 19.1.0
> pyparsing 2.4.6
> pyrsistent 0.16.0
> pytest 5.4.1
> pytest-cov 2.8.1
> pytest-forked 1.1.3
> pytest-runner 5.2
> pytest-xdist 1.31.0
> python-dateutil 2.8.1
> python-dotenv 0.14.0
> Python-EasyConfig 0.1.7
> pytz 2019.3
> PyYAML 5.3.1
> readme-renderer 25.0
> redis 3.4.1
> requests 2.23.0
> requests-oauthlib 1.3.0
> requests-toolbelt 0.9.1
> Resource 0.2.1
> rsa 4.0
> ruamel.yaml 0.16.10
> ruamel.yaml.clib 0.2.0
> s3fs 0.4.2
> s3transfer 0.3.3
> scikit-learn 0.20.3
> scipy 1.1.0
> SecretStorage 3.1.2
> setuptools 46.1.3
> shap 0.34.0
> shortuuid 1.0.1
> simplejson 3.17.2
> six 1.14.0
> skl2onnx 1.4.9
> sklearn-pandas 1.7.0
> smart-open 1.9.0
> soupsieve 2.0
> starlette 0.13.2
> statsmodels 0.10.2
> termcolor 1.1.0
> toml 0.10.0
> tox 3.14.6
> tqdm 4.45.0
> twine 3.1.1
> typing-extensions 3.7.4.2
> urllib3 1.25.8
> uvicorn 0.11.3
> uvloop 0.14.0
> vcrpy 4.0.2
> vine 1.3.0
> virtualenv 20.0.15
> wcwidth 0.1.9
> webencodings 0.5.1
> websocket-client 0.57.0
> websockets 8.1
> Werkzeug 0.16.1
> wheel 0.30.0
> wrapt 1.12.1
> wsgi-intercept 1.9.2
> yarl 1.4.2
> zipp 3.1.0
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)