[jira] [Commented] (ARROW-15286) [Python] pyarrow.dataset.FileSystemDataset.take method causes Segmentation Fault

Dustin Zubke (Jira) Mon, 10 Jan 2022 08:31:34 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-15286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472146#comment-17472146
 ]


Dustin Zubke commented on ARROW-15286:
--------------------------------------

I'm not sure how you handle Jira issues. I consider this issue closed, but I am
not sure what to select in the Resolution field. Feel free to close this issue
in whatever way you see fit, or tell me how to do so.

Thanks!

> [Python] pyarrow.dataset.FileSystemDataset.take method causes Segmentation 
> Fault
> --------------------------------------------------------------------------------
>
>                 Key: ARROW-15286
>                 URL: https://issues.apache.org/jira/browse/ARROW-15286
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 5.0.0, 6.0.0, 6.0.1
>         Environment: Ubuntu 18.04, Python 3.8.6; macOS 11.6.2, Python 3.7.5
>            Reporter: Dustin Zubke
>            Assignee: Joris Van den Bossche
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 7.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Whenever I try calling the `pyarrow.dataset.FileSystemDataset.take` method, I 
> get a segmentation fault. 
> I first encountered this using a proprietary dataset and recreated it with 
> the UC Davis data below. I can successfully run the 
> pyarrow.dataset.FileSystemDataset.to_batches method but not the take() 
> method.  
> Steps to recreate:
> {code:java}
> !wget https://anson.ucdavis.edu/~clarkf/pems_parquet.zip
> !unzip -q pems_parquet.zip
> import pyarrow.dataset as ds
> file_path = "./pems_sorted/station=402264/part-r-00151-ddaee723-f3f6-4f25-a34b-3312172aa6d7.snappy.parquet"
> dataset = ds.dataset(file_path)
> dataset.take(1)
> >>> 80874 segmentation fault
> {code}
> Creating a dataset from a directory as below also results in a segfault.
> {code:java}
> dir_path = "./pems_sorted/station=402264"
> dataset = ds.dataset(dir_path)
> dataset.take(1)
> {code}
> Environments tried:
>  * Ubuntu 18.04, Python 3.8.6
>  * macOS 11.6.2, Python 3.7.5



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
