[ 
https://issues.apache.org/jira/browse/ARROW-15910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Callista Rogers updated ARROW-15910:
------------------------------------
    Description: 
Running the code below results in {{"GetFileInfo() yielded path 
'myBucket/features/MyParquet.parquet/year=2022/part-0019.snappy.parquet' which 
is outside base dir 'gs://myBucket/features/MyParquet.parquet/' "}}


{{import pyarrow.parquet as pq}}
{{import gcsfs}}
{{file_path="gs://myBucket/features/MyParquet.parquet/"}}
{{fs=gcsfs.GCSFileSystem()}}
{{table=pq.read_table(file_path,filesystem=fs)}}

 

Removing the gs:// from file_path results in a {{FileNotFoundError}}. Any 
variation of / or // at the beginning of the path gives me the 'outside base 
dir' error.

I also ran the code below and got valid results using both file_path patterns 
(with and without the gs:// prefix), so I know it finds the path just fine.
{{from pyarrow.fs import FileSelector, PyFileSystem, FSSpecHandler}}
{{filesys = PyFileSystem(FSSpecHandler(fs))}}
{{selector = FileSelector(file_path, recursive=True)}}
{{filesys.get_file_info(selector)}}

  was:
running below results in {{"GetFileIno() yielded path 
'myBucket/features/MyParquet.parquet/year=2022/part-0019.snappy.parquet' which 
is outside base dir 'gs://myBucket/features/MyParquet.parquet/' "}}


{{import pyarrow.parquet as pq}}
{{import gcsfs}}
{{file_path="gs://myBucket/features/MyParquet.parquet/"}}
{{fs=gcsfs.GCSFileSystem()}}
{{table=pq.read_table(file_path,filesystem=fs)}}

 

Removing the gs:// from file_path results in a {{FileNotFoundError}}. Any 
variation of / or // at the beginning of the path gives me the 'outside base 
dir' error.

I also ran the below and got valid results using both file_path patterns, so I 
know it finds the path just fine.
{{from pyarrow.fs import FileSelector, PyFileSystem, FSSpecHandler
filesys = PyFileSystem(FSSpecHandler(fs))
selector = FileSelector(file_path, recursive=True)
filesys.get_file_info(selector)}}


> pyarrow.parquet.read_table either returns FileNotFound or ArrowInvalid
> ----------------------------------------------------------------------
>
>                 Key: ARROW-15910
>                 URL: https://issues.apache.org/jira/browse/ARROW-15910
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Parquet, Python
>    Affects Versions: 6.0.1
>         Environment: GCP JupyterLab notebooks
>            Reporter: Callista Rogers
>            Priority: Major
>
> Running the code below results in {{"GetFileInfo() yielded path 
> 'myBucket/features/MyParquet.parquet/year=2022/part-0019.snappy.parquet' 
> which is outside base dir 'gs://myBucket/features/MyParquet.parquet/' "}}
> {{import pyarrow.parquet as pq}}
> {{import gcsfs}}
> {{file_path="gs://myBucket/features/MyParquet.parquet/"}}
> {{fs=gcsfs.GCSFileSystem()}}
> {{table=pq.read_table(file_path,filesystem=fs)}}
>  
> Removing the gs:// from file_path results in a {{FileNotFoundError}}. Any 
> variation of / or // at the beginning of the path gives me the 'outside base 
> dir' error.
> I also ran the code below and got valid results using both file_path patterns 
> (with and without the gs:// prefix), so I know it finds the path just fine.
> {{from pyarrow.fs import FileSelector, PyFileSystem, FSSpecHandler}}
> {{filesys = PyFileSystem(FSSpecHandler(fs))}}
> {{selector = FileSelector(file_path, recursive=True)}}
> {{filesys.get_file_info(selector)}}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
