[ https://issues.apache.org/jira/browse/ARROW-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche reassigned ARROW-7957:
--------------------------------------------

    Assignee: Joris Van den Bossche

> [Python] ParquetDataset cannot take HadoopFileSystem as filesystem
> ------------------------------------------------------------------
>
>                 Key: ARROW-7957
>                 URL: https://issues.apache.org/jira/browse/ARROW-7957
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>            Reporter: Catherine
>            Assignee: Joris Van den Bossche
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> {{from pyarrow.fs import HadoopFileSystem}}
> {{import pyarrow.parquet as pq}}
>
> {{file_name = "hdfs://localhost:9000/test/file_name.pq"}}
> {{hdfs, path = HadoopFileSystem.from_uri(file_name)}}
> {{dataset = pq.ParquetDataset(file_name, filesystem=hdfs)}}
>
> This fails with the error:
> {{OSError: Unrecognized filesystem: <class 'pyarrow._hdfs.HadoopFileSystem'>}}
>
> When I tried using the deprecated {{HadoopFileSystem}}:
> {{import pyarrow}}
> {{import pyarrow.parquet as pq}}
>
> {{file_name = "hdfs://localhost:9000/test/file_name.pq"}}
> {{hdfs = pyarrow.hdfs.connect('localhost', 9000)}}
> {{dataset = pq.ParquetDataset(file_name, filesystem=hdfs)}}
> {{pa_schema = dataset.schema.to_arrow_schema()}}
> {{pieces = dataset.pieces}}
> {{for piece in pieces:}}
> {{    print(piece.path)}}
>
> {{piece.path}} loses the {{hdfs://localhost:9000}} prefix.
>
> I think {{ParquetDataset}} should accept {{pyarrow.fs.HadoopFileSystem}} as filesystem?
> And {{piece.path}} should have the prefix?
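
Until {{piece.path}} carries the prefix, a caller could re-attach it on their side. The following is only a minimal sketch using the Python standard library; {{restore_prefix}} is a hypothetical helper name, not part of pyarrow, and it assumes the piece paths are absolute paths on the same HDFS cluster as the original URI.

```python
from urllib.parse import urlsplit, urlunsplit


def restore_prefix(original_uri, piece_path):
    """Re-attach the scheme and host:port of original_uri to a bare path.

    Hypothetical helper for illustration; not a pyarrow API.
    """
    # Leave the path alone if it already carries a scheme such as hdfs://
    if urlsplit(piece_path).scheme:
        return piece_path
    parts = urlsplit(original_uri)
    # Rebuild scheme://netloc/path with no query string or fragment
    return urlunsplit((parts.scheme, parts.netloc, piece_path, "", ""))


file_name = "hdfs://localhost:9000/test/file_name.pq"
print(restore_prefix(file_name, "/test/file_name.pq"))
```

In the second snippet of the report, this would be applied as {{restore_prefix(file_name, piece.path)}} inside the loop over {{dataset.pieces}}.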