[
https://issues.apache.org/jira/browse/ARROW-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503059#comment-17503059
]
Joris Van den Bossche commented on ARROW-14787:
-----------------------------------------------
For the actual fsspec issue, it would be best to test this with the fsspec
implementation that is based on the new pyarrow.fs.HadoopFileSystem (see my
comment at
https://github.com/fsspec/filesystem_spec/issues/810#issuecomment-1061983245).
As for the underlying issue, {{NativeFile}} not implementing the {{readline}}
method: that is not specific to HDFS, but applies to our IO functionality in
general. [~apitrou], do you know whether implementing {{readline(s)}} has ever
come up? (I suppose pyarrow itself has no direct need for it.)
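As a rough illustration of what a user-side workaround could look like in the
meantime, here is a minimal sketch that layers {{readline}} on top of any
binary file object that only supports {{read(n)}}. The names {{LineReader}} and
{{NoReadlineFile}} are hypothetical and exist only for this example; the
stand-in class mimics a file whose {{readline}} raises
{{UnsupportedOperation}} and is not the actual {{NativeFile}}.

```python
import io


class LineReader:
    """Hypothetical helper: adds readline() to any object exposing read(n)."""

    def __init__(self, raw, chunk_size=8192):
        self._raw = raw
        self._chunk_size = chunk_size
        self._buffer = b""

    def readline(self):
        # Read chunks until the buffer holds a newline or the stream ends.
        while b"\n" not in self._buffer:
            chunk = self._raw.read(self._chunk_size)
            if not chunk:
                break
            self._buffer += chunk
        # Split off the first line; keep the remainder for the next call.
        line, sep, rest = self._buffer.partition(b"\n")
        self._buffer = rest
        return line + sep

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line


class NoReadlineFile(io.BytesIO):
    """Stand-in (assumption) for a binary file whose readline() is unsupported."""

    def readline(self, *args):
        raise io.UnsupportedOperation("readline")


if __name__ == "__main__":
    reader = LineReader(NoReadlineFile(b"first\nsecond\n"))
    for line in reader:
        print(line)
```

The wrapper never seeks, so it only requires {{read}} from the underlying
object, at the cost of holding any bytes read past the newline in its own
buffer.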
> [Python] read an HDFS file by line failed when the open_mode is "rb"
> --------------------------------------------------------------------
>
> Key: ARROW-14787
> URL: https://issues.apache.org/jira/browse/ARROW-14787
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 6.0.0
> Environment: System: Ubuntu 18.04
> fsspec: 2021.10.1
> pyarrow: 6.0.0
> Reporter: nero
> Priority: Major
> Labels: hdfs
>
> Hi there,
> I ran into a problem when using *fsspec* to read an HDFS file line by line
> when the open_mode is "rb". It works fine when the *open_mode is "r"* or the
> *file is located locally*.
> Some snippets:
> {code:java}
> import fsspec
> hdfs_file_path = "hdfs://xxxxxx"
> with fsspec.open(hdfs_file_path, "rb") as f:
>     # raises UnsupportedOperation
>     f.readline()
> {code}
>
> Error logs:
>
> /opt/conda/lib/python3.7/site-packages/pyarrow/io.pxi in
> pyarrow.lib.NativeFile.readline()
> UnsupportedOperation:
> Originally from: https://github.com/fsspec/filesystem_spec/issues/810