[
https://issues.apache.org/jira/browse/ARROW-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338365#comment-16338365
]
ASF GitHub Bot commented on ARROW-2029:
---------------------------------------
jcrist opened a new pull request #1502: ARROW-2029: [Python] NativeFile.tell
errors after close
URL: https://github.com/apache/arrow/pull/1502
Previously, checking whether the file was closed was subclass-specific, and the
check was missing in the HDFS-backed file, leading to program crashes.
This adds a check in `NativeFile.tell` that the file is open, and a test on a
few subclasses of `NativeFile` to ensure the error is raised.
Note that since most Python file-like objects raise a `ValueError` for
operations after close, I changed the type of the existing error for these
cases. This could be changed back, but an error should at least be raised.
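For illustration only, the sketch below shows the kind of open-file guard and
accompanying test described above; the `_SketchFile` class and `_assert_open`
helper are hypothetical stand-ins, not pyarrow's actual `NativeFile`
implementation.
{code:python}
import pytest


class _SketchFile:
    """Hypothetical file wrapper; illustrates the guard, not pyarrow's NativeFile."""

    def __init__(self):
        self._closed = False
        self._pos = 0

    def close(self):
        self._closed = True

    def _assert_open(self):
        # Match the convention of built-in file objects: ValueError after close.
        if self._closed:
            raise ValueError("I/O operation on closed file")

    def tell(self):
        self._assert_open()
        return self._pos


def test_tell_raises_after_close():
    f = _SketchFile()
    f.close()
    with pytest.raises(ValueError):
        f.tell()
{code}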
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> [Python] Program crash on `HdfsFile.tell` if file is closed
> -----------------------------------------------------------
>
> Key: ARROW-2029
> URL: https://issues.apache.org/jira/browse/ARROW-2029
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Jim Crist
> Priority: Major
> Labels: pull-request-available
>
> Of all the `NativeFile` methods, `tell` is the only one that doesn't check
> whether the file is still open before running. This can lead to crashes when
> using HDFS:
>
> {code:python}
> >>> import pyarrow as pa
> >>> h = pa.hdfs.connect()
> 18/01/24 22:31:35 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 18/01/24 22:31:36 WARN shortcircuit.DomainSocketFactory: The short-circuit
> local reads feature cannot be used because libhadoop cannot be loaded.
> >>> with h.open("/tmp/test.txt", mode='wb') as f:
> ... pass
> ...
> >>> f.tell()
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007f52ccb6733d, pid=14868, tid=0x00007f52de2b9700
> #
> # JRE version: OpenJDK Runtime Environment (8.0_151-b12) (build
> 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
> # Java VM: OpenJDK 64-Bit Server VM (25.151-b12 mixed mode linux-amd64
> compressed oops)
> # Problematic frame:
> # V [libjvm.so+0x67c33d]
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /working/python/hs_err_pid14868.log
> #
> # If you would like to submit a bug report, please visit:
> # http://bugreport.java.com/bugreport/crash.jsp
> #
> Aborted
> {code}
> In Python, most file-like objects raise a `ValueError` if the file is closed:
> {code:python}
> >>> f = open("test.py", mode='wb')
> >>> f.close()
> >>> f.tell()
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> ValueError: I/O operation on closed file
> >>> import io
> >>> buf = io.BytesIO()
> >>> buf.close()
> >>> buf.tell()
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> ValueError: I/O operation on closed file.
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)