youngyjd opened a new issue, #8858: URL: https://github.com/apache/gravitino/issues/8858
### Version

main branch

### Describe what's wrong

Context: Our fsspec-cfs is based on JNI and ultimately uses the Java HDFS client under the hood. I observed that a Ray application, before writing, first calls `ls()` to check whether the file exists. The libhdfs JNI layer catches the Java `FileNotFoundException` and converts it into Python's `FileType.NotFound` (this [code](https://github.com/apache/hadoop/blob/4235dc626874df852864e3689d93ec280f53c534/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/exception.c#L37-L41) might be related). Ray then decides whether the file exists based on this `FileType` ([code](https://github.com/ray-project/ray/blob/364f83184753e16e1eadbd905ac9c664d52adde6/python/ray/data/datasource/file_datasink.py#L111)).

The issue is that the JNI layer does not appear to convert a [FilesetPathNotFoundException](https://github.com/datastrato/gravitino-uber/blob/10ab935cf592c4f81693cc7ffda04c13c06bd338/clients/filesystem-hadoop3/src/main/java/org/apache/gravitino/filesystem/hadoop/FilesetPathNotFoundException.java#L24C14-L24C42) into Python's `FileType.NotFound`. As a result, when fsspec tries to list the file, it ends up with an "unknown error".
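As a stopgap on the Python side, a minimal sketch (hypothetical — not Gravitino's actual fix, and `DemoFS`/`translate_jni_not_found` are names I made up) of translating the opaque errno-255 `OSError` surfaced by the JNI layer into `FileNotFoundError`, which pyarrow's fsspec handler can then map to `FileType.NotFound`:

```python
import errno
import functools


def translate_jni_not_found(func):
    """Wrap an fsspec-style method so errno-255 OSErrors become FileNotFoundError.

    255 is assumed here to be how libhdfs reports an unrecognized Java
    exception (such as FilesetPathNotFoundException) for this code path.
    """
    @functools.wraps(func)
    def wrapper(self, path, *args, **kwargs):
        try:
            return func(self, path, *args, **kwargs)
        except OSError as e:
            if e.errno == 255:
                # Re-raise as FileNotFoundError so callers that check for
                # missing paths (e.g. Ray's pre-write existence check) work.
                raise FileNotFoundError(
                    errno.ENOENT, "No such file or directory", path
                ) from e
            raise
    return wrapper


class DemoFS:
    """Stand-in for the fsspec-cfs filesystem (illustration only)."""

    @translate_jni_not_found
    def ls(self, path, detail=True):
        # Simulate the JNI layer failing on a missing fileset path.
        raise OSError(
            255,
            "HDFS list directory failed. Detail: [errno 255] Unknown error 255",
        )
```

This only masks the symptom at the fsspec boundary; the proper fix is for the JNI/exception-mapping layer to recognize `FilesetPathNotFoundException` the same way it recognizes `FileNotFoundException`.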
### Error message and/or stacktrace

```
File "/usr/local/lib/python3.9/dist-packages/ray/data/datasource/file_datasink.py", line 106, in on_write_start
    if self.filesystem.get_file_info(self.path).type is FileType.NotFound:
File "pyarrow/_fs.pyx", line 590, in pyarrow._fs.FileSystem.get_file_info
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
File "pyarrow/_fs.pyx", line 1498, in pyarrow._fs._cb_get_file_info
File "/usr/local/lib/python3.9/dist-packages/pyarrow/fs.py", line 322, in get_file_info
    info = self.fs.info(path)
File "/usr/local/lib/python3.9/dist-packages/fsspec/spec.py", line 681, in info
    out = self.ls(self._parent(path), detail=True, **kwargs)
File "/home/docker/core/fsspec_cfs/cfs.py", line 237, in <lambda>
    return lambda *args, **kw: getattr(PyArrowCFS, item)(
File "/home/docker/core/fsspec_cfs/cfs.py", line 129, in ls
    file_info_list = self.pahdfs.get_file_info(fs.FileSelector(path))
File "pyarrow/_fs.pyx", line 582, in pyarrow._fs.FileSystem.get_file_info
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
OSError: [Errno 255] HDFS list directory failed. Detail: [errno 255] Unknown error 255
```

### How to reproduce

fsspec with the libhdfs JNI driver

### Additional context

_No response_
