youngyjd opened a new issue, #8858:
URL: https://github.com/apache/gravitino/issues/8858

   ### Version
   
   main branch
   
   ### Describe what's wrong
   
   Context:
   Our fsspec-cfs is based on JNI and ultimately uses the Java HDFS client 
under the hood.
   
   I observed that, before writing, a Ray application first calls ls() to check whether the target file exists.
   The libhdfs JNI layer catches the Java FileNotFoundException and converts it into Python's FileType.NotFound (this 
[code](https://github.com/apache/hadoop/blob/4235dc626874df852864e3689d93ec280f53c534/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/exception.c#L37-L41)
 might be related).
   Ray then determines whether the file exists based on this FileType 
([code](https://github.com/ray-project/ray/blob/364f83184753e16e1eadbd905ac9c664d52adde6/python/ray/data/datasource/file_datasink.py#L111)).
   
   The issue here is that the JNI layer doesn't seem to convert a 
[FilesetPathNotFoundException](https://github.com/datastrato/gravitino-uber/blob/10ab935cf592c4f81693cc7ffda04c13c06bd338/clients/filesystem-hadoop3/src/main/java/org/apache/gravitino/filesystem/hadoop/FilesetPathNotFoundException.java#L24C14-L24C42)
 into Python's FileType.NotFound.
   As a result, when fsspec tries to list the file, the call fails with an opaque "unknown error" (errno 255) instead of a not-found status.
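   For illustration, the two code paths can be contrasted with a small stdlib-only sketch. The `FileType` enum and `get_file_info` below are stand-ins for the pyarrow/JNI machinery, not the real API:

```python
import enum

# Stand-in for pyarrow's FileType; only the value relevant here.
class FileType(enum.Enum):
    NotFound = 0
    File = 1

def get_file_info(path, jni_maps_exception):
    """Simulate the JNI boundary for a path that does not exist."""
    if jni_maps_exception:
        # HDFS case: the Java FileNotFoundException is mapped, so
        # pyarrow can report FileType.NotFound.
        return FileType.NotFound
    # Gravitino case: FilesetPathNotFoundException is not mapped and
    # surfaces as an opaque errno-255 OSError instead.
    raise OSError(
        255, "HDFS list directory failed. Detail: [errno 255] Unknown error 255"
    )

# Ray's on_write_start-style existence check only works in the first case:
assert get_file_info("/missing", jni_maps_exception=True) is FileType.NotFound
```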
   
   ### Error message and/or stacktrace
   
   ```
   File 
"/usr/local/lib/python3.9/dist-packages/ray/data/datasource/file_datasink.py", 
line 106, in on_write_start
     if self.filesystem.get_file_info(self.path).type is FileType.NotFound:
   File "pyarrow/_fs.pyx", line 590, in pyarrow._fs.FileSystem.get_file_info
   File "pyarrow/error.pxi", line 155, in 
pyarrow.lib.pyarrow_internal_check_status
   File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
   File "pyarrow/_fs.pyx", line 1498, in pyarrow._fs._cb_get_file_info
   File "/usr/local/lib/python3.9/dist-packages/pyarrow/fs.py", line 322, in 
get_file_info
     info = self.fs.info(path)
   File "/usr/local/lib/python3.9/dist-packages/fsspec/spec.py", line 681, in 
info
     out = self.ls(self._parent(path), detail=True, **kwargs)
   File "/home/docker/core/fsspec_cfs/cfs.py", line 237, in <lambda>
     return lambda *args, **kw: getattr(PyArrowCFS, item)(
   File "/home/docker/core/fsspec_cfs/cfs.py", line 129, in ls
     file_info_list = self.pahdfs.get_file_info(fs.FileSelector(path))
   File "pyarrow/_fs.pyx", line 582, in pyarrow._fs.FileSystem.get_file_info
   File "pyarrow/error.pxi", line 155, in 
pyarrow.lib.pyarrow_internal_check_status
   File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
   OSError: [Errno 255] HDFS list directory failed. Detail: [errno 255] Unknown 
error 255
   ```
   
   ### How to reproduce
   
   Use fsspec with the libhdfs JNI driver and call ls() on a Gravitino fileset path that does not exist (e.g., via Ray's write path, which checks for the file before writing).
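   Until the JNI mapping is fixed, a client-side mitigation might be possible in the fsspec wrapper: translate the opaque errno-255 OSError into Python's FileNotFoundError, which fsspec/pyarrow already report as a not-found status. This is only a sketch, assuming the unmapped exception always surfaces as errno 255 with an "Unknown error" detail; the function names below are hypothetical, not part of fsspec-cfs.

```python
import errno
import functools

def translate_unknown_not_found(fn):
    """Hypothetical shim for PyArrowCFS.ls-style calls: remap the opaque
    errno-255 OSError (the unmapped FilesetPathNotFoundException) to
    FileNotFoundError so callers see a not-found status instead."""
    @functools.wraps(fn)
    def wrapper(path, *args, **kwargs):
        try:
            return fn(path, *args, **kwargs)
        except OSError as e:
            # Assumption: this is how the unmapped exception surfaces.
            if e.errno == 255 and "Unknown error" in (e.strerror or ""):
                raise FileNotFoundError(errno.ENOENT, "path not found", path) from e
            raise
    return wrapper

# Stub standing in for the real JNI-backed ls(), for illustration only:
@translate_unknown_not_found
def ls(path):
    raise OSError(
        255, "HDFS list directory failed. Detail: [errno 255] Unknown error 255"
    )
```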
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]