thisisnic commented on issue #46149:
URL: https://github.com/apache/arrow/issues/46149#issuecomment-2834208001

   Hi @r2evans, thanks for reporting this.
   
   That's a strange error, and it's odd that the two functions behave 
differently.
   
   My initial guess is that, as `read_parquet()` reads the file immediately 
into memory and `open_dataset()` scans the path and then later retrieves the 
data, there's something weird happening with the connection there.
   
   I'm not familiar with `fcntl` and asked ChatGPT. I don't know if this is 
accurate, but the gist of the answer is that `fcntl` is used to manipulate 
file descriptors and `F_RDADVISE` is an optimisation that hints to the OS to 
prefetch a byte range of the file (an offset and a length) into cache.  Here, 
the version of sshfs you mention doesn't seem to support `F_RDADVISE` 
properly, so the call fails outright, when really what we should be doing is 
skipping this optimisation.
   
   If this interpretation is correct, I'm unsure whether this should be fixed 
on the sshfs side, where `F_RDADVISE` isn't working as expected, or on the 
Arrow side, by failing gracefully or skipping the optimisation. Pinging 
@pitrou to check whether any of this is accurate before I suggest concrete 
action here.
   
   The section in the code that is the source of the error:
   
   
https://github.com/apache/arrow/blob/6e84d990379a0fe20a0a89311aa864f40efded23/cpp/src/arrow/io/file.cc#L275-L282

