Re: [I] pyarrow.fs.HadoopFileSystem Usage Problems [arrow]

via GitHub Wed, 22 May 2024 23:36:07 -0700


deep826 commented on issue #41777:
URL: https://github.com/apache/arrow/issues/41777#issuecomment-2126339498


   > I would try to only print the bytes and inspect that as what I see in the 
example can give unexpected output (different byte order in hdfs, int bit 
width, etc.).
   
   I also printed the raw bytes returned by both ways of reading the file.
   
   1. Reading the file with Python's built-in `open`:
   `with open(local_path, 'rb') as f:`
   `    bs = f.read(12)`
   `    print(f"a in bytes: {bs[0:8]}")`
   `    print(f"b in bytes: {bs[8:12]}")`
   the result is: 
   `a in bytes: b'\x1f\x8b\x08\x00\x00\x00\x00\x00'`
   `b in bytes: b'\x00\x03D\xbc'`
   
   2. Reading the file with the pyarrow HDFS client:
   `with hdfs_client.open_input_stream(hdfs_path) as f:`
   `    bs = f.read(12)`
   `    print(f"a in bytes: {bs[0:8]}")`
   `    print(f"b in bytes: {bs[8:12]}")`
   the result is:
   `a in bytes: b'\xe8\x03\x00\x00\x00\x00\x00\x00'`
   `b in bytes: b'@\x00\x00\x00'`
   
   The result of the second way is correct.
   The byte order on my machine is little-endian, but when I parse those bytes 
as big-endian, the result is also wrong.
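
To make the comparison concrete, here is a minimal sketch that decodes both sets of printed bytes in both byte orders with the standard `struct` module. It assumes `a` is a signed 64-bit integer and `b` a signed 32-bit integer (inferred only from the 8-byte and 4-byte slices). The `decode` helper and the variable names are made up for illustration.

```python
import struct

# Byte strings copied verbatim from the two outputs above.
local_a, local_b = b"\x1f\x8b\x08\x00\x00\x00\x00\x00", b"\x00\x03D\xbc"
hdfs_a, hdfs_b = b"\xe8\x03\x00\x00\x00\x00\x00\x00", b"@\x00\x00\x00"


def decode(a, b, order):
    """Decode `a` as int64 and `b` as int32 (an assumed layout).

    `order` is "<" for little-endian or ">" for big-endian.
    """
    return struct.unpack(order + "q", a)[0], struct.unpack(order + "i", b)[0]


# Decode every (source, byte order) combination.
results = {
    (name, order): decode(a, b, order)
    for name, a, b in (("local", local_a, local_b), ("hdfs", hdfs_a, hdfs_b))
    for order in ("<", ">")
}
for key, value in results.items():
    print(key, value)

# gzip streams start with the two magic bytes 0x1f 0x8b.
local_is_gzip = local_a[:2] == b"\x1f\x8b"
print("local bytes start with gzip magic:", local_is_gzip)
```

Under these assumptions, only the HDFS bytes read as little-endian give small values (1000 and 64). The local bytes begin with the gzip magic number, so one possible explanation (a guess, not confirmed in this thread) is that the local file is gzip-compressed and the two reads are not seeing the same byte stream. As far as I know, `open_input_stream` defaults to `compression="detect"`, which would decompress a path ending in `.gz` transparently.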


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
