[ 
https://issues.apache.org/jira/browse/ARROW-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492382#comment-17492382
 ] 

Ravi Gummadi commented on ARROW-15645:
--------------------------------------

[~kiszk] ,

I tried pyarrow 6.0 on the client side and the issue still occurs.
To summarize:
(1) the issue does NOT occur with pyarrow 3.0.0 on the client side and Arrow 
6.0.x on the Flight server side
(2) the issue occurs with pyarrow 5.0.0 on the client side and Arrow 6.0.x on 
the Flight server side
(3) the issue occurs with pyarrow 6.0.0 on the client side and Arrow 6.0.x on 
the Flight server side
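
For reference, here is a minimal sketch of how a byte-order mismatch could turn valid int32 offsets into negative ones, which would match the "Negative offsets in binary array" error quoted below. It is not taken from the Arrow code base, it uses only the Python standard library, and the helper name reinterpret_swapped is hypothetical. Whether this is the actual failure mode in Flight is an assumption.

```python
import struct
import sys


def reinterpret_swapped(offsets):
    """Pack int32 offsets as little-endian, then read the same bytes as big-endian.

    Hypothetical helper: it mimics a reader that assumes the wrong byte order.
    """
    count = len(offsets)
    raw = struct.pack("<%di" % count, *offsets)       # bytes as a little-endian writer emits them
    return list(struct.unpack(">%di" % count, raw))   # same bytes decoded as big-endian


offsets = [0, 5, 200]
swapped = reinterpret_swapped(offsets)

print("native byte order:", sys.byteorder)
print("original offsets: ", offsets)
print("misread offsets:  ", swapped)
```

Any offset whose low byte is 0x80 or higher (200 here) becomes negative once the bytes are read in the opposite order, so a check for negative offsets would reject the array.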

> Data read through Flight is having endianness issue on s390x
> ------------------------------------------------------------
>
>                 Key: ARROW-15645
>                 URL: https://issues.apache.org/jira/browse/ARROW-15645
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, FlightRPC, Python
>    Affects Versions: 5.0.0
>         Environment: Linux s390x (big endian)
>            Reporter: Ravi Gummadi
>            Priority: Major
>
> I am facing an endianness issue on s390x (big endian) when converting data 
> read through Flight to a pandas DataFrame.
> (1) table.validate() fails with the following error:
> Traceback (most recent call last):
>   File "/tmp/2.py", line 51, in <module>
>     table.validate()
>   File "pyarrow/table.pxi", line 1232, in pyarrow.lib.Table.validate
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Column 1: In chunk 0: Invalid: Negative offsets in 
> binary array
> (2) table.to_pandas() gives a segmentation fault
> ____________
> Here is the sample code that I am using:
> from pyarrow import flight
> import os
> import json
> flight_endpoint = os.environ.get("flight_server_url", 
> "grpc+tls://...local:443")
> print(flight_endpoint)
> #
> class TokenClientAuthHandler(flight.ClientAuthHandler):
>     """An example implementation of authentication via handshake.
>        With the default constructor, the user token is read from the 
> environment: TokenClientAuthHandler().
>        You can also pass a user token as parameter to the constructor, 
> TokenClientAuthHandler(yourtoken).
>     """
>     def __init__(self, token: str = None):
>         super().__init__()
>         if token is not None:
>             strToken = 'Bearer {}'.format(token)
>         else:
>             strToken = 'Bearer {}'.format(os.environ.get("some_auth_token"))
>         self.token = strToken.encode('utf-8')
>         #print(self.token)
>     def authenticate(self, outgoing, incoming):
>         outgoing.write(self.token)
>         self.token = incoming.read()
>     def get_token(self):
>         return self.token
>     
> readClient = flight.FlightClient(flight_endpoint)
> readClient.authenticate(TokenClientAuthHandler())
> cmd = json.dumps({...})
> descriptor = flight.FlightDescriptor.for_command(cmd)
> flightInfo = readClient.get_flight_info(descriptor)
> reader = readClient.do_get(flightInfo.endpoints[0].ticket)
> table = reader.read_all()
> print(table)
> print(table.num_columns)
> print(table.num_rows)
> table.validate()
> table.to_pandas()



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
