felipecrv opened a new pull request, #33:
URL: https://github.com/apache/arrow-experiments/pull/33

   This module buffers the entire message in memory and appears to spend
   most of its time scanning for part delimiters and encoding/decoding
   the parts.
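To make the cost concrete, here is a minimal sketch (not the module's actual code; the `split_parts` helper and the `frontier` boundary are made up) of what a whole-message multipart parser ends up doing: buffer the full body, then scan all of it for the boundary delimiter and decode each part.

```python
# Hypothetical sketch of whole-message multipart/mixed parsing:
# every call scans the complete buffered body for the boundary.
def split_parts(body: bytes, boundary: bytes) -> list[bytes]:
    delim = b"--" + boundary
    parts = []
    for chunk in body.split(delim)[1:]:   # full scan of the entire body
        if chunk.startswith(b"--"):       # closing delimiter reached
            break
        # drop the part headers, keep the payload
        _headers, _, payload = chunk.partition(b"\r\n\r\n")
        parts.append(payload.rstrip(b"\r\n"))
    return parts

body = (b"--frontier\r\nContent-Type: text/plain\r\n\r\nhello\r\n"
        b"--frontier\r\nContent-Type: application/octet-stream\r\n\r\n\x01\x02\r\n"
        b"--frontier--\r\n")
print(split_parts(body, b"frontier"))  # [b'hello', b'\x01\x02']
```

On a ~1GB body, that scan and the per-part copies dominate, which matches the profile below.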
   
   On my machine, multipart/mixed parsing accounts for roughly 85% of the
   total execution time, while parsing the ~1GB Arrow Stream message, once
   it is fully in memory, takes less than 0.1%.
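The percentages can be measured with a simple wall-clock harness like the one below (a hedged sketch: the `timed` helper and the stand-in phase functions are hypothetical, not code from this PR).

```python
import time

def timed(fn, *args):
    """Wall-clock a single phase and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0

total_start = time.perf_counter()
# Stand-ins for the two phases: boundary scanning over a large buffer,
# then a (cheap) pass over the already-decoded parts.
parts, t_multipart = timed(lambda b: b.split(b"--boundary"), b"\x00" * 1_000_000)
_, t_arrow = timed(len, parts)
total = time.perf_counter() - total_start
print(f"{t_multipart:.3f} seconds ({100 * t_multipart / total:.2f}%) parsing multipart/mixed")
print(f"{t_arrow:.3f} seconds ({100 * t_arrow / total:.2f}%) parsing Arrow stream")
```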
   
   
   ```sh
   $ python simple_client.py
   -- 3731 bytes of JSON data:
   [
     {'ticker': 'SGJ', 'description': 'Syhnffek Gacb Jdylqis'}
     {'ticker': 'EILD', 'description': 'Eicfef Iiafeutm Lydut Dbmgq'}
     {'ticker': 'QTO', 'description': 'Qclxkqjd Tkxan Odmac'}
     {'ticker': 'IHTS', 'description': 'Iowjy Hieuj Tvwecy Smxedh'}
     {'ticker': 'TGFJ', 'description': 'Tvztlhba Garebomj Fnwvwgf Jffldbg'}
     ...+55 entries...
   ]
   -- 988931832 bytes of Arrow data:
   Schema:
   ticker: string
   price: int64
   volume: int64
   
   Parsed 42000000 records in 6836 batch(es)
   -- Text Message:
   Hello Client,
   
   6836 Arrow batch(es) were sent in 6.561 seconds through 6837 HTTP
   response chunks. Average size of each chunk was 144644.13 bytes.
   
   --
   Sincerely,
   The Server
   -- End of Text Message --
   13.645 seconds elapsed
   11.833 seconds (86.72%) parsing multipart/mixed response
   0.011 seconds (0.08%) parsing Arrow stream
   ```

