felipecrv commented on code in PR #31:
URL: https://github.com/apache/arrow-experiments/pull/31#discussion_r1727560521
##########
http/get_simple/python/server/server.py:
##########
@@ -56,24 +56,44 @@ def GetPutData():
return batches
-def make_reader(schema, batches):
- return pa.RecordBatchReader.from_batches(schema, batches)
-
-def generate_batches(schema, reader):
+def generate_buffers(schema, source):
with io.BytesIO() as sink, pa.ipc.new_stream(sink, schema) as writer:
- for batch in reader:
+ for batch in source:
sink.seek(0)
- sink.truncate(0)
writer.write_batch(batch)
+ sink.truncate()
yield sink.getvalue()
Review Comment:
To avoid the `del buffer` which I think is kinda ugly and confusing, I ended
up with this:
```python
def write_chunk(buffer):
    if chunked:
        # chunked transfer coding: hex size line, then the data
        self.wfile.write('{:X}\r\n'.format(len(buffer)).encode('utf-8'))
    self.wfile.write(buffer)
    if chunked:
        self.wfile.write('\r\n'.encode('utf-8'))
    self.wfile.flush()

foreach_batch_buffer(schema, source, write_chunk)
```
This means I can pass `sink.getbuffer()` to the `write_chunk` callback
without having to `del` it when it goes out of scope. This also lets the
version that chunks the buffers itself avoid the need for more `del`
statements.
Do you think this is acceptable, or is the `yield` solution with `del`
statements preferred?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]