I am not sure I fully understood your point, but I will try to give you the
answer that looks simplest to me.
Your code doesn’t perform any operation on table1 and table2 separately; it
just looks like you want to merge all those RecordBatches.
Now I think that:
1. You can use the to_batches() method listed in the API for Table, but
I have never tried it myself. This way you create 2 tables, create batches
from these tables, and put the batches together.
2. I would rather store ALL the BATCHES from the two streams in the SAME
Python LIST, and then create a single table using from_batches() as you
suggested. That’s because in your code you create two tables even though you
don’t seem to care about them.
I haven’t tried it, but I think you can go both ways and then tell us whether
the result is the same and whether one of the two is faster than the other.
From: Rares Vernica<mailto:rvern...@gmail.com>
Sent: Wednesday, 14 February 2018 05:13
Subject: Merge multiple record batches
If I have multiple RecordBatchStreamReader inputs, what is the recommended
way to get all the RecordBatch from all the inputs together, maybe in a
Table? They all have the same schema. The source for the readers are
So, I do something like:
reader1 = pa.open_stream('foo')
table1 = reader1.read_all()
reader2 = pa.open_stream('bar')
table2 = reader2.read_all()
# table_all = ???
# OR maybe I don't need to create table1 and table2
# table_all = pa.Table.from_batches( ??? )