anantdamle commented on pull request #14852:
URL: https://github.com/apache/beam/pull/14852#issuecomment-846574298
@reuvenlax thanks for the heads-up on the time complexity of reading BagState
(I honestly didn't know).
Could you suggest the best way to handle the following scenario?
Context: I need to represent nested, repeated data as flat tables.
For example, let's say I have two input records that need to be accumulated:
<table>
<thead>
<tr>
<th>record-1</th>
<th>record-2</th>
</tr>
</thead>
<tbody>
<tr>
<td><pre>{ id: 1, num: 1.23, arr: ["a", "b"] }</pre></td>
<td><pre>{ id: 2, num: 4.56, arr: ["d", "e", "f"] }</pre></td>
</tr>
</tbody>
</table>
The size of the accumulated batch is not the sum of the serialized sizes of
the individual elements. Instead, the accumulator needs access to at least the
list of headers to compute the effective size of the accumulated batch: record
1, for example, gains an empty arr[2] cell only because record 2 has three
array elements. The flattened output is:
<table>
<thead>
<tr>
<th>id</th>
<th>num</th>
<th>arr[0]</th>
<th>arr[1]</th>
<th>arr[2]</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1.23</td>
<td>"a"</td>
<td>"b"</td>
<td>[empty]</td>
</tr>
<tr>
<td>2</td>
<td>4.56</td>
<td>"d"</td>
<td>"e"</td>
<td>"f"</td>
</tr>
</tbody>
</table>
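To illustrate why the headers matter, here is a minimal Python sketch. It is
not Beam code: `flatten` and `FlatBatchAccumulator` are hypothetical names, and
measuring the batch as rows × headers (a cell count) is an assumption standing
in for whatever size metric is actually needed.

```python
def flatten(record):
    """Flatten one record: list fields become key[0], key[1], ... columns."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, list):
            for i, item in enumerate(value):
                flat[f"{key}[{i}]"] = item
        else:
            flat[key] = value
    return flat


class FlatBatchAccumulator:
    """Hypothetical accumulator: tracks the union of headers seen so far."""

    def __init__(self):
        self.headers = []  # ordered union of flattened column names
        self.rows = []     # flattened records added so far

    def add(self, record):
        flat = flatten(record)
        for header in flat:
            if header not in self.headers:
                self.headers.append(header)
        self.rows.append(flat)

    def cell_count(self):
        # Assumed size metric: every row occupies one cell per header,
        # including cells that are empty for that row.
        return len(self.headers) * len(self.rows)


record_1 = {"id": 1, "num": 1.23, "arr": ["a", "b"]}
record_2 = {"id": 2, "num": 4.56, "arr": ["d", "e", "f"]}

acc = FlatBatchAccumulator()
acc.add(record_1)
acc.add(record_2)

# Sizing each record on its own misses the empty arr[2] cell of record 1.
independent_cells = len(flatten(record_1)) + len(flatten(record_2))
```

With the two records above, sizing each record independently counts 9 cells,
while the accumulated table has 10, so the size of a batch cannot be derived
from per-element sizes alone.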