[
https://issues.apache.org/jira/browse/BEAM-12378?focusedWorklogId=600937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-600937
]
ASF GitHub Bot logged work on BEAM-12378:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 23/May/21 14:44
Start Date: 23/May/21 14:44
Worklog Time Spent: 10m
Work Description: anantdamle commented on pull request #14852:
URL: https://github.com/apache/beam/pull/14852#issuecomment-846574298
@reuvenlax thanks for the heads up on time-complexity of reading BagState (I
honestly didn't know).
Can you help suggest the best way to handle the following scenario:
Context: I need to represent nested-repeated data as flat tables.
For example, let's say I have two input records that need to be accumulated:
<table>
<thead>
<tr>
<th>record-1</th>
<th>record-2</th>
</tr>
</thead>
<tbody>
<tr>
<td><pre>{ id: 1, num: 1.23, arr: ["a", "b"] }</pre></td>
<td><pre>{ id: 2, num: 4.56, arr: ["d", "e", "f"] }</pre></td>
</tr>
</tbody>
</table>
In the output, the size of the accumulated batch is not the sum of the
serialized sizes of the individual elements. Instead, the accumulator needs
access to at least the list of headers to compute the effective size of the
accumulated batch, since a row with a shorter array still occupies the extra
columns (shown as [empty] in the table).
<table>
<thead>
<tr>
<th>id</th>
<th>num</th>
<th>arr[0]</th>
<th>arr[1]</th>
<th>arr[2]</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1.23</td>
<td>"a"</td>
<td>"b"</td>
<td>[empty]</td>
</tr>
<tr>
<td>2</td>
<td>4.56</td>
<td>"d"</td>
<td>"e"</td>
<td>"f"</td>
</tr>
</tbody>
</table>
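To make the sizing problem concrete, here is a minimal sketch in plain Java of an accumulator that tracks only the union of flattened headers and a row count. It is not Beam API and not the code in the pull request: the class name `FlatBatchAccumulator` and its methods are hypothetical, records are assumed to be simple maps whose values are scalars or lists, and the "effective size" is counted in table cells rather than bytes as a simplification.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical accumulator, for illustration only (not part of Beam). */
class FlatBatchAccumulator {
  // Union of flattened column headers seen so far, in first-seen order.
  private final Set<String> headers = new LinkedHashSet<>();
  private int rowCount = 0;

  /** Flattens one record: a list field "arr" becomes columns arr[0], arr[1], ... */
  public static Map<String, Object> flatten(Map<String, Object> record) {
    Map<String, Object> flat = new LinkedHashMap<>();
    for (Map.Entry<String, Object> field : record.entrySet()) {
      if (field.getValue() instanceof List) {
        List<?> values = (List<?>) field.getValue();
        for (int i = 0; i < values.size(); i++) {
          flat.put(field.getKey() + "[" + i + "]", values.get(i));
        }
      } else {
        flat.put(field.getKey(), field.getValue());
      }
    }
    return flat;
  }

  /** Adds one record: only its headers and a row count are retained. */
  public void add(Map<String, Object> record) {
    headers.addAll(flatten(record).keySet());
    rowCount++;
  }

  public Set<String> headers() {
    return headers;
  }

  /** Assumed size measure: every row spans the union of all columns. */
  public int effectiveCellCount() {
    return headers.size() * rowCount;
  }

  public static void main(String[] args) {
    Map<String, Object> r1 = new LinkedHashMap<>();
    r1.put("id", 1);
    r1.put("num", 1.23);
    r1.put("arr", Arrays.asList("a", "b"));

    Map<String, Object> r2 = new LinkedHashMap<>();
    r2.put("id", 2);
    r2.put("num", 4.56);
    r2.put("arr", Arrays.asList("d", "e", "f"));

    FlatBatchAccumulator acc = new FlatBatchAccumulator();
    acc.add(r1);
    acc.add(r2);

    System.out.println(acc.headers());            // [id, num, arr[0], arr[1], arr[2]]
    System.out.println(acc.effectiveCellCount()); // 10
  }
}
```

Flattened on their own, the two records have 4 and 5 cells (9 in total), but the accumulated table has 5 columns times 2 rows, or 10 cells. That gap is why summing per-element sizes does not give the batch size. A byte-based variant would presumably follow the same shape, tracking header bytes and value bytes instead of cell counts.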
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
Issue Time Tracking
-------------------
Worklog Id: (was: 600937)
Time Spent: 1h (was: 50m)
> GroupIntoBatches should support byte-size batches
> -------------------------------------------------
>
> Key: BEAM-12378
> URL: https://issues.apache.org/jira/browse/BEAM-12378
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Reuven Lax
> Priority: P2
> Time Spent: 1h
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)