[ https://issues.apache.org/jira/browse/BEAM-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129425#comment-16129425 ]
Adam Levy commented on BEAM-2715:
---------------------------------
In the previous version of Beam we used maxNumRecords with the DirectRunner to
do end-to-end testing locally on a small sample of production data. We cannot
simply run a streaming pipeline locally because we are dealing with extremely
high-volume data, so an unbounded source quickly causes an OutOfMemoryError.
TestStream does not use Pubsub at all, so it is not useful for end-to-end
testing.
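
To make the request concrete, here is a minimal plain-Java sketch of what
"bounding" an unbounded read at a fixed record count means. It is only an
illustration of the idea: it does not use Beam or Pubsub, and every name in it
(MaxRecordsSketch, unboundedCounter, readBounded) is hypothetical. In Beam
itself the cap would come from withMaxNumRecords on
BoundedReadFromUnboundedSource, as described in the issue.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Plain-Java illustration only; none of these names are Beam classes.
final class MaxRecordsSketch {

    // Hypothetical stand-in for an unbounded source: hasNext() never returns false.
    static Iterator<Long> unboundedCounter() {
        AtomicLong next = new AtomicLong();
        return new Iterator<Long>() {
            @Override public boolean hasNext() { return true; }
            @Override public Long next() { return next.getAndIncrement(); }
        };
    }

    // Reads at most maxNumRecords elements and then stops, so memory use is
    // limited by the cap rather than by how much data the source can produce.
    static <T> List<T> readBounded(Iterator<T> source, long maxNumRecords) {
        if (maxNumRecords < 0) {
            throw new IllegalArgumentException("maxNumRecords must be >= 0");
        }
        List<T> out = new ArrayList<>();
        while (out.size() < maxNumRecords && source.hasNext()) {
            out.add(source.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Take a small, fixed-size sample from a source that never ends.
        List<Long> sample = readBounded(unboundedCounter(), 5);
        System.out.println(sample); // prints [0, 1, 2, 3, 4]
    }
}
```

The point of the cap is that the local run terminates after a known number of
records, which is what made the old maxNumRecords setting usable for sampling
high-volume production data.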
> Expose PubsubSource to create UnboundedSource and utilize withMaxNumRecords
> from BoundedReadFromUnboundedSource
> ---------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-2715
> URL: https://issues.apache.org/jira/browse/BEAM-2715
> Project: Beam
> Issue Type: New Feature
> Components: runner-direct
> Reporter: Adam Levy
> Assignee: Thomas Groh
> Labels: pubsub
>
> We are reading from Pubsub and are trying to reproduce the maxNumRecords
> behavior that was available in 0.6.0. To do this we would need to use
> withMaxNumRecords from the BoundedReadFromUnboundedSource class, which in
> turn requires the PubsubSource class to create the UnboundedSource from
> Pubsub. Would it be possible to expose PubsubSource? What is currently the
> recommended way to create a bounded read from Pubsub with a set number of
> records?