tvalentyn commented on PR #37470: URL: https://github.com/apache/beam/pull/37470#issuecomment-3844777699
Thanks @shaheeramjad for the initiative. 1. > Takes the first N elements from this PCollection This is misleading since PCollections are inherently unordered and distributed. There is no concept of "first X" elements in a pcollection. 2. The way the docstring is worded may further create a misleading understanding that Take avoids iterating over all elements in the collection. I can't think of an implementation of this helper that avoids iteration over entire collection unless we limit ingestion of elements at the source. 3. first_10 = pcoll.take(10) this syntax is not typical for Beam python; i wouldn't introduce it without a discussion on dev@ 4. I am not (yet) convinced of the value of this helper if the only reason is to flatten the output from Top/Sample. 5. Naming of the helper could also use more thought since we cannot easily rename Beam APIs, and would be appropriate to at least discuss on dev@. I'd like to request we revert this PR until further discussion happens. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
