tvalentyn commented on PR #37470:
URL: https://github.com/apache/beam/pull/37470#issuecomment-3844777699

   Thanks @shaheeramjad for the initiative. 
   
   1.
   > Takes the first N elements from this PCollection
   
   This is misleading since PCollections are inherently unordered and 
distributed. There is no concept of "first X" elements in a pcollection. 
   
   2. The way the docstring is worded may further create a misleading 
understanding that Take avoids  iterating over all elements in the collection. 
I can't think of an implementation of this helper that avoids iteration over 
entire collection unless we limit ingestion of elements at the source.   
   
   3.       first_10 = pcoll.take(10)
   
   this syntax is not typical for Beam python; i wouldn't introduce it without 
a discussion on dev@
   
   4. I am not (yet) convinced of the value of this helper if the only reason 
is to flatten the output from Top/Sample.
   
   5. Naming of the helper could also use more thought since we cannot easily 
rename Beam APIs, and would be appropriate to at least discuss on dev@.
   
   I'd like to request we revert this PR  until further discussion happens. 
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to