[ https://issues.apache.org/jira/browse/BEAM-12665?focusedWorklogId=637052&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637052 ]

ASF GitHub Bot logged work on BEAM-12665:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Aug/21 18:33
            Start Date: 11/Aug/21 18:33
    Worklog Time Spent: 10m 
      Work Description: chamikaramj commented on a change in pull request #15126:
URL: https://github.com/apache/beam/pull/15126#discussion_r687078276



##########
File path: sdks/python/apache_beam/io/filebasedsource.py
##########
@@ -377,8 +382,12 @@ def process(self, element, *args, **kwargs):
     if not source_list:
       return
     source = source_list[0].source
-    for record in source.read(range.new_tracker()):
-      yield record
+
+    if self._with_filename:
+      return [(metadata.path, record)
+              for record in source.read(range.new_tracker())]
+    else:
+      return [record for record in source.read(range.new_tracker())]

Review comment:
       Yeah, I think this could still result in OOMs (we are putting all elements into a list before returning).
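A lazy alternative is to keep `process` as a generator, so each record is handed downstream as soon as it is read and the whole file is never buffered. The sketch below is plain Python, not Beam code: `read_records` is a hypothetical stand-in for `source.read(range.new_tracker())`, and `with_filename` mirrors the `self._with_filename` flag in the diff.

```python
import itertools
from typing import Iterator, Tuple, Union


def read_records(path: str, num_records: int) -> Iterator[str]:
    """Hypothetical stand-in for source.read(range.new_tracker())."""
    for i in range(num_records):
        yield f"{path}:line{i}"


def process(path: str, num_records: int,
            with_filename: bool) -> Iterator[Union[str, Tuple[str, str]]]:
    """Yield records one at a time instead of collecting them in a list.

    Because this is a generator, only the current record is held in memory,
    no matter how many records the file contains.
    """
    records = read_records(path, num_records)
    if with_filename:
        for record in records:
            # Pair each record with the file it came from.
            yield (path, record)
    else:
        yield from records


if __name__ == "__main__":
    # Even with an enormous record count, taking the first two items is
    # cheap, because nothing is materialized up front.
    print(list(itertools.islice(process("big.txt", 10**12, True), 2)))
```

Note that a method containing `yield` cannot usefully `return` a list (the returned value is discarded), which is why the sketch uses `yield` in both branches rather than mixing the two styles.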




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



Issue Time Tracking
-------------------

    Worklog Id:     (was: 637052)
    Time Spent: 5h  (was: 4h 50m)

> Add option to return filename from ReadAll transforms
> -----------------------------------------------------
>
>                 Key: BEAM-12665
>                 URL: https://issues.apache.org/jira/browse/BEAM-12665
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-py-common
>            Reporter: Inigo San Jose Visiers
>            Priority: P2
>          Time Spent: 5h
>  Remaining Estimate: 0h
>
> When using ReadAll transforms (such as `ReadAllFromText` and similar), it would be
> great to add the option to also return the filename.
> This would help with a use case of reading multiple files that are not known
> at launch time and performing aggregations per file.
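
Once a read emits `(filename, record)` tuples, as in the `self._with_filename` branch of the diff, a per-file aggregation is an ordinary per-key combine (in a Beam pipeline, for example, `beam.combiners.Count.PerKey()`). The plain-Python sketch below only illustrates that aggregation step; `lines_per_file` is a hypothetical helper, and its input is assumed to be such `(filename, record)` pairs.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def lines_per_file(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count records per file from (filename, record) pairs.

    Plain-Python analogue of applying a per-key count to the output of a
    read that emits (filename, record) tuples.
    """
    counts = Counter(filename for filename, _ in pairs)
    return dict(counts)


if __name__ == "__main__":
    pairs = [("a.txt", "x"), ("b.txt", "y"), ("a.txt", "z")]
    print(lines_per_file(pairs))  # {'a.txt': 2, 'b.txt': 1}
```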



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
