[
https://issues.apache.org/jira/browse/BEAM-12665?focusedWorklogId=636679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636679
]
ASF GitHub Bot logged work on BEAM-12665:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 11/Aug/21 00:14
Start Date: 11/Aug/21 00:14
Worklog Time Spent: 10m
Work Description: InigoSJ commented on pull request #15126:
URL: https://github.com/apache/beam/pull/15126#issuecomment-896396763
> this LGTM.
> Why did we change `with_context` to `with_filename`? Maybe context could
> return an object with further context? (e.g. partition, line name, etc)
`_ReadRange` currently only contains the `FileMetadata` of the object and
the `OffsetRange`. `FileMetadata` has the filename and the size, but I don't
fully see the use of adding the size as an output, since you can probably get
the same using `ReadableFiles` without needing to output it row by row and
then aggregate. LMKWYT.
The other things you mention would be helpful, but (afaik) not possible with
the current implementation of `_ReadRange`. Maybe a future PR?
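As a rough illustration of the point above, here is a plain-Python sketch and not Beam's actual code. `FileInfo`, `ByteRange` and `read_range` are hypothetical stand-ins for `FileMetadata`, `OffsetRange` and `_ReadRange`, and the sketch assumes ranges are aligned on line boundaries. With only a path, a size and a byte range available, the filename is the one piece of context that is cheap to attach to every record:

```python
from dataclasses import dataclass
from typing import Iterator, Tuple, Union


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical stand-in for the file metadata: a path and a size."""
    path: str
    size_in_bytes: int


@dataclass(frozen=True)
class ByteRange:
    """Hypothetical stand-in for an offset range, [start, stop)."""
    start: int
    stop: int


def read_range(
    info: FileInfo,
    rng: ByteRange,
    data: bytes,
    with_filename: bool = False,
) -> Iterator[Union[str, Tuple[str, str]]]:
    """Yield the lines inside rng, optionally paired with the file path.

    Assumes rng starts and stops on line boundaries (a simplification).
    """
    chunk = data[rng.start:rng.stop].decode("utf-8")
    for line in chunk.splitlines():
        # The path is the only per-record context available from the inputs.
        yield (info.path, line) if with_filename else line
```

The size is already known once per file from the metadata, so repeating it on every row would add nothing that a single per-file lookup does not give.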
Thanks for having a look, Pablo!
--
This is an automated message from the Apache Git Service.
Issue Time Tracking
-------------------
Worklog Id: (was: 636679)
Time Spent: 4h 10m (was: 4h)
> Add option to return filename from ReadAll transforms
> -----------------------------------------------------
>
> Key: BEAM-12665
> URL: https://issues.apache.org/jira/browse/BEAM-12665
> Project: Beam
> Issue Type: New Feature
> Components: io-py-common
> Reporter: Inigo San Jose Visiers
> Priority: P2
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> When using ReadAll transforms (`ReadAllFromText` and similar), it would be
> great to add the option to also return the filename.
> This would help with a use case of reading multiple files that are not known
> at launch time and performing aggregations by file.
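
A minimal plain-Python sketch of the per-file aggregation this use case has in mind, assuming the read step emits `(filename, line)` pairs. The helper name `count_lines_per_file` is made up for illustration and is not part of Beam:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_lines_per_file(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count how many records came from each file.

    Each record is assumed to be a (filename, line) pair.
    """
    # Only the filename matters for the count; the line content is ignored.
    return dict(Counter(filename for filename, _ in records))


records = [("a.txt", "x"), ("b.txt", "y"), ("a.txt", "z")]
print(count_lines_per_file(records))
```

In a pipeline the same shape would presumably be a per-key combine over the filename-keyed output, which is why having the filename on each record matters when the files are not known at launch time.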
--
This message was sent by Atlassian Jira
(v8.3.4#803005)