[ https://issues.apache.org/jira/browse/CRUNCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Roling updated CRUNCH-663:
------------------------------
    Attachment: CRUNCH-663-v2.patch

> Expose Record-level File Path to Processing Functions
> -----------------------------------------------------
>
>                 Key: CRUNCH-663
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-663
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ben Roling
>            Assignee: Josh Wills
>            Priority: Major
>         Attachments: CRUNCH-663-v2.patch, CRUNCH-663.patch
>
>
> We have some processing pipelines where we want to know the file path that 
> each record being processed came from.  It would be nice if this could be 
> exposed to the DoFns in our pipelines.
>  
> This same desire was expressed a little over 1 year ago on the mailing list:
> [http://mail-archives.apache.org/mod_mbox/crunch-user/201611.mbox/%3CCAG-tO+Y42KRFiocg1RJT4qFcyvkPjFSfZa4z=wk34arip4w...@mail.gmail.com%3E]
>  
> Unfortunately, that thread dead-ended.
>  
> I will use the comments section and a patch to propose a simple, albeit 
> slightly hacky, solution.  Another alternative would be to create a new Source 
> that provides a PCollection<Pair<Path, Record>>, but I'm not sure how much 
> effort that would take.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
