[ https://issues.apache.org/jira/browse/CRUNCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347598#comment-16347598 ]

Ben Roling commented on CRUNCH-663:
-----------------------------------

Added a new patch where I modified CombineFileIT to test this new property.

> Expose Record-level File Path to Processing Functions
> -----------------------------------------------------
>
>                 Key: CRUNCH-663
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-663
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ben Roling
>            Assignee: Josh Wills
>            Priority: Major
>         Attachments: CRUNCH-663-v2.patch, CRUNCH-663.patch
>
>
> We have some processing pipelines where we want to know the file path that
> each record being processed came from. It would be nice if this could be
> exposed to the DoFns in our pipelines.
>
> This same desire was expressed a little over 1 year ago on the mailing list:
> [http://mail-archives.apache.org/mod_mbox/crunch-user/201611.mbox/%3CCAG-tO+Y42KRFiocg1RJT4qFcyvkPjFSfZa4z=wk34arip4w...@mail.gmail.com%3E]
>
> Unfortunately, that thread dead-ended.
>
> I will use the comments section and a patch to propose a simple, albeit
> slightly hacky solution. Another alternative would be to create a new Source
> that provides a PCollection<Pair<Path, Record>>, but I'm not sure of the
> effort it would take to create that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)