[ https://issues.apache.org/jira/browse/CRUNCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345736#comment-16345736 ]
Ben Roling edited comment on CRUNCH-663 at 1/30/18 8:20 PM:
------------------------------------------------------------

The attached patch is a quick proof-of-concept. I wouldn't expect it to be merged directly. The patch has a modified WordCount example that demonstrates leveraging this property. I should have just added a unit test to show it, but haven't done that yet. If I get feedback that the general approach is acceptable, I would certainly be happy to add one or more tests.

was (Author: ben.roling):
    The attached patch is a quick proof-of-concept. I wouldn't expect it to be merged directly. The patch has a modified WordCount example that demonstrates leveraging this property. I should have just added a unit test, to show it, but haven't done that yet. If I get feedback that the general approach is acceptable, I would certainly be happy to add one or more tests.

> Expose Record-level File Path to Processing Functions
> -----------------------------------------------------
>
>                 Key: CRUNCH-663
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-663
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ben Roling
>            Assignee: Josh Wills
>            Priority: Major
>         Attachments: CRUNCH-663.patch
>
>
> We have some processing pipelines where we want to know the file path that
> each record being processed came from. It would be nice if this could be
> exposed to the DoFns in our pipelines.
>
> This same desire was expressed a little over 1 year ago on the mailing list:
> [http://mail-archives.apache.org/mod_mbox/crunch-user/201611.mbox/%3CCAG-tO+Y42KRFiocg1RJT4qFcyvkPjFSfZa4z=wk34arip4w...@mail.gmail.com%3E]
>
> Unfortunately, that thread dead-ended.
>
> I will use the comments section and a patch to propose a simple, albeit
> slightly hacky, solution. Another alternative would be to create a new Source
> that provides a PCollection<Pair<Path, Record>>, but I'm not sure of the
> effort it would take to create that.