Ben Roling created CRUNCH-663:
---------------------------------

             Summary: Expose Record-level File Path to Processing Functions
                 Key: CRUNCH-663
                 URL: https://issues.apache.org/jira/browse/CRUNCH-663
             Project: Crunch
          Issue Type: Improvement
          Components: Core
            Reporter: Ben Roling
            Assignee: Josh Wills


We have some processing pipelines in which we want to know the path of the 
file that each record came from.  It would be nice if this could be exposed to 
the DoFns in our pipelines.

This same desire was expressed a little over a year ago on the mailing list:
[http://mail-archives.apache.org/mod_mbox/crunch-user/201611.mbox/%3CCAG-tO+Y42KRFiocg1RJT4qFcyvkPjFSfZa4z=wk34arip4w...@mail.gmail.com%3E]

Unfortunately, that thread dead-ended.

I will use the comments section and a patch to propose a simple, albeit 
slightly hacky, solution.  An alternative would be to create a new Source 
that provides a PCollection<Pair<Path, Record>>, but I'm not sure how much 
effort that would take.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
