[ https://issues.apache.org/jira/browse/TEZ-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924291#comment-13924291 ]
Mohammad Kamrul Islam commented on TEZ-873:
-------------------------------------------

[~sseth], I wasn't aware that Hive gets the filename from the InputSplit. Could you please give me a little more detail or a pointer?

Usage of the new TezGroupedSplit API will be part of user application code. I plan to provide a new example (TF-IDF) in a separate JIRA that will use it. Anyway, one such usage is as follows:
{noformat}
FileSplit fs = (FileSplit)((TezGroupedSplit)input.getNewInputSplit()).getWrappedSplits().get(0);
String fileName = fs.getPath().getName();
{noformat}

> Allow MRInputLegacy to expose the individual input split
> --------------------------------------------------------
>
>                 Key: TEZ-873
>                 URL: https://issues.apache.org/jira/browse/TEZ-873
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Mohammad Kamrul Islam
>         Attachments: TEZ-873.1.patch, TEZ-873.2.patch
>
>
> Currently there is no way of getting the InputSplit from TezProcessor. In the
> current MR framework, there is a way to find out the filename through FileSplit.
> For example, one common use is to get the filename in a mapper:
> String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
> There is other metadata in InputSplit that existing MR users could use.
> This JIRA is to add APIs that expose the InputSplit:
> TezGroupedSplit.getWrapperSplit() and MRInput.getInputSplit().
> Although MRInputLegacy provides an API to get the InputSplit, it has a few
> issues:
> * Without TezGroupedSplit.getWrapperSplit() it is unusable.
> * Since it is used in various use cases, I propose to move it from
> MRInputLegacy to MRInput.
> * Currently the APIs are named getNewInputSplit() and getOldInputSplit().
> These should be merged into one: getInputSplit(). The new/old API should be
> handled internally.
> Please give your feedback.
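
For illustration only: the snippet in the comment reads just the first wrapped split via get(0). Since getWrappedSplits() returns a list, a grouped split may presumably wrap more than one split, so user code might want every filename rather than only the first. The sketch below shows that unwrapping pattern with hypothetical stand-in types (SimpleFileSplit, SimpleGroupedSplit, and the helpers baseName and fileNames are all invented here); they are not the real Hadoop or Tez classes and make no claim about how those behave.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins that only mimic the shape of the calls in the comment.
// They are NOT org.apache.hadoop FileSplit or Tez's TezGroupedSplit.
final class GroupedSplitFileNames {

    /** Stand-in for a file-backed split: it only remembers its path. */
    static final class SimpleFileSplit {
        private final String path;
        SimpleFileSplit(String path) { this.path = path; }
        String getPath() { return path; }
    }

    /** Stand-in for a grouped split that wraps several underlying splits. */
    static final class SimpleGroupedSplit {
        private final List<SimpleFileSplit> wrapped;
        SimpleGroupedSplit(List<SimpleFileSplit> wrapped) {
            this.wrapped = new ArrayList<>(wrapped);
        }
        List<SimpleFileSplit> getWrappedSplits() { return wrapped; }
    }

    /** Returns the last path component, e.g. "/data/a.txt" gives "a.txt". */
    static String baseName(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    /**
     * Collects the distinct file names behind a grouped split, in first-seen order.
     * Note: deduplicating by base name merges same-named files from different directories.
     */
    static List<String> fileNames(SimpleGroupedSplit group) {
        Set<String> names = new LinkedHashSet<>();
        for (SimpleFileSplit split : group.getWrappedSplits()) {
            names.add(baseName(split.getPath()));
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        // Two splits of the same file plus one split of another file.
        SimpleGroupedSplit group = new SimpleGroupedSplit(Arrays.asList(
                new SimpleFileSplit("/data/docs/a.txt"),
                new SimpleFileSplit("/data/docs/a.txt"),
                new SimpleFileSplit("/data/docs/b.txt")));
        System.out.println(fileNames(group)); // prints [a.txt, b.txt]
    }
}
```

With the real classes, the loop body would presumably cast each wrapped split to FileSplit and call getPath().getName(), as the comment's snippet does for the first element.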