[ https://issues.apache.org/jira/browse/TEZ-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971862#comment-13971862 ]
Siddharth Seth commented on TEZ-873:
------------------------------------

[~kamrul], let's get this into MRInputLegacy for now. Please annotate as Unstable.

For simple cases where a user needs to access split information up front - and does not need to do anything when the underlying split changes - this is OK. However, if a user expects to change processing based on the reader moving to a new file, this is very inefficient. That is the Hive use case.

On the patch itself, it needs to be rebased.
- Please avoid changes which are not required for the patch: wrappedSplits/GroupedSplits in the grouping classes, and usingNewApi in MRInput. The method name can be isUsingNewApi, but the internal parameter does not need to change. This will get rid of other unnecessary changes in Grouping as well.
- s/getGroupedSplits/getConstituentSplits in the Grouping classes. Could you also add a note on these methods - something to the effect of, "Is not meant to be used inside of a reader tight loop and if the functionality is required, it should be a new jira"

> Allow MRInputLegacy to expose the individual input split
> --------------------------------------------------------
>
>                 Key: TEZ-873
>                 URL: https://issues.apache.org/jira/browse/TEZ-873
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Mohammad Kamrul Islam
>            Assignee: Mohammad Kamrul Islam
>         Attachments: TEZ-873.1.patch, TEZ-873.2.patch
>
>
> Currently there is no way of getting InputSplit from TezProcessor. In the current
> MR framework, there is a way to find out the filename through FileSplit.
> For example, one common use is to get the filename in map:
> String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
> There is other metadata in InputSplit that could be used by existing MR
> users.
> This JIRA is to add APIs to expose the InputSplit by adding
> TezGroupedSplit.getWrapperSplit() and MRInput.getInputSplit().
> Although MRInputLegacy provides an API to get the InputSplit, it has a few
> issues:
> * Without TezGroupedSplit.getWrapperSplit() it is unusable.
> * Since it is used in various use cases, I propose to move it from
> MRInputLegacy to MRInput.
> * Currently the APIs are named getNewInputSplit() and getOldInputSplit().
> These should be merged into one: getInputSplit(). The new/old API should be
> handled internally.
> Please give your feedback.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
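
For illustration only, here is a rough sketch of the "up front" access pattern described in the comment: the constituent splits of a grouped split are read once, before any records are processed, rather than inside the reader's tight loop. SimpleFileSplit, SimpleGroupedSplit and fileNamesUpFront are hypothetical stand-ins invented for this sketch. They are not the Tez or Hadoop classes, and only the method name getConstituentSplits is taken from the review comment; its real signature may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConstituentSplitSketch {

    /** Hypothetical stand-in for a file-backed split (not Hadoop's FileSplit). */
    static final class SimpleFileSplit {
        private final String path;

        SimpleFileSplit(String path) {
            this.path = path;
        }

        /** Returns the last path component, e.g. "part-00000". */
        String getFileName() {
            int slash = path.lastIndexOf('/');
            return slash < 0 ? path : path.substring(slash + 1);
        }
    }

    /** Hypothetical stand-in for a grouped split that wraps several splits. */
    static final class SimpleGroupedSplit {
        private final List<SimpleFileSplit> constituents;

        SimpleGroupedSplit(List<SimpleFileSplit> constituents) {
            this.constituents = new ArrayList<>(constituents);
        }

        /** Assumed to be called once up front, not inside a reader tight loop. */
        List<SimpleFileSplit> getConstituentSplits() {
            return new ArrayList<>(constituents);
        }
    }

    /** Collects the file names of all constituent splits before reading records. */
    static List<String> fileNamesUpFront(SimpleGroupedSplit grouped) {
        List<String> names = new ArrayList<>();
        for (SimpleFileSplit split : grouped.getConstituentSplits()) {
            names.add(split.getFileName());
        }
        return names;
    }

    public static void main(String[] args) {
        SimpleGroupedSplit grouped = new SimpleGroupedSplit(Arrays.asList(
                new SimpleFileSplit("/data/in/part-00000"),
                new SimpleFileSplit("/data/in/part-00001")));
        // Split metadata is gathered once here; record processing would follow.
        System.out.println(fileNamesUpFront(grouped));
    }
}
```

This covers only the simple case. A processor that must react whenever the reader moves to a new file (the Hive use case mentioned above) would need a different mechanism, which the comment suggests tracking in a separate jira.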