[
https://issues.apache.org/jira/browse/MAPREDUCE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828281#comment-13828281
]
Jason Lowe commented on MAPREDUCE-5635:
---------------------------------------
FileInputFormat does not require that the file is a plain text file broken into
lines with carriage-return or linefeed used as line delimiters. That's what
TextInputFormat is for.

FileInputFormat is an abstract class that makes no assumptions about how the
data in the file is formatted. Concrete implementations that derive from
FileInputFormat must implement the getRecordReader method (createRecordReader
in the newer org.apache.hadoop.mapreduce API). That method dictates how records
are read from the file, and therefore what format the data must have for that
particular input format.
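The division of labor described above can be illustrated with a minimal standalone sketch. It is not Hadoop code: ToyFileInputFormat, ToyFixedWidthFormat and ToyWholeFileFormat are hypothetical classes, and parse() stands in for the record reader that a real subclass supplies through getRecordReader/createRecordReader. Real FileInputFormat also computes input splits, which this sketch ignores.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, greatly simplified stand-in for FileInputFormat (NOT Hadoop's API).
// The base class handles the format-agnostic part (loading the file) and leaves
// the record format entirely to subclasses.
abstract class ToyFileInputFormat<V> {
    // Format-agnostic: read the whole file, then delegate record parsing.
    final List<V> read(Path file) throws IOException {
        return parse(Files.readAllBytes(file));
    }

    // Format-specific: plays the role that the record reader plays in Hadoop.
    abstract List<V> parse(byte[] contents);
}

// Concrete format 1: fixed-width records with no delimiters at all.
class ToyFixedWidthFormat extends ToyFileInputFormat<byte[]> {
    private final int width;

    ToyFixedWidthFormat(int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        this.width = width;
    }

    @Override
    List<byte[]> parse(byte[] contents) {
        List<byte[]> records = new ArrayList<>();
        // Trailing bytes that do not fill a whole record are dropped in this sketch.
        for (int start = 0; start + width <= contents.length; start += width) {
            records.add(Arrays.copyOfRange(contents, start, start + width));
        }
        return records;
    }
}

// Concrete format 2: the entire file is a single record.
class ToyWholeFileFormat extends ToyFileInputFormat<String> {
    @Override
    List<String> parse(byte[] contents) {
        return Collections.singletonList(new String(contents, StandardCharsets.UTF_8));
    }
}

class ToyFormatDemo {
    public static void main(String[] args) {
        byte[] data = "abcdefgh".getBytes(StandardCharsets.UTF_8);
        // Same bytes, different record boundaries depending on the subclass.
        System.out.println(new ToyFixedWidthFormat(4).parse(data).size()); // prints 2
        System.out.println(new ToyWholeFileFormat().parse(data).size());   // prints 1
    }
}
```

The same eight bytes yield two records under one format and one record under the other. Neither subclass assumes lines, which is the point: the record format belongs to the concrete input format, not to the abstract base.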
> FileInputFormat does not specify how the file is split
> ------------------------------------------------------
>
> Key: MAPREDUCE-5635
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5635
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.2.0
> Environment: Does not matter.
> Reporter: Pranay Varma
>
> Here is what the TextInputFormat javadoc says:
> [TextInputFormat|http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html]
> An InputFormat for plain text files. Files are broken into lines. Either
> linefeed or carriage-return are used to signal end of line. Keys are the
> position in the file, and values are the line of text.
> FileInputFormat should say the same on
> [FileInputFormat|http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html]
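The key/value contract quoted from the TextInputFormat javadoc (lines ended by linefeed or carriage-return, key = position in the file, value = the line) can be sketched in isolation. This is a hypothetical illustration, not Hadoop's LineRecordReader: LineRecord and ToyLineSplitter are made-up names, input splits are ignored, the text is assumed to be UTF-8, and treating CRLF as a single terminator is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical (byte offset, line) pair mirroring TextInputFormat's key/value contract.
final class LineRecord {
    final long offset;
    final String line;

    LineRecord(long offset, String line) {
        this.offset = offset;
        this.line = line;
    }

    @Override
    public String toString() {
        return offset + "\t" + line;
    }
}

final class ToyLineSplitter {
    private ToyLineSplitter() {}

    // Splits bytes into lines ended by LF, CR, or CRLF (CRLF counts as one
    // terminator). Each key is the byte offset at which the line starts.
    static List<LineRecord> split(byte[] data) {
        List<LineRecord> records = new ArrayList<>();
        int lineStart = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                records.add(new LineRecord(lineStart,
                        new String(data, lineStart, i - lineStart, StandardCharsets.UTF_8)));
                // Swallow the LF of a CRLF pair so it does not produce an empty line.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                i++;
                lineStart = i;
            } else {
                i++;
            }
        }
        // A final line without a trailing terminator is still a record.
        if (lineStart < data.length) {
            records.add(new LineRecord(lineStart,
                    new String(data, lineStart, data.length - lineStart, StandardCharsets.UTF_8)));
        }
        return records;
    }
}
```

For the input "ab\ncd\r\nef\rg" this yields the records (0, "ab"), (3, "cd"), (7, "ef") and (10, "g"): each key is where the line begins in the byte stream, not a line number.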