[jira] [Commented] (MAPREDUCE-5635) FileInputFormat does not specify how the file is split

Jason Lowe (JIRA) Wed, 20 Nov 2013 15:28:15 -0800

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828281#comment-13828281 ]


Jason Lowe commented on MAPREDUCE-5635:
---------------------------------------

FileInputFormat does not require that the file is a plain text file broken into 
lines with carriage-return or linefeed used as line delimiters.  That's what 
TextInputFormat is for.

FileInputFormat is an abstract class that makes no assumptions about how the 
data in the file is formatted.  Concrete implementations that derive from 
FileInputFormat must implement the getRecordReader method (createRecordReader 
in the newer org.apache.hadoop.mapreduce API), which dictates how the records 
are read from the file and therefore what the file format must be for that 
particular input format.
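
To illustrate the division of labor, here is a minimal, self-contained sketch 
that does not use the Hadoop classes at all.  BaseFormat, RecordReader, 
LineFormat and FixedWidthFormat are hypothetical stand-ins invented for this 
example: the abstract base knows nothing about the data layout, and each 
subclass fixes the record format only through the reader it returns.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class InputFormatSketch {

    /** Hypothetical stand-in for a record reader; next() returns null at the end. */
    public interface RecordReader {
        String next();
    }

    /** Hypothetical stand-in for the abstract base: no assumptions about layout. */
    public abstract static class BaseFormat {
        public abstract RecordReader createRecordReader(byte[] data);

        /** Drains the reader supplied by the subclass into a list of records. */
        public List<String> readAll(byte[] data) {
            List<String> records = new ArrayList<>();
            RecordReader reader = createRecordReader(data);
            for (String r = reader.next(); r != null; r = reader.next()) {
                records.add(r);
            }
            return records;
        }
    }

    /** Line-oriented format: a record ends at "\n", "\r", or "\r\n". */
    public static class LineFormat extends BaseFormat {
        @Override
        public RecordReader createRecordReader(byte[] data) {
            String text = new String(data, StandardCharsets.UTF_8);
            return new RecordReader() {
                int pos = 0;

                @Override
                public String next() {
                    if (pos >= text.length()) return null;
                    int end = pos;
                    while (end < text.length()
                            && text.charAt(end) != '\n' && text.charAt(end) != '\r') {
                        end++;
                    }
                    String line = text.substring(pos, end);
                    // Treat "\r\n" as a single line ending.
                    if (end + 1 < text.length()
                            && text.charAt(end) == '\r' && text.charAt(end + 1) == '\n') {
                        end++;
                    }
                    pos = end + 1;
                    return line;
                }
            };
        }
    }

    /** Fixed-width format: every record is `width` bytes; the last may be shorter. */
    public static class FixedWidthFormat extends BaseFormat {
        private final int width;

        public FixedWidthFormat(int width) {
            this.width = width;
        }

        @Override
        public RecordReader createRecordReader(byte[] data) {
            return new RecordReader() {
                int pos = 0;

                @Override
                public String next() {
                    if (pos >= data.length) return null;
                    int len = Math.min(width, data.length - pos);
                    String rec = new String(data, pos, len, StandardCharsets.UTF_8);
                    pos += len;
                    return rec;
                }
            };
        }
    }

    public static void main(String[] args) {
        byte[] data = "ab\ncd\r\nef".getBytes(StandardCharsets.UTF_8);
        // Same bytes, different records, depending only on the reader.
        System.out.println(new LineFormat().readAll(data));
        System.out.println(new FixedWidthFormat(3).readAll(data));
    }
}
```

The same bytes yield different records under each subclass, which is why a 
statement about line delimiters belongs on the line-oriented format rather 
than on the abstract base.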

> FileInputFormat does not specify how the file is split
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-5635
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5635
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>         Environment: Does not matter.
>            Reporter: Pranay Varma
>
> Here is what the TextInputFormat javadoc says:
> [TextInputFormat|http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html]
> An InputFormat for plain text files. Files are broken into lines. Either 
> linefeed or carriage-return are used to signal end of line. Keys are the 
> position in the file, and values are the line of text..
> FileInputFormat should say the same on
> [FileInputFormat|http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html]



--
This message was sent by Atlassian JIRA
(v6.1#6144)
