[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305914#comment-14305914 ]
ASF GitHub Bot commented on FLINK-1396:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/363#discussion_r24117856

    --- Diff: docs/hadoop_compatibility.md ---
    @@ -52,56 +63,70 @@ Add the following dependency to your `pom.xml` to use the Hadoop Compatibility L
     
     ### Using Hadoop Data Types
     
    -Flink supports all Hadoop `Writable` and `WritableComparable` data types out-of-the-box. You do not need to include the Hadoop Compatibility dependency, if you only want to use your Hadoop data types. See the [Programming Guide](programming_guide.html#data-types) for more details.
    +Flink supports all Hadoop `Writable` and `WritableComparable` data types
    +out-of-the-box. You do not need to include the Hadoop Compatibility dependency,
    +if you only want to use your Hadoop data types. See the
    +[Programming Guide](programming_guide.html#data-types) for more details.
     
     ### Using Hadoop InputFormats
     
    -Flink provides a compatibility wrapper for Hadoop `InputFormats`. Any class that implements `org.apache.hadoop.mapred.InputFormat` or extends `org.apache.hadoop.mapreduce.InputFormat` is supported. Thus, Flink can handle Hadoop built-in formats such as `TextInputFormat` as well as external formats such as Hive's `HCatInputFormat`. Data read from Hadoop InputFormats is converted into a `DataSet<Tuple2<KEY,VALUE>>` where `KEY` is the key and `VALUE` is the value of the original Hadoop key-value pair.
    -
    -Flink's InputFormat wrappers are
    -
    -- `org.apache.flink.hadoopcompatibility.mapred.HadoopInputFormat` and
    -- `org.apache.flink.hadoopcompatibility.mapreduce.HadoopInputFormat`
    +Hadoop input formats can be used to create a data source by using
    +on of the methods `readHadoopFile` or `createHadoopInput` of the
    +`ExecutionEnvironment`. The former is used for input formats derived
    +from `FileInputFormat` while the latter has to be used for general purpose
    +input formats.
     
    -and can be used as regular Flink [InputFormats](programming_guide.html#data-sources).
    +The resulting `DataSet` contains 2-tuples where the first field
    +is the key and the second field is the value retrieved from the Hadoop
    +InputFormat. The following example shows how to use Hadoop's `TextInputFormat`.
     
    +<div class="codetabs" markdown="1">
    +<div data-lang="java" markdown="1">
    +
     ~~~java
     ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    -
    -// Set up the Hadoop TextInputFormat.
    -Job job = Job.getInstance();
    -HadoopInputFormat<LongWritable, Text> hadoopIF =
    -  // create the Flink wrapper.
    -  new HadoopInputFormat<LongWritable, Text>(
    -    // create the Hadoop InputFormat, specify key and value type, and job.
    -    new TextInputFormat(), LongWritable.class, Text.class, job
    -  );
    -TextInputFormat.addInputPath(job, new Path(inputPath));
    -
    -// Read data using the Hadoop TextInputFormat.
    -DataSet<Tuple2<LongWritable, Text>> text = env.createInput(hadoopIF);
    +
    +DataSet<Tuple2<LongWritable, Text>> input =
    +  env.readHadoopFile(new TextInputFormat(), LongWritable.class, Text.class, textPath);
     
     // Do something with the data.
     [...]
     ~~~
     
    -### Using Hadoop OutputFormats
    +</div>
    +<div data-lang="scala" markdown="1">
     
    -Flink provides a compatibility wrapper for Hadoop `OutputFormats`. Any class that implements `org.apache.hadoop.mapred.OutputFormat` or extends `org.apache.hadoop.mapreduce.OutputFormat` is supported. The OutputFormat wrapper expects its input data to be a `DataSet<Tuple2<KEY,VALUE>>` where `KEY` is the key and `VALUE` is the value of the Hadoop key-value pair that is processed by the Hadoop OutputFormat.
    +~~~scala
    +val env = ExecutionEnvironment.getExecutionEnvironment
    +
    +val input: DataSet[(LongWritable, Text)] =
    +  env.readHadoopFile(new TextInputFormat, classOf[LongWritable], classOf[Text], textPath)
     
    -Flink's OUtputFormat wrappers are
    +// Do something with the data.
    +[...]
    +~~~
    +
    +</div>
     
    -- `org.apache.flink.hadoopcompatibility.mapred.HadoopOutputFormat` and
    -- `org.apache.flink.hadoopcompatibility.mapreduce.HadoopOutputFormat`
    +</div>
    +
    +### Using Hadoop OutputFormats
     
    -and can be used as regular Flink [OutputFormats](programming_guide.html#data-sinks).
    +Flink provides a compatibility wrapper for Hadoop `OutputFormats`. Any class
    +that implements `org.apache.hadoop.mapred.OutputFormat` or extend
    --- End diff --
    
    extend -> extends

> Add hadoop input formats directly to the user API.
> --------------------------------------------------
>
>                 Key: FLINK-1396
>                 URL: https://issues.apache.org/jira/browse/FLINK-1396
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Robert Metzger
>            Assignee: Aljoscha Krettek
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
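
Editor's note: the new text in the diff above names both `readHadoopFile` and `createHadoopInput`, but the hunk only demonstrates the former. A hedged sketch of the general-purpose `createHadoopInput` path, in the same Java style as the documented example, might look as follows. This is not part of the reviewed diff; the `Job`-based setup and the use of `TextInputFormat` as the stand-in format are illustrative assumptions.

```java
// Sketch only, assuming the ExecutionEnvironment API this PR documents.
// createHadoopInput takes an already-configured Hadoop InputFormat plus its
// key/value classes, instead of a file path.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

Job job = Job.getInstance();
// Configure the InputFormat through the Job as usual. Any general-purpose
// org.apache.hadoop.mapreduce.InputFormat (e.g. Hive's HCatInputFormat,
// mentioned in the removed text) could stand in for TextInputFormat here.
TextInputFormat.addInputPath(job, new Path(inputPath));

DataSet<Tuple2<LongWritable, Text>> input =
    env.createHadoopInput(new TextInputFormat(), LongWritable.class, Text.class, job);

// Do something with the data.
```

A Scala variant would presumably mirror the `readHadoopFile` Scala example from the diff, swapping in `createHadoopInput` with the same `classOf[...]` arguments plus the `Job`.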