[ https://issues.apache.org/jira/browse/FLINK-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053836#comment-14053836 ]

ASF GitHub Bot commented on FLINK-933:
--------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/47#discussion_r14607144
  
    --- Diff: stratosphere-java/src/main/java/eu/stratosphere/api/java/io/PrimitiveInputFormat.java ---
    @@ -0,0 +1,73 @@
    +/***********************************************************************************************************************
    + *
    + * Copyright (C) 2010-2013 by the Stratosphere project (http://stratosphere.eu)
    + *
    + * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    + * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    + * specific language governing permissions and limitations under the License.
    + *
    + **********************************************************************************************************************/
    +package eu.stratosphere.api.java.io;
    +
    +import eu.stratosphere.api.common.io.DelimitedInputFormat;
    +import eu.stratosphere.core.fs.FileInputSplit;
    +import eu.stratosphere.core.fs.Path;
    +import eu.stratosphere.types.parser.FieldParser;
    +import eu.stratosphere.util.InstantiationUtil;
    +
    +import java.io.IOException;
    +
    +/**
    + * An input format that reads single field primitive data from a given file. The difference between this and
    + * {@link eu.stratosphere.api.java.io.CsvInputFormat} is that it won't go through {@link eu.stratosphere.api.java.tuple.Tuple1}.
    + */
    +public class PrimitiveInputFormat<OT> extends DelimitedInputFormat<OT> {
    +
    +   private Class<OT> primitiveClass;
    +
    +   private static final byte CARRIAGE_RETURN = (byte) '\r';
    +
    +   private static final byte NEW_LINE = (byte) '\n';
    +
    +   private transient FieldParser<OT> parser;
    +
    +
    +   public PrimitiveInputFormat(Path filePath, Class<OT> primitiveClass) {
    +           super(filePath);
    +           this.primitiveClass = primitiveClass;
    +   }
    +
    +   public PrimitiveInputFormat(Path filePath, char delimiter, Class<OT> primitiveClass) {
    +           super(filePath);
    +           this.primitiveClass = primitiveClass;
    +           this.setDelimiter(delimiter);
    +   }
    +
    +   @Override
    +   public void open(FileInputSplit split) throws IOException {
    +           super.open(split);
    +           Class<? extends FieldParser<OT>> parserType = FieldParser.getParserForType(primitiveClass);
    +           if (parserType == null) {
    +                   throw new IllegalArgumentException("The type '" + primitiveClass.getName() + "' is not supported for the primitive input format.");
    +           }
    +           parser = InstantiationUtil.instantiate(parserType, FieldParser.class);
    +   }
    +
    +   @Override
    +   public OT readRecord(OT reuse, byte[] bytes, int offset, int numBytes) {
    +           //Check if \n is used as delimiter and the end of this line is a \r, then remove \r from the line
    +           if (this.getDelimiter() != null && this.getDelimiter().length == 1
    +                   && this.getDelimiter()[0] == NEW_LINE && offset+numBytes >= 1
    +                   && bytes[offset+numBytes-1] == CARRIAGE_RETURN){
    +                   numBytes -= 1;
    +           }
    +
    +           parser.parseField(bytes, offset, numBytes + offset, (char) this.getDelimiter()[0], reuse);
    --- End diff --
    
    If you check in line 64 whether this.getDelimiter() is not null, then we should check it here as well, before dereferencing this.getDelimiter()[0]. But actually, DelimitedInputFormat already checks whether the delimiter is null when it is set, so it should be safe to leave out the null check in line 64.


> Add an input format to read primitive types directly (not through tuples)
> -------------------------------------------------------------------------
>
>                 Key: FLINK-933
>                 URL: https://issues.apache.org/jira/browse/FLINK-933
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Stephan Ewen
>            Assignee: Mingliang Qi
>            Priority: Minor
>              Labels: easyfix, features, starter
>
> Right now, reading primitive types goes either through custom formats (work
> intensive) or through CSV inputs. The latter return tuples.
> To read a sequence of primitives, you need to go through Tuple1, which is
> clumsy.
> I suggest adding an input format that reads primitive types line-wise (or
> otherwise delimited), and also adding a method to the environment for that.
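The proposal above can be sketched without any framework: split the input on '\n', drop a trailing '\r', and parse each line directly into the requested type, with no one-field tuple in between. This is a hypothetical, self-contained illustration; PrimitiveLineReader, readLines and the PARSERS map are assumptions standing in for the lookup that FieldParser.getParserForType performs in the diff, and empty lines are skipped as a design choice of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: read newline-delimited values straight into a List<T>,
// without wrapping each value in a Tuple1-like container.
public class PrimitiveLineReader {

    // Maps each supported type to a parser for its textual form
    // (a stand-in for a type -> field-parser lookup).
    private static final Map<Class<?>, Function<String, ?>> PARSERS = new HashMap<>();

    static {
        PARSERS.put(Integer.class, Integer::valueOf);
        PARSERS.put(Long.class, Long::valueOf);
        PARSERS.put(Double.class, Double::valueOf);
        PARSERS.put(String.class, Function.identity());
    }

    public static <T> List<T> readLines(byte[] data, Class<T> type) {
        Function<String, ?> parser = PARSERS.get(type);
        if (parser == null) {
            throw new IllegalArgumentException("The type '" + type.getName()
                    + "' is not supported for the primitive input format.");
        }
        List<T> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            // A record ends at each '\n' or at the end of the input.
            if (i == data.length || data[i] == '\n') {
                int end = i;
                // Drop a trailing '\r' so "\r\n" line endings are handled too.
                if (end > start && data[end - 1] == '\r') {
                    end--;
                }
                // Skip empty lines (e.g. a trailing newline at end of input).
                if (end > start) {
                    String field = new String(data, start, end - start, StandardCharsets.UTF_8);
                    result.add(type.cast(parser.apply(field)));
                }
                start = i + 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] input = "1\r\n2\n3".getBytes(StandardCharsets.UTF_8);
        List<Integer> values = readLines(input, Integer.class);
        System.out.println(values); // [1, 2, 3]
    }
}
```

Requesting a type without a registered parser, such as Boolean.class, fails fast with an IllegalArgumentException, mirroring the check in open() of the diff.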



--
This message was sent by Atlassian JIRA
(v6.2#6252)
