Till Rohrmann created FLINK-2692:
------------------------------------
Summary: Untangle CsvInputFormat into PojoTypeCsvInputFormat and
TupleTypeCsvInputFormat
Key: FLINK-2692
URL: https://issues.apache.org/jira/browse/FLINK-2692
Project: Flink
Issue Type: Improvement
Reporter: Till Rohrmann
Priority: Minor
The {{CsvInputFormat}} can currently return values as either a {{Tuple}} or a
{{Pojo}} type. As a consequence, the processing logic, which has to work for
both types, is overly complex. For example, the {{CsvInputFormat}} contains
fields which are only used when a Pojo is returned. Moreover, the Pojo field
information is constructed by calling setter methods which have to be called
in a very specific order, otherwise they fail. E.g. one has to call
{{setFieldTypes}} before calling {{setOrderOfPOJOFields}}, otherwise the number
of fields might differ. Furthermore, some of the methods can only be
called if the return type is a {{Pojo}} type, because they expect a
{{PojoTypeInfo}} to be present.
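To illustrate the ordering problem, here is a minimal sketch. The class {{OrderSensitiveCsvFormat}} is a hypothetical stand-in and not the actual Flink code; only the two setter names are taken from the description above, and the exceptions it throws are an assumption about how such a failure could surface.

```java
import java.util.Arrays;

// Hypothetical stand-in, NOT the real Flink CsvInputFormat: it only mimics
// the order dependency between the two setters described above.
class OrderSensitiveCsvFormat {
    private Class<?>[] fieldTypes;     // populated by setFieldTypes
    private String[] pojoFieldOrder;   // only meaningful for Pojo output

    void setFieldTypes(Class<?>... types) {
        this.fieldTypes = types.clone();
    }

    void setOrderOfPOJOFields(String... names) {
        // The field count can only be checked once the types are known.
        if (fieldTypes == null) {
            throw new IllegalStateException("setFieldTypes must be called first");
        }
        if (names.length != fieldTypes.length) {
            throw new IllegalArgumentException(
                "expected " + fieldTypes.length + " field names, got " + names.length);
        }
        this.pojoFieldOrder = names.clone();
    }

    String describe() {
        return Arrays.toString(pojoFieldOrder);
    }
}
```

Calling {{setOrderOfPOJOFields}} on a fresh instance of this sketch throws, while the same call after {{setFieldTypes}} succeeds. Nothing in the type system prevents the wrong order.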
I think the {{CsvInputFormat}} should be refactored to make the code easier
to maintain. I propose to split it up into a {{PojoTypeCsvInputFormat}}
and a {{TupleTypeCsvInputFormat}}, each of which takes all the required
information via its constructor instead of using the {{setFields}} and
{{setOrderOfPOJOFields}} approach.
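A rough sketch of what such a split could look like, without any Flink dependencies. All class names ending in {{Sketch}}, the {{Person}} class, and the use of a {{List}} in place of a Flink {{Tuple}} are assumptions made for illustration; the actual classes would extend Flink's input format hierarchy and use {{PojoTypeInfo}}.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Flink code: shared parsing lives in the base class,
// and each subclass receives everything it needs through its constructor.
abstract class CsvFormatSketch<OUT> {
    private final Class<?>[] fieldTypes;

    CsvFormatSketch(Class<?>[] fieldTypes) {
        this.fieldTypes = fieldTypes.clone();
    }

    // Splits one line and converts each column to its declared type.
    final OUT parseLine(String line) {
        String[] columns = line.split(",", -1);
        if (columns.length != fieldTypes.length) {
            throw new IllegalArgumentException("expected " + fieldTypes.length + " columns");
        }
        Object[] values = new Object[columns.length];
        for (int i = 0; i < columns.length; i++) {
            values[i] = convert(columns[i].trim(), fieldTypes[i]);
        }
        return fill(values);
    }

    private static Object convert(String raw, Class<?> type) {
        if (type == Integer.class) return Integer.valueOf(raw);
        if (type == Double.class) return Double.valueOf(raw);
        return raw; // anything else is kept as a String in this sketch
    }

    protected abstract OUT fill(Object[] values);
}

// Tuple-like output; a plain List stands in for Flink's Tuple types.
final class TupleTypeCsvFormatSketch extends CsvFormatSketch<List<Object>> {
    TupleTypeCsvFormatSketch(Class<?>... fieldTypes) {
        super(fieldTypes);
    }

    @Override
    protected List<Object> fill(Object[] values) {
        return Arrays.asList(values);
    }
}

// Pojo output; field order and types are validated once, in the constructor.
final class PojoTypeCsvFormatSketch<T> extends CsvFormatSketch<T> {
    private final Constructor<T> constructor;
    private final Field[] fields;

    PojoTypeCsvFormatSketch(Class<T> pojoClass, String[] fieldOrder, Class<?>[] fieldTypes) {
        super(fieldTypes);
        if (fieldOrder.length != fieldTypes.length) {
            throw new IllegalArgumentException("field names and types differ in length");
        }
        try {
            constructor = pojoClass.getDeclaredConstructor();
            constructor.setAccessible(true);
            fields = new Field[fieldOrder.length];
            for (int i = 0; i < fieldOrder.length; i++) {
                fields[i] = pojoClass.getDeclaredField(fieldOrder[i]);
                fields[i].setAccessible(true);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(e);
        }
    }

    @Override
    protected T fill(Object[] values) {
        try {
            T pojo = constructor.newInstance();
            for (int i = 0; i < fields.length; i++) {
                fields[i].set(pojo, values[i]);
            }
            return pojo;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Example Pojo used only to exercise the sketch.
class Person {
    String name;
    Integer age;
}
```

With this shape, an instance is fully configured once its constructor returns, so the call-order problem cannot occur and the Pojo-only state no longer leaks into the tuple variant.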