Stephen Sisk created BEAM-1592:
----------------------------------
Summary: Unify HdfsIO and HadoopInputFormatIO
Key: BEAM-1592
URL: https://issues.apache.org/jira/browse/BEAM-1592
Project: Beam
Issue Type: Bug
Components: sdk-java-core
Reporter: Stephen Sisk
Assignee: Davor Bonaci
HadoopInputFormatIO (HIFIO) is currently in PR
(https://github.com/apache/beam/pull/1994). As per the discussion in
https://lists.apache.org/thread.html/803857877804165e798cf31edf079e6603eb9682b7690d52124c31e7@%3Cdev.beam.apache.org%3E,
we'd like to check HIFIO in as-is and then unify it with HdfsIO, since the two
share a lot of code.
[[email protected]] has mentioned: "the FileInputFormat reader gets to call
some special APIs that the generic InputFormat reader cannot -- so they are
not completely redundant. Specifically, FileInputFormat reader can do
size-based splitting."
Dan recommended: "See if we can 'inline' the FileInputFormat specific parts of
HdfsIO inside of HadoopInputFormatIO via reflection. If so, we can get the best
of both worlds with shared code."
This seems reasonable to me.
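To make the reflection idea concrete, here is a rough sketch of one possible
reading of it: a single reader that looks up a size-based splitting method at
runtime and falls back to the generic path when the format does not offer one.
Everything in it is an assumption for illustration. StubInputFormat,
StubFileInputFormat, splitBySize and split are hypothetical stand-ins so the
example runs without Hadoop on the classpath; none of them are the actual
HdfsIO, HadoopInputFormatIO or Hadoop APIs.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: the nested Stub* classes are hypothetical stand-ins for Hadoop's
// InputFormat and FileInputFormat, not the real classes.
class ReflectiveSplitSketch {

  /** Stand-in for a generic input format: it can only return the input as one split. */
  static class StubInputFormat {
    List<String> getSplits() {
      return Collections.singletonList("whole-input");
    }
  }

  /** Stand-in for a file-based format that also exposes size-based splitting. */
  static class StubFileInputFormat extends StubInputFormat {
    private final long fileSize;

    StubFileInputFormat(long fileSize) {
      this.fileSize = fileSize;
    }

    // Hypothetical "special API" that only the file-based format has.
    public List<String> splitBySize(long desiredBytes) {
      List<String> splits = new ArrayList<>();
      for (long start = 0; start < fileSize; start += desiredBytes) {
        long end = Math.min(start + desiredBytes, fileSize);
        splits.add(start + "-" + end);
      }
      return splits;
    }
  }

  /**
   * Shared split logic: use size-based splitting when the format exposes it,
   * otherwise fall back to the generic getSplits() path.
   */
  static List<String> split(StubInputFormat format, long desiredBytes) {
    if (desiredBytes <= 0) {
      throw new IllegalArgumentException("desiredBytes must be positive");
    }
    try {
      // Look the method up by name so this code needs no compile-time
      // dependency on the file-based subclass.
      Method m = format.getClass().getMethod("splitBySize", long.class);
      @SuppressWarnings("unchecked")
      List<String> result = (List<String>) m.invoke(format, desiredBytes);
      return result;
    } catch (NoSuchMethodException e) {
      // Generic format: no size-based splitting available.
      return format.getSplits();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Reflective call to splitBySize failed", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(split(new StubFileInputFormat(250), 100));
    System.out.println(split(new StubInputFormat(), 100));
  }
}
```

In this sketch the file-based stand-in yields byte-range splits while the
generic one yields a single split, and both go through the same split() entry
point. Whether the real FileInputFormat-specific calls can be reached this way
is exactly what would need to be checked.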
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)