[ https://issues.apache.org/jira/browse/BEAM-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismaël Mejía updated BEAM-1592:
-------------------------------
Component/s: io-java-hadoop-file-system
> Unify HdfsIO and HadoopInputFormatIO
> ------------------------------------
>
> Key: BEAM-1592
> URL: https://issues.apache.org/jira/browse/BEAM-1592
> Project: Beam
> Issue Type: Bug
> Components: io-java-hadoop-file-system, io-java-hadoop-format
> Reporter: Stephen Sisk
> Priority: Major
> Fix For: Not applicable
>
>
> HadoopInputFormatIO (HIFIO) is currently in PR
> (https://github.com/apache/beam/pull/1994). Per the discussion in
> https://lists.apache.org/thread.html/803857877804165e798cf31edf079e6603eb9682b7690d52124c31e7@%3Cdev.beam.apache.org%3E,
> we'd like to check HIFIO in as-is, then unify it with HdfsIO, since the two
> share a lot of code.
> [[email protected]] has mentioned: "the FileInputFormat reader gets to call some
> special APIs that the generic InputFormat reader cannot -- so they are not
> completely redundant. Specifically, FileInputFormat reader can do size-based
> splitting."
> Dan recommended: "See if we can "inline" the FileInputFormat specific parts
> of HdfsIO inside of HadoopInputFormatIO via reflection. If so, we can get the
> best of both worlds with shared code."
> This seems reasonable to me.
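
To make the reflection suggestion concrete, here is a minimal sketch of the general idea: probe an input format for a file-specific, size-based API at runtime and fall back to generic behavior when it is absent. Everything in it is an assumption for illustration. `GenericFormat`, `FileLikeFormat`, `computeSplitSize`, and `splitSizeOrDefault` are hypothetical stand-ins, not Hadoop or Beam APIs, and this is not the code from the PR.

```java
import java.lang.reflect.Method;

public class ReflectiveSplitSketch {

    // Hypothetical stand-in for a generic input format (NOT a Hadoop class).
    public static class GenericFormat {}

    // Hypothetical stand-in for a file-based format that exposes an extra,
    // size-based API the generic format does not have.
    public static class FileLikeFormat extends GenericFormat {
        public long computeSplitSize(long totalBytes, int desiredSplits) {
            return Math.max(1L, totalBytes / desiredSplits);
        }
    }

    // Looks up the (hypothetical) size-based method via reflection.
    // Returns its result if the format has it, otherwise the fallback value.
    public static long splitSizeOrDefault(
            Object format, long totalBytes, int desiredSplits, long fallback) {
        try {
            Method m = format.getClass()
                    .getMethod("computeSplitSize", long.class, int.class);
            return (Long) m.invoke(format, totalBytes, desiredSplits);
        } catch (NoSuchMethodException e) {
            // Generic format: no size-based API available, use the fallback.
            return fallback;
        } catch (ReflectiveOperationException e) {
            // The method exists but could not be invoked.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(splitSizeOrDefault(new FileLikeFormat(), 1000L, 4, -1L));
        System.out.println(splitSizeOrDefault(new GenericFormat(), 1000L, 4, -1L));
    }
}
```

The point of the sketch is that a single reader code path can serve both kinds of format, taking the size-based branch only when the file-specific API is actually present.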
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)