[
https://issues.apache.org/jira/browse/BEAM-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733218#comment-16733218
]
Alexey Romanenko commented on BEAM-5310:
----------------------------------------
[~kenn] The current status is as follows (an extract from my email to the dev ML):
_"We added a new module, called “hadoop-format”, which incorporates the Read and
Write parts for using Hadoop MapReduce Format files. The old module
“hadoop-input-format” keeps all of its public user API but proxies all calls to
the new module, and will become deprecated starting from Beam 2.10. The
implementation of the “Read” part has moved into HadoopFormatIO, and the “Write”
part was written from scratch. Unit tests are kept for both modules for the
moment to guarantee that there is no regression._

_So, from the user's perspective, everything should work as it did before,
except that the old IO becomes deprecated and users have to migrate to the new
one after release 2.10._

_What is left to do:_
_- Completely remove the deprecated “hadoop-input-format” (at LTS or 3.0
release?..)_
_- Add new “hadoop-format” ITs to run on Jenkins."_
Since the old {{HadoopInputFormatIOIT}} test is still running and working on
Jenkins, and it actually tests the new {{HadoopFormatIO.Read}} (by proxying
calls, as I mentioned above), I think the missing new IT on Jenkins is not a
blocker for the release, and we can move that issue (BEAM-6246) to 2.11. Are you
OK with that?
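To illustrate why the old IT still covers the new code path, here is a minimal plain-Java sketch of the proxying arrangement. The class names {{NewFormatReader}} and {{OldInputFormatReader}} are hypothetical stand-ins, not Beam's actual classes, and the trimming step is only a placeholder for real read logic.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the new module's read entry point (NOT Beam's API).
final class NewFormatReader {
    private NewFormatReader() {}

    // Placeholder "read" logic: trims each record.
    static List<String> read(List<String> records) {
        return records.stream().map(String::trim).collect(Collectors.toList());
    }
}

// Hypothetical stand-in for the old module: it keeps the same public
// signature but holds no logic of its own and is marked as deprecated.
@Deprecated
final class OldInputFormatReader {
    private OldInputFormatReader() {}

    // Every call is forwarded to the new implementation.
    static List<String> read(List<String> records) {
        return NewFormatReader.read(records);
    }
}
```

Under these assumptions, any test that calls {{OldInputFormatReader.read}} exercises {{NewFormatReader.read}} as well, so a regression in the new code would also show up through the old entry point.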
> Add support of HadoopOutputFormatIO
> -----------------------------------
>
> Key: BEAM-5310
> URL: https://issues.apache.org/jira/browse/BEAM-5310
> Project: Beam
> Issue Type: Improvement
> Components: io-java-hadoop
> Reporter: Alexey Romanenko
> Assignee: Alexey Romanenko
> Priority: Minor
> Fix For: 2.10.0
>
>
> For the moment, there is only {{HadoopInputFormatIO}} in Beam. To support
> writing to sinks that are not yet natively supported in Beam (for example,
> Apache Orc or HBase bulk load), it would make sense to add
> {{HadoopOutputFormatIO}} as well. It will support both batch and streaming
> processing.
> Afterwards, {{HadoopInputFormatIO}} and {{HadoopOutputFormatIO}} should be
> merged into one module, called {{HadoopFormatIO}}. The old
> {{HadoopInputFormatIO}} should become deprecated.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)