[ 
https://issues.apache.org/jira/browse/BEAM-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733218#comment-16733218
 ] 

Alexey Romanenko commented on BEAM-5310:
----------------------------------------

[~kenn] The current status is as follows (extract from my email to the dev ML):
 _"We added a new module, called “hadoop-format”, which incorporates the Read and 
Write parts for using Hadoop MapReduce Format files. The old module 
“hadoop-input-format” keeps its entire public user API but proxies all calls to 
the new module, and will be deprecated starting from Beam 2.10. The 
implementation of the “Read” part has moved into HadoopFormatIO, and the “Write” 
part was written from scratch. Unit tests are kept for both modules for the 
moment to guarantee that there is no regression._ 

 _So, from the user's perspective, everything should work as it did before, 
except that the old IO becomes deprecated and users will have to migrate to the 
new one after release 2.10._

 _What is left to do:_
 _- Completely remove the deprecated “hadoop-input-format” (at the LTS or 3.0 
release?)_
 _- Add new “hadoop-format” ITs to run on Jenkins."_

Since the old {{HadoopInputFormatIOIT}} test still runs and works on Jenkins, 
and it actually tests the new {{HadoopFormatIO.Read}} (by proxying calls, as I 
mentioned above), I think the missing new IT on Jenkins is not a blocker for 
the release, and we can move that issue (BEAM-6246) to 2.11. Are you OK with 
that?

> Add support of HadoopOutputFormatIO
> -----------------------------------
>
>                 Key: BEAM-5310
>                 URL: https://issues.apache.org/jira/browse/BEAM-5310
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-hadoop
>            Reporter: Alexey Romanenko
>            Assignee: Alexey Romanenko
>            Priority: Minor
>             Fix For: 2.10.0
>
>
> For the moment, there is only {{HadoopInputFormatIO}} in Beam. To support 
> various write IOs that are not yet natively supported in Beam 
> (for example, Apache ORC or HBase bulk load), it would make sense to add 
> {{HadoopOutputFormatIO}} as well. It will support both batch and 
> streaming processing.
> Afterwards, {{HadoopInputFormatIO}} and {{HadoopOutputFormatIO}} should be 
> merged into one module, called {{HadoopFormatIO}}. The old 
> {{HadoopInputFormatIO}} should then become deprecated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
