ParquetIO on S3 - I can confirm that only "Write" works; "Read" throws 
an exception:
org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.IOException: 
can not read class org.apache.parquet.format.FileMetaData: java.io.IOException: 
Attempted read on closed stream.
Any ideas about the cause would be very welcome.

ParquetIO on HDFS - works fine for me too (Write and Read).

WBR,
Alexey

> On 31 May 2018, at 00:46, Łukasz Gajowy <[email protected]> wrote:
> 
> Regarding ParquetIO on S3: I am investigating the issue. It seems that it 
> never worked on S3 (I didn't expect that). Currently, I'm trying to 
> understand why it behaves differently than on other filesystems (HDFS, 
> local). Any help appreciated.
> 
> Regarding ParquetIO on HDFS: I was able to run it on my machine successfully. 
> I also created a PR with HDFS Performance test for Parquet (and it is passing 
> too): https://github.com/apache/beam/pull/5520 
> <https://github.com/apache/beam/pull/5520>. Hope this will be helpful!
> 
> Best regards, 
> Łukasz 
> 
> 
> 
> 2018-05-31 0:41 GMT+02:00 Robert Bradshaw <[email protected] 
> <mailto:[email protected]>>:
> On Wed, May 30, 2018 at 12:59 PM Ahmet Altay <[email protected] 
> <mailto:[email protected]>> wrote:
> Thank you JB.
> 
> For clarification, are you referring to the following items:
> - RabbitMqIO - https://github.com/apache/beam/pull/1729 
> <https://github.com/apache/beam/pull/1729>
> -  ParquetIO on HDFS/S3 - https://issues.apache.org/jira/browse/BEAM-4421 
> <https://issues.apache.org/jira/browse/BEAM-4421>
> 
> If the above mapping is correct, could we separate addition of new feature 
> from addressing blocking issues? I would propose that we do not block the 
> release for the former one and fix the latter one before the release.
> 
> On Tue, May 29, 2018 at 10:26 PM, Jean-Baptiste Onofré <[email protected] 
> <mailto:[email protected]>> wrote:
> Hi,
> 
> I would like to merge RabbitMqIO (we are doing the final touches) and we
> have an issue about ParquetIO on HDFS/S3 that I would like to
> investigate with the team.
> 
> Do you know who is currently investigating the ParquetIO issue? Do you need 
> help with that?
> 
> Do we know if this is a regression, or has it never worked? 
>  
> I plan to start the release process asap, hopefully later today.
> 
> That would be great. A lot has happened since the last release [1] and we've 
> had a pretty good cadence so far in 2018 so it'd be nice to get this out 
> into the hands of our users. And thanks for volunteering to do the release! 
> 
> - Robert
> 
> 
> [1] https://github.com/apache/beam/compare/release-2.4.0...master 
> <https://github.com/apache/beam/compare/release-2.4.0...master>
> 
> 
>  
> 
> Regards
> JB
> 
> On 29/05/2018 23:00, Ahmet Altay wrote:
> > Thank you JB for the update. Could we start the release process now? Is
> > there any way I could help with moving the release forward?
> > 
> > On Fri, May 25, 2018 at 8:19 AM, Lukasz Cwik <[email protected] 
> > <mailto:[email protected]>
> > <mailto:[email protected] <mailto:[email protected]>>> wrote:
> > 
> >     Thanks for the update JB.
> > 
> >     Kenn, we have the post commit integration tests which run against
> >     shaded artifacts like validates runner. We also have the nightly
> >     snapshot and its verification run which validates the nightly
> >     snapshot with DirectRunner / Dataflow / Apex / Spark / Flink for
> >     WordCount and DirectRunner / Dataflow for the mobile gaming examples.
> > 
> >     I'm not sure about the IOs and whether the perfkit benchmark work
> >     adequately covers them.
> > 
> > 
> >     On Fri, May 25, 2018 at 1:28 AM Jean-Baptiste Onofré
> >     <[email protected] <mailto:[email protected]> <mailto:[email protected] 
> > <mailto:[email protected]>>> wrote:
> > 
> >         Hi Luke,
> > 
> >         I tested the following build:
> > 
> >         ./gradlew publishToMavenLocal -PisRelease --no-parallel
> > 
> >         The artifacts are present in my .m2/repository.
> > 
> >         For instance, I can see:
> > 
> >         .m2/repository/org/apache/beam/beam-sdks-java-core/2.5.0$ ls -l
> >         total 16256
> >          beam-sdks-java-core-2.5.0.jar
> >          beam-sdks-java-core-2.5.0.jar.asc
> >          beam-sdks-java-core-2.5.0-javadoc.jar
> >          beam-sdks-java-core-2.5.0-javadoc.jar.asc
> >          beam-sdks-java-core-2.5.0.pom
> >          beam-sdks-java-core-2.5.0.pom.asc
> >          beam-sdks-java-core-2.5.0-sources.jar
> >          beam-sdks-java-core-2.5.0-sources.jar.asc
> >          beam-sdks-java-core-2.5.0-tests.jar
> >          beam-sdks-java-core-2.5.0-tests.jar.asc
> >          beam-sdks-java-core-2.5.0-test-sources.jar
> >          beam-sdks-java-core-2.5.0-test-sources.jar.asc
> > 
> >         1. The signatures are OK:
> > 
> >         gpg --verify beam-sdks-java-core-2.5.0.jar.asc
> >         beam-sdks-java-core-2.5.0.jar
> >         gpg: Signature made jeu. 24 mai 2018 16:55:11 CEST
> >         gpg:                using RSA key
> >         1AA8CF92D409A73393D0B736BFF2EE42C8282E76
> >         gpg: Good signature from "Jean-Baptiste Onofré
> >         <[email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>>"
> >         [unknown]
> > 
> >         2. The pom looks correct to me but it's not optimal because
> > 
> >         2.1. There's no parent definition, so each pom duplicates the same
> >         configurations (like scm, license, etc)
> >         2.2. There's no Maven plugin configuration, even if it's not
> >         used for
> >         the build, other tools can parse and use plugin configuration
> >         (like the
> >         source/target version, etc).
> > 
> >         So, even if it's not optimal, the pom looks overall good.
> > 
> >         I think it makes sense to move forward on the release as it is
> >         right now.
> > 
> >         If there's no objection, I will start the release process during the
> >         week end.
> > 
> >         By the way, it would be good to verify that the Maven build is still
> >         working. Ismaël and I fixed new issues on the Maven build.
> >         At some point, after the 2.5.0 release, we have to start to remove the
> >         Maven build (after a vote ;)).
> > 
> >         Thanks,
> >         Regards
> >         JB
> > 
> > 
> >         On 25/05/2018 01:34, Lukasz Cwik wrote:
> >         > The license inclusion issue that was brought up on the thread
> >         has been
> >         > resolved https://issues.apache.org/jira/browse/BEAM-4393 
> > <https://issues.apache.org/jira/browse/BEAM-4393>
> >         <https://issues.apache.org/jira/browse/BEAM-4393 
> > <https://issues.apache.org/jira/browse/BEAM-4393>>.
> >         >
> >         > JB, did you find any other release-related issues?
> >         >
> >         > On Fri, May 18, 2018 at 10:33 AM Lukasz Cwik <[email protected] 
> > <mailto:[email protected]>
> >         <mailto:[email protected] <mailto:[email protected]>>
> >         > <mailto:[email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>>> wrote:
> >         >
> >         >     I believe JB is referring
> >         >     to https://issues.apache.org/jira/browse/BEAM-4060 
> > <https://issues.apache.org/jira/browse/BEAM-4060>
> >         <https://issues.apache.org/jira/browse/BEAM-4060 
> > <https://issues.apache.org/jira/browse/BEAM-4060>>
> >         >
> >         >     On Fri, May 18, 2018 at 10:16 AM Scott Wegner
> >         <[email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>
> >         >     <mailto:[email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>>>
> >         wrote:
> >         >
> >         >         J.B., can you give any context on what metadata is
> >         missing? Is
> >         >         there a JIRA?
> >         >
> >         >         On Thu, May 17, 2018 at 9:30 PM Jean-Baptiste Onofré
> >         >         <[email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>
> >         <mailto:[email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>>> wrote:
> >         >
> >         >             Hi,
> >         >
> >         >             The build was OK yesterday but the maven-metadata is still
> >         >             missing.
> >         >
> >         >             That's the point to fix before being able to move forward
> >         >             on the release.
> >         >
> >         >             I'm going to tackle this later today.
> >         >
> >         >             Regards
> >         >             JB
> >         >
> >         >             On 05/18/2018 02:41 AM, Ahmet Altay wrote:
> >         >             > Hi JB and all,
> >         >             >
> >         >             > I wanted to follow up on my previous email. The
> >         python
> >         >             streaming issue I
> >         >             > mentioned is resolved and removed from the
> >         blocker list.
> >         >             Blocker list is empty
> >         >             > now. You can go ahead with the release branch
> >         cut when you
> >         >             are ready.
> >         >             >
> >         >             > Thank you,
> >         >             > Ahmet
> >         >             >
> >         >             >
> >         >             > On Sun, May 13, 2018 at 8:43 AM, Jean-Baptiste
> >         Onofré
> >         >             <[email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>
> >         <mailto:[email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>>
> >         >             > <mailto:[email protected] 
> > <mailto:[email protected]> <mailto:[email protected] 
> > <mailto:[email protected]>>
> >         <mailto:[email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>>>> wrote:
> >         >             >
> >         >             >     Hi guys,
> >         >             >
> >         >             >     just to let you know that the build fully
> >         passed on my
> >         >             box.
> >         >             >
> >         >             >     I'm testing the artifacts right now.
> >         >             >
> >         >             >     Regards
> >         >             >     JB
> >         >             >
> >         >             >     On 06/04/2018 10:48, Jean-Baptiste Onofré wrote:
> >         >             >
> >         >             >         Hi guys,
> >         >             >
> >         >             >         Apache Beam 2.4.0 has been released on
> >         March 20th.
> >         >             >
> >         >             >         According to our cycle of release (roughly 6
> >         >             weeks), we should think
> >         >             >         about 2.5.0.
> >         >             >
> >         >             >         I volunteer to tackle this release.
> >         >             >
> >         >             >         I'm proposing the following items:
> >         >             >
> >         >             >         1. We start the Jira triage now, up to
> >         Tuesday
> >         >             >         2. I would like to cut the release on
> >         Tuesday
> >         >             night (Europe time)
> >         >             >         2bis. I think it's wiser to still use
> >         Maven for
> >         >             this release. Do you
> >         >             >         think we
> >         >             >         will be ready to try a release with Gradle ?
> >         >             >
> >         >             >         After this release, I would like a
> >         discussion about:
> >         >             >         1. Gradle release (if we release 2.5.0
> >         with Maven)
> >         >             >         2. Isolate release cycle per Beam part.
> >         I think it
> >         >             would be interesting
> >         >             >         to have
> >         >             >         different release cycle: SDKs, DSLs,
> >         Runners, IOs.
> >         >             That's another
> >         >             >         discussion, I
> >         >             >         will start a thread about that.
> >         >             >
> >         >             >         Thoughts ?
> >         >             >
> >         >             >         Regards
> >         >             >         JB
> >         >             >
> >         >             >
> >         >
> >         >             --
> >         >             Jean-Baptiste Onofré
> >         >             [email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>
> >         <mailto:[email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>>
> >         >             http://blog.nanthrax.net <http://blog.nanthrax.net/>
> >         >             Talend - http://www.talend.com 
> > <http://www.talend.com/>
> >         >
> > 
> >         -- 
> >         --
> >         Jean-Baptiste Onofré
> >         [email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>
> >         http://blog.nanthrax.net <http://blog.nanthrax.net/>
> >         Talend - http://www.talend.com <http://www.talend.com/>
> > 
> > 
> 
> -- 
> --
> Jean-Baptiste Onofré
> [email protected] <mailto:[email protected]>
> http://blog.nanthrax.net <http://blog.nanthrax.net/>
> Talend - http://www.talend.com <http://www.talend.com/>
> 
> 
