Hi Eugene,

thanks for the update. I'm volunteer to tackle some those IOs (and make them conform with PTransform style guide). I'm pretty sure other people will jump on ;)

Regards
JB

On 04/08/2017 12:20 AM, Eugene Kirpichov wrote:
Hey all,

More progress has been made and we're nearing completion. ParDo, BigQueryIO
and Window are fixed; Map/FlatMapElements are in review.

The remaining unclaimed ones are all IOs of some form, and here's a list.
I've marked them all as "starter" in JIRA.

XML - https://issues.apache.org/jira/browse/BEAM-1914
TFRecordIO (Tensorflow) - https://issues.apache.org/jira/browse/BEAM-1913
KinesisIO - https://issues.apache.org/jira/browse/BEAM-1428
PubsubIO - https://issues.apache.org/jira/browse/BEAM-1415
CountingInput - https://issues.apache.org/jira/browse/BEAM-1414

https://github.com/apache/beam/pull/2149 , which fixes BigQueryIO, is a
good model to follow when taking these on, as well as e.g.
https://github.com/apache/beam/pull/1927 (TextIO)

These are all actually easy to fix, but need volunteers (I do not have time
to fix all of these myself, but happy to be a reviewer - @jkff).
Let's finish this up in time for the first Beam stable release, so Beam's
stable API surface is consistent and polished!

On Fri, Mar 3, 2017 at 12:00 PM Eugene Kirpichov <kirpic...@google.com>
wrote:

It sounds like a great idea in general, though there are actually not that
many issues remaining - I fixed a bunch, and have PRs out for two more
(ParDo and BigQueryIO).
The remaining ones are:

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1355 <https://issues.apache.org/jira/browse/BEAM-1355> HDFS
IO should comply with PTransform style guide

   - [image: Major - Major loss of function.]
   - OPEN

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1402 <https://issues.apache.org/jira/browse/BEAM-1402> Make
TextIO and AvroIO use best-practice types.

   - [image: Major - Major loss of function.]
   - OPEN

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1414 <https://issues.apache.org/jira/browse/BEAM-1414> 
CountingInput
should comply with PTransform style guide

   - [image: Major - Major loss of function.]
   - OPEN

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1415 <https://issues.apache.org/jira/browse/BEAM-1415> PubsubIO
should comply with PTransfrom style guide

   - [image: Major - Major loss of function.]
   - OPEN

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1418 <https://issues.apache.org/jira/browse/BEAM-1418> 
MapElements
and FlatMapElements should comply with PTransform style guide

   - [image: Major - Major loss of function.]
   - OPEN

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1425 <https://issues.apache.org/jira/browse/BEAM-1425> Window
should comply with PTransform style guide

   - [image: Major - Major loss of function.]
   - OPEN

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1428 <https://issues.apache.org/jira/browse/BEAM-1428> KinesisIO
should comply with PTransform style guide

   - [image: Major - Major loss of function.]
   - OPEN


HDFS is pending the other mailing list thread.
For TextIO/AvroIO there's a pending PR by Reuven.
Window and Map/FlatMapElements could use a brief discussion on the mailing
list as to what the proper API should be.
KinesisIO is, I think, due for a rewrite.

However, CountingInput and PubsubIO I think are great targets for people
to pick up. Any volunteers?


On Fri, Mar 3, 2017 at 11:30 AM Robert Bradshaw
<rober...@google.com.invalid> wrote:

Here's a crazy idea: what if we had a virtual fixit/hackathon to knock
these out (similar to the virtual meet-up, but with an agenda)? I find
communal hacking sessions towards a common goal are a good way to get to
know each other and get a lot done. Would there be any interest in this?

On Wed, Mar 1, 2017 at 11:30 AM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:

Hey all,

First couple rounds of fixes are in. Thanks Aviem Zur for contributing
TextIO fixes and Dan Halperin for reviewing! One more fix by Reuven in
progress (https://github.com/apache/beam/pull/1927).

Follow https://issues.apache.org/jira/browse/BEAM-1353 and sub-issues
for
the status.
Many of the changes are backwards-incompatible (though with only minor
changes in pipelines required). I'll make a separate announcement about
that on the users list now.

Call for help with the other sub-issues still stands! They are pretty
simple work items but pretty important since this is a blocker for
declaring stable BEAM API.

On Wed, Feb 8, 2017 at 3:51 AM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

Thanks Eugene.

I will tackle some Jira when back next week.

Regards
JB

On Feb 7, 2017, 18:16, at 18:16, Eugene Kirpichov
<kirpic...@google.com.INVALID> wrote:
Hey all,

I bit the bullet and audited all PTransform classes in Beam Java SDK
and
filed JIRA issues for all violations I could find.
I linked all them to the master JIRA issue
https://issues.apache.org/jira/browse/BEAM-1353

In general, all of these should be fixed before declaring Beam stable
API.
Would appreciate if some senior folks looked at the issues and
confirmed
that my suggested changes make sense.

PRs very welcome :) (though I'll be gone for the next few weeks so I
can't
review right now)
Many of these are very easy to fix (a few lines of code); some require
a
little more code, but as far as I can tell all of them are mechanical.


On Tue, Jan 31, 2017 at 4:10 PM Eugene Kirpichov <kirpic...@google.com>
wrote:

On Mon, Jan 30, 2017 at 7:56 PM Dan Halperin
<dhalp...@google.com.invalid>
wrote:

On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:

Hello,

The PTransform Style Guide is live
https://beam.apache.org/contribute/ptransform-style-guide/ - a
natural
next
step is to audit Beam libraries for compliance and file JIRAs for
places
that need to be fixed. It'd be great to finish these cleanups
before
declaring Beam stable API.

Please take a look and file JIRAs / post suggestions on this
thread!

I think it'll also make a great source of easy and useful work for
new
contributors.

Some things I remember off the top of my head:
- TextIO, KafkaIO use coders improperly - coders should not be used
as a
general-purpose byte parsing mechanism.


Can you say more about Kafka? Kafka actually exports byte[] by
default,
whereas Text files are String by default. So it does not seem nearly
as
egregious for Kafka as it is for Text.

Agreed that KafkaIO is less egregious, but it still has methods
withKeyCoder and withValueCoder - these should be replaced with
something
that doesn't take Coder.



- HadoopFileSource is not packaged as a PTransform
- Some connectors, e.g. KafkaIO, should use AutoValue for their
parameter
builders, but don't


Isn't AutoValue entirely an internal implementation detail that is
not
exposed(*) to users? I think this is irrelevant to a stable API.

Agreed - doesn't block stable API, but still a good thing to do
because it
makes the code cleaner (for KafkaIO there's a long-standing PR that
was
blocked on ratifying the style guide
https://github.com/apache/beam/pull/1048)



(*) except that it makes transforms not able to be final, which is a
regression.

I think AutoValue use should generally be considered *very* optional.
In
transforms I author, I prefer not to use AutoValue because it makes
the
code more complex and less readable.

Yeah, guidance on when to use / not use AutoValue could be improved.
I
think it makes a lot of sense when the transform has more than one or
two
parameters or when the set of parameters can grow.




- A few connectors improperly use
- Some transforms expose their transform type as "Something.Bound"
and
"Something.Unbound", e.g. TextIO.Read.Bound - such names are banned


"banned" is a strong word to use here. All of these are just
recommendations.

In general yes; the goal of the style guide is to be the default,
where if
you deviate from it, you should have a good reason. I don't think
there
ever exists a good reason to name a transform Something.Bound/Unbound
though.





I filed an umbrella JIRA
https://issues.apache.org/jira/browse/BEAM-1353
about
making existing Beam transforms comply with the guide - let's
crowdsource
this!

Thanks.








--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Reply via email to