Agree, it's what I meant by "core transforms".
Regards
JB
On 12/19/2017 11:18 AM, Reuven Lax wrote:
Keep in mind that today Avro is one of the most common coders used for user data
types, not just for file IO. The reason for this is that it's the easiest way to
get a coder for a user's POJO: you simply annotate the POJO with
@DefaultCoder(AvroCoder.class), and it works. That coder is then used for all
internal shuffles (e.g. GroupByKey).
I would argue that most users don't really care about Avro for this use case;
what they really want is a way of saying "make this POJO work", and Avro is the
only way we give them. This was part of my argument in the schema docs. However,
the status quo is that they use Avro here.
Reuven
On Tue, Dec 19, 2017 at 1:32 AM, Jean-Baptiste Onofré <[email protected]> wrote:
Hi Romain,
It sounds good to me. I think any format should be packaged as an extension.
The only point is that some core transforms expect a specific format, so
users will have to remember to add the Avro extension to use some
transforms (or the transforms could be an extension as well). I need to
check which transforms work like this.
Regards
JB
On 12/19/2017 10:26 AM, Romain Manni-Bucau wrote:
Hi guys,
While checking security issues in the project I'm responsible for (which
integrates Beam), I realized the Java SDK core module depends on Avro. From a
security point of view it is a blocker because of the legacy dependencies Avro
brings (Jackson from Codehaus, etc.), but all of that can be fixed. However, I
would like to take this opportunity to open the topic of Avro in the core
dependencies.
From my point of view it doesn't make much sense, because it is just one
of the serialization formats you can use with the file IO, and it is highly
unlikely that all the potential formats will be imported into the core. Since
it is a very local usage and not a core feature, I think it should be
extracted. We can discuss extracting the actual transforms from the core in
another thread; it would make a lot of sense IMHO, but it is not the current
topic.
Therefore I'd like to propose extracting the Avro format, like the others, into
an extension and removing it as a hard requirement of the core, to bring
more consistency and modularity to Beam.
Wdyt?
Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> | Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github
<https://github.com/rmannibucau> | LinkedIn
<https://www.linkedin.com/in/rmannibucau>
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com