Hi,

I have some questions on this: how would hierarchical schemas work? They don't seem to be really supported by the ecosystem (outside of custom solutions) :(. How would this integrate smoothly with other generic record types? Through N bridges?

Concretely, I wonder whether using the JSON API could be beneficial: json-p is a nice generic abstraction with a built-in querying mechanism (jsonpointer) but no actual serialization (even if json and binary json are very natural fits). The big advantage is a well-known ecosystem (who doesn't know json today?) that Beam can reuse for free: JsonObject (I guess we don't want the JsonValue abstraction) for the record type, the jsonschema standard for the schema, jsonpointer for selection/projection, etc. It doesn't enforce the actual serialization (json, smile, avro, ...) but provides an expressive and already known API, so I see it as a big win-win for users (no need to learn a new API and use N bridges in all directions) and for Beam (implementations exist and the API design is already thought through).

Wdyt?

On 29 Jan 2018 at 06:24, "Jean-Baptiste Onofré" <j...@nanthrax.net> wrote:

> Hi Reuven,
>
> Thanks for the update! As I'm working with you on this, I fully agree, and
> it's a great doc gathering the ideas.
>
> It's clearly something we have to add asap in Beam, because it would allow
> new use cases for our users (in a simple way) and open new areas for the
> runners (for instance dataframe support in the Spark runner).
>
> By the way, a while ago, I created BEAM-3437 to track the PoC/PR around this.
>
> Thanks!
>
> Regards
> JB
>
> On 01/29/2018 02:08 AM, Reuven Lax wrote:
> > Previously I submitted a proposal for adding schemas as a first-class
> > concept on Beam PCollections. The proposal engendered quite a bit of
> > discussion from the community - more discussion than I've seen from
> > almost any of our proposals to date!
> >
> > Based on the feedback and comments, I reworked the proposal document
> > quite a bit. It now talks more explicitly about the difference between
> > dynamic schemas (where the schema is not fully known at graph-creation
> > time), and static schemas (which are fully known at graph-creation time).
> > Proposed APIs are more fleshed out now (again thanks to feedback from
> > community members), and the document talks in more detail about evolving
> > schemas in long-running streaming pipelines.
> >
> > Please take a look. I think this will be very valuable to Beam, and
> > welcome any feedback.
> >
> > https://docs.google.com/document/d/1tnG2DPHZYbsomvihIpXruUmQ12pHGK0QIvXS1FOTgRc/edit#
> >
> > Reuven
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
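
To make the jsonpointer selection/projection idea from the top of this message a bit more concrete, here is a minimal sketch of how a pointer such as /address/city picks a field out of a hierarchical record. It is only an illustration under assumptions: the class PointerSketch and its methods resolve and sample are hypothetical names, and plain Map/List values stand in for JsonObject/JsonArray so that it runs with nothing but the JDK. It follows the RFC 6901 resolution rules but is not the JSON-P implementation (JSON-P 1.1 ships its own JsonPointer type) and says nothing about how Beam would wire this in.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PointerSketch {

    // Resolve an RFC 6901 pointer (e.g. "/address/city") against nested Maps and Lists.
    static Object resolve(Object root, String pointer) {
        if (pointer.isEmpty()) {
            return root; // the empty pointer designates the whole record
        }
        if (!pointer.startsWith("/")) {
            throw new IllegalArgumentException("pointer must be empty or start with '/'");
        }
        Object current = root;
        // limit -1 keeps trailing empty tokens, so "/" addresses the member named ""
        for (String raw : pointer.substring(1).split("/", -1)) {
            // Unescape in the order RFC 6901 requires: "~1" first, then "~0".
            String token = raw.replace("~1", "/").replace("~0", "~");
            if (current instanceof Map) {
                Map<?, ?> map = (Map<?, ?>) current;
                if (!map.containsKey(token)) {
                    throw new IllegalArgumentException("no member named '" + token + "'");
                }
                current = map.get(token);
            } else if (current instanceof List) {
                // Array elements are addressed by their zero-based index.
                current = ((List<?>) current).get(Integer.parseInt(token));
            } else {
                throw new IllegalArgumentException("cannot descend into a scalar at '" + token + "'");
            }
        }
        return current;
    }

    // Hypothetical hierarchical record: a nested object plus an array field.
    static Map<String, Object> sample() {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Paris");
        address.put("zip", "75001");
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("name", "alice");
        record.put("address", address);
        record.put("tags", Arrays.asList("a", "b"));
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> record = sample();
        System.out.println(resolve(record, "/address/city")); // nested field
        System.out.println(resolve(record, "/tags/1"));       // array element
    }
}
```

The point of the sketch is that a projection on a nested record reduces to a list of pointer strings, which is independent of whether the record is serialized as json, smile or avro.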