Thanks!
On Thu, Nov 30, 2017 at 11:25 AM, Holden Karau <[email protected]> wrote:

> Rocking, I'll start leaving some comments on this. I'm excited to see work
> being done in this area as well :)
>
> On Thu, Nov 30, 2017 at 9:20 AM, Tyler Akidau <[email protected]> wrote:
>
>> On Wed, Nov 29, 2017 at 6:38 PM Reuven Lax <[email protected]> wrote:
>>
>>> There has been a lot of conversation about schemas on PCollections
>>> recently. There are a number of reasons for this. Schemas as first-class
>>> objects in Beam provide a nice base for building BeamSQL. Spark has
>>> provided schema support via DataFrames for over two years, and it has
>>> proved to be very popular among Spark users; it turns out that FlumeJava -
>>> the original inspiration for the Beam API - has had schema support for even
>>> longer, though this feature was not included in the Beam (at that time
>>> Dataflow) API. It turns out that most records have structure, and allowing
>>> the system to understand record structure can both simplify usage of the
>>> system and allow for new performance optimizations.
>>>
>>> After discussion with JB, Eugene, Kenn, Robert, and a number of others
>>> on the list, I've started a proposal document here
>>> <https://docs.google.com/document/d/1tnG2DPHZYbsomvihIpXruUmQ12pHGK0QIvXS1FOTgRc/edit?usp=sharing>
>>> describing how schemas can be added to Beam in a manner that integrates
>>> with the existing Beam API. The goal is not to blindly copy existing systems
>>> that have schemas, but rather to ensure that we get the best fit for Beam.
>>> Please comment on this proposal - as much feedback as possible is valuable.
>>>
>>> In addition, you may notice this document is incomplete. While it does
>>> sketch out how schemas can fit into Beam semantically, many portions of
>>> this design remain to be fleshed out. In particular, the API signatures are
>>> only sketched at a high level; exactly what all these APIs will look
>>> like has not yet been defined. I would welcome help from interested members
>>> of the community to define these APIs, and to make sure we're covering all
>>> relevant use cases.
>>>
>>
>> Thanks for sharing this, Reuven. I'm excited to see this being discussed.
>> One global comment: all of the existing examples are in Java. It would be
>> great if we could design this with Python in mind (and how it could
>> interact cleanly with Pandas) at the same time. +Robert Bradshaw
>> <[email protected]>, +Holden Karau <[email protected]>, and +Ahmet
>> Altay <[email protected]>, all of whom I've spoken with regarding this and
>> other Python things recently, just to be sure they see it. But of course
>> it'd be great if anyone working on Python could jump in.
>>
>> -Tyler
>>
>>>
>>> Thanks all,
>>>
>>> Reuven
>>>
>
> --
> Twitter: https://twitter.com/holdenkarau
