[
https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=402348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402348
]
ASF GitHub Bot logged work on BEAM-9035:
----------------------------------------
Author: ASF GitHub Bot
Created on: 12/Mar/20 17:51
Start Date: 12/Mar/20 17:51
Worklog Time Spent: 10m
Work Description: reuvenlax commented on issue #10413: [BEAM-9035] Typed
options for Row Schema and Field
URL: https://github.com/apache/beam/pull/10413#issuecomment-598334091
@alexvanboxel
1. I'm not entirely sure I understand the use case. Why do options need to
be copied, while other constructs (such as schemas) don't?
2. I disagree for a couple of reasons:
* A FieldType can be nullable, and there's no reason not to support null
FieldTypes in options. By returning null for non-existent fields, you make it
hard to distinguish between a null value and an invalid get call (i.e. passing
an option that is not part of the schema). This feels like the sort of behavior
that seems simpler but actually makes things more complex. For example, the
fact that Java Maps return null for non-existent keys is often considered a
mistaken design for this very reason: it's hard to distinguish between a
missing key and an explicit null stored in a map.
* Options are no different than schemas - there is a static list of
options for each field, so I don't see why this is so different.
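
To make the null-versus-missing point concrete, here is a minimal sketch in plain Java. The class `OptionLookupSketch` and its methods `getOrNull` and `getStrict` are hypothetical names chosen for illustration; they are not Beam's options API, only a stand-in backed by a `HashMap`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for an options container; NOT Beam's actual API.
final class OptionLookupSketch {
    private final Map<String, Object> values = new HashMap<>();

    OptionLookupSketch set(String name, Object value) {
        values.put(name, value); // HashMap permits explicit null values
        return this;
    }

    // Lenient lookup: returns null both for an unknown option and for an
    // option that was explicitly set to null, so callers cannot tell them apart.
    Object getOrNull(String name) {
        return values.get(name);
    }

    // Strict lookup: an unknown option is an error, so a returned null can
    // only mean "this option exists and its value is null".
    Object getStrict(String name) {
        if (!values.containsKey(name)) {
            throw new NoSuchElementException("Unknown option: " + name);
        }
        return values.get(name);
    }

    public static void main(String[] args) {
        OptionLookupSketch options =
            new OptionLookupSketch().set("nullable_option", null);

        // Both calls return null, although only the first option exists.
        System.out.println(options.getOrNull("nullable_option"));
        System.out.println(options.getOrNull("no_such_option"));

        // The strict variant separates the two cases.
        System.out.println(options.getStrict("nullable_option"));
        try {
            options.getStrict("no_such_option");
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the lenient lookup a typo in an option name is silently read as a null value; with the strict lookup it surfaces as an error at the call site.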
I also want to understand better why we can't just use Row here (revisiting
Brian's question). You mentioned using dots in field names, but AFAIK protobuf
also prohibits dots. It's true that dots are often used in option
specifications in proto, but that's just to set individual fields of a message.
In Beam, we really should model those as a single row-valued option, not as
multiple individual options. I.e. (from the proto docs):
option (my_method_option).foo = 567;
option (my_method_option).bar = "Some string";
should translate to a single Beam option named "my_method_option" not to two
separate options.
Dots are also used to represent package names, of course. Is this the main
reason you need to support dots?
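
The translation of the two proto option lines above into a single option can be sketched roughly as follows. This is only an illustration under stated assumptions: nested `Map`s stand in for a Beam Row, and the class `RowValuedOptionSketch` and its methods are hypothetical, not part of Beam or of the protobuf library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for a row-valued Beam option.
final class RowValuedOptionSketch {

    // The two assignments from the proto docs, keyed by their proto-style path.
    static Map<String, Object> protoStyleAssignments() {
        Map<String, Object> flat = new LinkedHashMap<>();
        flat.put("(my_method_option).foo", 567);
        flat.put("(my_method_option).bar", "Some string");
        return flat;
    }

    // Groups "(option).field" assignments into one structured value per option,
    // so the dot selects a field inside the option's value instead of being
    // part of the option name.
    static Map<String, Map<String, Object>> groupIntoRows(Map<String, Object> flat) {
        Map<String, Map<String, Object>> rows = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : flat.entrySet()) {
            String key = entry.getKey();
            int close = key.indexOf(").");
            if (!key.startsWith("(") || close < 0) {
                throw new IllegalArgumentException("Expected (option).field: " + key);
            }
            String option = key.substring(1, close);
            String field = key.substring(close + 2);
            rows.computeIfAbsent(option, k -> new LinkedHashMap<>())
                .put(field, entry.getValue());
        }
        return rows;
    }

    public static void main(String[] args) {
        // One option named "my_method_option" with two fields, not two options.
        System.out.println(groupIntoRows(protoStyleAssignments()));
    }
}
```

Because the option name is taken from inside the parentheses, a package-qualified name that itself contains dots would still map to one option in this sketch.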
Issue Time Tracking
-------------------
Worklog Id: (was: 402348)
Time Spent: 7h 20m (was: 7h 10m)
> BIP-1: Typed options for Row Schema and Fields
> ----------------------------------------------
>
> Key: BEAM-9035
> URL: https://issues.apache.org/jira/browse/BEAM-9035
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Alex Van Boxel
> Assignee: Alex Van Boxel
> Priority: Major
> Fix For: 2.19.0
>
> Time Spent: 7h 20m
> Remaining Estimate: 0h
>
> This is the first issue of a multipart commit: this ticket implements the
> basic infrastructure of options on row and field.
> Full explanation:
> Introduce the concept of Options in Beam Schemas to add extra context to
> fields and schemas. In contrast to metadata, options would be added to
> fields, logical types and rows. In the options, schema converters can add
> options/annotations/decorators that were in the original schema; this context
> can be used in the rest of the pipeline for specific transformations or to
> augment the end schema in the target output.
> Examples of options are:
> * informational: like the source of the data, ...
> * drive decisions further in the pipeline: flatten a row into another,
> rename a field, ...
> * influence something in the output: like cluster index, primary key, ...
> * logical type information
> An option is a key/typed value combination. The advantages of having
> typed values are:
> * Having strongly typed options would give a *portable way for Logical Types*
> to have structured information that could be shared across different languages.
> * This could keep the type intact when mapping from formats that have
> strongly typed options (example: Protobuf).
> This is part of a multi-ticket implementation. The following tickets are
> related:
> # Typed options for Row Schema and Fields
> # Convert Proto Options to Beam Schema options
> # Convert Avro extra information for Beam string options
> # Replace meta data with Logical Type options
> # Extract meta data in Calcite SQL to Beam options
> # Extract meta data in Zeta SQL to Beam options
> # Add java example of using option in a transform
> This feature was discussed with Reuven Lax and Brian Hulette.