[
https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=403545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-403545
]
ASF GitHub Bot logged work on BEAM-9035:
----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Mar/20 05:32
Start Date: 15/Mar/20 05:32
Worklog Time Spent: 10m
Work Description: reuvenlax commented on issue #10413: [BEAM-9035] Typed
options for Row Schema and Field
URL: https://github.com/apache/beam/pull/10413#issuecomment-599170232
Some thoughts:
I know your use case for options involves mapping specific input types to
schemas, however we need to make sure that the actual mechanism is more generic
than that. We have many uses for field metadata (or options) inside of Beam,
completely unrelated to any option concept on input data types. I would not
like to make a design choice that precludes nullable FieldTypes now, because we
may very well find that we need nullable field types.
That being said: I notice that you haven't removed the FieldType metadata in
this PR. Is that planned for a future PR? I don't think we want both metadata
and options in the protocol - Options seemed to me to be a better-typed version
of metadata.
I don't personally think that the hasOption code is too bad, and it's
definitely better than constraining the type system for Options in a way that
we quite likely will regret. That being said, you've already added the
getValueOrDefault method. So you can still write this code:
.getFields()
.forEach(
f -> {
Integer pkSequence =
f.getOptions().getValueOrDefault("vptech.data.contract.v1.primary_key", null);
if (pkSequence != null) {
primaryKeyMap.put(pkSequence,
dataSchema.indexOf(f.getName()));
}
});
A tiny bit more verbose, but much more explicit about what you are trying to
do. (FYI the reason I don't suggest allowing getValue to return an Optional<T>
- even though it might be the "better" solution - is because the rest of the
code base does not use Optional, and I want things to stay consistent).
About the removeOption builder method. I'm struggling with this a bit, since
it feels like a weird pattern to put remove into a builder. I'm also not sure
whether copying a a Schema field with just a subset of options is a common use
case or not. Generally we should start off by keeping such things out of the
core Beam API until we are convinced that there are multiple users who need
them. I think you can still accomplish the same thing by simply adding a new
Options.Builder.setOptions override that takes in a Map.
newField = newField.withOptions(Options.builder().setOptions(
Maps.filterKeys(
oldField.getOptions().getAllOptions(),
n -> !n.equals(optionToRemove))));
This code is probably a bit more annoying for you to write. If we are
convinced that this is a common use case then I'm ok with adding removeOption,
but if it's specific to your use case then I think we're better off adding the
new setOptions method and using that.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 403545)
Time Spent: 7h 40m (was: 7.5h)
> BIP-1: Typed options for Row Schema and Fields
> ----------------------------------------------
>
> Key: BEAM-9035
> URL: https://issues.apache.org/jira/browse/BEAM-9035
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Alex Van Boxel
> Assignee: Alex Van Boxel
> Priority: Major
> Fix For: 2.19.0
>
> Time Spent: 7h 40m
> Remaining Estimate: 0h
>
> This is the first issue of a multipart commit: this ticket implements the
> basic infrastructure of options on row and field.
> Full explanation:
> Introduce the concept of Options in Beam Schema’s to add extra context to
> fields and schema. In contracts to metadata, options would be added to
> fields, logical types and rows. In the options schema convertors can add
> options/annotations/decorators that were in the original schema, this context
> can be used in the rest of the pipeline for specific transformations or
> augment the end schema in the target output.
> Examples of options are:
> * informational: like the source of the data, ...
> * drive decisions further in the pipeline: flatten a row into another,
> rename a field, ...
> * influence something in the output: like cluster index, primary key, ...
> * logical type information
> And option is a key/typed value combination. The advantages of having the
> value types is:
> * Having strongly typed options would give a *portable way of Logical Types*
> to have structured information that could be shared over different languages.
> * This could keep the type intact when mapping from a formats that have
> strongly typed options (example: Protobuf).
> This is part of a multi ticket implementation. The following tickets are
> related:
> # Typed options for Row Schema and Fields
> # Convert Proto Options to Beam Schema options
> # Convert Avro extra information for Beam string options
> # Replace meta data with Logical Type options
> # Extract meta data in Calcite SQL to Beam options
> # Extract meta data in Zeta SQL to Beam options
> # Add java example of using option in a transform
> This feature is discussed with Reuven Lax, Brian Hulette
--
This message was sent by Atlassian Jira
(v8.3.4#803005)