Ah I see, I didn't realize that. Then I suppose we'll need the to/from functions somewhere in the logical type conversion to preserve the current behavior.
I'm still a little hesitant to make these functions an explicit part of LogicalTypeConversion for another reason. Down the road, schemas could give us an avenue to use a batched columnar format (presumably Arrow, but of course others are possible). By making to/from an explicit part of logical types we add some element-wise logic to a schema representation that's otherwise agnostic to element-wise vs. batched encodings. I suppose you could make an argument that to/from are only for custom types. There will also be some set of well-known types identified only by URN and some parameters, which could easily be translated to a columnar format. We could just not support custom types fully if we add a columnar encoding, or maybe add optional toBatch/fromBatch functions when/if we get there.

What about something like this that makes the two different kinds of logical types explicit?

// Describes a logical type and how to convert between it and its
// representation (e.g. Row).
message LogicalTypeConversion {
  oneof conversion {
    Standard standard = 1;
    Custom custom = 2;
  }

  message Standard {
    string urn = 1;
    repeated string args = 2; // could also be a map
  }

  message Custom {
    FunctionSpec(?) toRepresentation = 1;
    FunctionSpec(?) fromRepresentation = 2;
    bytes type = 3; // e.g. serialized class for Java
  }
}

And LogicalType and Schema become:

message LogicalType {
  FieldType representation = 1;
  LogicalTypeConversion conversion = 2;
}

message Schema {
  ...
  repeated Field fields = 1;
  LogicalTypeConversion conversion = 2; // implied that representation is Row
}

Brian

On Sat, Jun 1, 2019 at 10:44 AM Reuven Lax <re...@google.com> wrote:

> Keep in mind that right now the SchemaRegistry is only assumed to exist at graph-construction time, not at execution time; all information in the schema registry is embedded in the SchemaCoder, which is the only thing we keep around when the pipeline is actually running. We could look into changing this, but it would potentially be a very big change, and I do think we should start getting users actively using schemas soon.
>
> On Fri, May 31, 2019 at 3:40 PM Brian Hulette <bhule...@google.com> wrote:
>
>> > Can you propose what the protos would look like in this case? Right now LogicalType does not contain the to/from conversion functions in the proto. Do you think we'll need to add these in?
>>
>> Maybe. Right now the proposed LogicalType message is pretty simple/generic:
>>
>> message LogicalType {
>>   FieldType representation = 1;
>>   string logical_urn = 2;
>>   bytes logical_payload = 3;
>> }
>>
>> If we keep just logical_urn and logical_payload, the logical_payload could itself be a protobuf with attributes of 1) a serialized class and 2/3) to/from functions. Or, alternatively, we could have a generalization of the SchemaRegistry for logical types. Implementations for standard types and user-defined types would be registered by URN, and the SDK could look them up given just a URN. I put a brief section about this alternative in the doc last week [1]. What I suggested there included removing the logical_payload field, which is probably overkill. The critical piece is just relying on a registry in the SDK to look up types and to/from functions rather than storing them in the portable schema itself.
>>
>> I kind of like keeping the LogicalType message generic for now, since it gives us a way to try out these various approaches, but maybe that's just a cop out.
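>>
>> To sketch the registry idea from above concretely (everything here is hypothetical — the registry and its method don't exist today, and the URN is made up — it's just to illustrate the lookup):
>>
>> // Hypothetical SDK-side registry: the portable schema would carry only
>> // a URN, and each SDK would look up the language type and its to/from
>> // functions locally, the way SchemaRegistry works at construction time.
>> Schema.LogicalType<Instant, Long> millisInstant =
>>     logicalTypeRegistry.getByUrn("beam:logical_type:millis_instant:v1");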
>>
>> [1] https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit?ts=5cdf6a5b#heading=h.jlt5hdrolfy
>>
>> On Fri, May 31, 2019 at 12:36 PM Reuven Lax <re...@google.com> wrote:
>>
>>> On Tue, May 28, 2019 at 10:11 AM Brian Hulette <bhule...@google.com> wrote:
>>>
>>>> On Sun, May 26, 2019 at 1:25 PM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> On Fri, May 24, 2019 at 11:42 AM Brian Hulette <bhule...@google.com> wrote:
>>>>>
>>>>>> *tl;dr:* SchemaCoder represents a logical type with a base type of Row and we should think about that.
>>>>>>
>>>>>> I'm a little concerned that the current proposals for a portable representation don't actually fully represent Schemas. It seems to me that the current Java-only Schemas are made up of three concepts that are intertwined:
>>>>>> (a) The Java SDK specific code for schema inference, type coercion, and "schema-aware" transforms.
>>>>>> (b) A RowCoder[1] that encodes Rows[2] which have a particular Schema[3].
>>>>>> (c) A SchemaCoder[4] that has a RowCoder for a particular schema, and functions for converting Rows with that schema to/from a Java type T. Those functions and the RowCoder are then composed to provide a Coder for the type T.
>>>>>
>>>>> RowCoder is currently just an internal implementation detail, it can be eliminated. SchemaCoder is the only thing that determines a schema today.
>>>>
>>>> Why not keep it around? I think it would make sense to have a RowCoder implementation in every SDK, as well as something like SchemaCoder that defines a conversion from that SDK's "Row" to the language type.
>>>
>>> The point is that from a programmer's perspective, there is nothing much special about Row. Any type can have a schema, and the only special thing about Row is that it's always guaranteed to exist. From that standpoint, Row is nearly an implementation detail. Today RowCoder is never set on _any_ PCollection, it's literally just used as a helper library, so there's no real need for it to exist as a "Coder."
>>>
>>>>>> We're not concerned with (a) at this time since that's specific to the SDK, not the interface between them. My understanding is we just want to define a portable representation for (b) and/or (c).
>>>>>>
>>>>>> What has been discussed so far is really just a portable representation for (b), the RowCoder, since the discussion is only around how to represent the schema itself and not the to/from functions.
>>>>>
>>>>> Correct. The to/from functions are actually related to (a). One of the big goals of schemas was that users should not be forced to operate on rows to get schemas. A user can create PCollection<MyRandomType> and as long as the SDK can infer a schema from MyRandomType, the user never needs to even see a Row object. The to/fromRow functions are what make this work today.
>>>>
>>>> One of the points I'd like to make is that this type coercion is a useful concept on its own, separate from schemas. It's especially useful for a type that has a schema and is encoded by RowCoder, since that can represent many more types, but the type coercion doesn't have to be tied to just schemas and RowCoder. We could also do type coercion for types that are effectively wrappers around an integer or a string. It could just be a general way to map language types to base types (i.e. types that we have a coder for). Then it just becomes a general framework for extending coders to represent more language types.
>>>
>>> Let's not tie those conversations. Maybe a similar concept will hold true for general coders (or we might decide to get rid of coders in favor of schemas, in which case that becomes moot), but I don't think we should prematurely generalize.
>>>
>>>>>> One of the outstanding questions for that schema representation is how to represent logical types, which may or may not have some language type in each SDK (the canonical example being a timestamp type with seconds and nanos and java.time.Instant). I think this question is critically important, because (c), the SchemaCoder, is actually *defining a logical type* with a language type T in the Java SDK. This becomes clear when you compare SchemaCoder[4] to the Schema.LogicalType interface[5] - both essentially have three attributes: a base type, and two functions for converting to/from that base type. The only difference is that for SchemaCoder the base type must be a Row so it can be represented by a Schema alone, while LogicalType can have any base type that can be represented by FieldType, including a Row.
>>>>>
>>>>> This is not true actually. SchemaCoder can have any base type, that's why (in Java) it's SchemaCoder<T>. This is why PCollection<T> can have a schema, even if T is not Row.
>>>>
>>>> I'm not sure I effectively communicated what I meant - when I said SchemaCoder's "base type" I wasn't referring to T, I was referring to the base FieldType, whose coder we use for this type. I meant "base type" to be analogous to LogicalType's `getBaseType`, or what Kenn is suggesting we call "representation" in the portable Beam schemas doc. To define some terms from my original message:
>>>> base type = an instance of FieldType, crucially this is something that we have a coder for (be it VarIntCoder, Utf8Coder, RowCoder, ...)
>>>> language type (or "T", "type T", "logical type") = some Java class (or something analogous in the other SDKs) that we may or may not have a coder for. It's possible to define functions for converting instances of the language type to/from the base type.
>>>>
>>>> I was just trying to make the case that SchemaCoder is really a special case of LogicalType, where `getBaseType` always returns a Row with the stored Schema.
>>>
>>> Yeah, I think I got that point.
>>>
>>> Can you propose what the protos would look like in this case? Right now LogicalType does not contain the to/from conversion functions in the proto. Do you think we'll need to add these in?
>>>
>>>> To make the point with code: SchemaCoder<T> can be made to implement Schema.LogicalType<T, Row> with trivial implementations of getBaseType, toBaseType, and toInputType (I'm not trying to say we should or shouldn't do this, just using it to illustrate my point):
>>>>
>>>> class SchemaCoder<T> extends CustomCoder<T> implements Schema.LogicalType<T, Row> {
>>>>   ...
>>>>
>>>>   @Override
>>>>   public FieldType getBaseType() {
>>>>     return FieldType.row(getSchema());
>>>>   }
>>>>
>>>>   @Override
>>>>   public Row toBaseType(T input) {
>>>>     return this.toRowFunction.apply(input);
>>>>   }
>>>>
>>>>   @Override
>>>>   public T toInputType(Row base) {
>>>>     return this.fromRowFunction.apply(base);
>>>>   }
>>>>   ...
>>>> }
>>>>>>
>>>>>> I think it may make sense to fully embrace this duality, by letting SchemaCoder have a baseType other than just Row and renaming it to LogicalTypeCoder/LanguageTypeCoder. The current Java SDK schema-aware transforms (a) would operate only on LogicalTypeCoders with a Row base type. Perhaps some of the current schema logic could also be applied more generally to any logical type - for example, to provide type coercion for logical types with a base type other than Row, like int64 and a timestamp class backed by millis, or fixed-size bytes and a UUID class (a quick sketch of that case follows at the end of this message). And having a portable representation that represents those (non-Row-backed) logical types with some URN would also allow us to pass them to other languages without unnecessarily wrapping them in a Row in order to use SchemaCoder.
>>>>>
>>>>> I think the actual overlap here is between the to/from functions in SchemaCoder (which is what allows SchemaCoder<T> where T != Row) and the equivalent functionality in LogicalType. However making all of schemas simply just a logical type feels a bit awkward and circular to me. Maybe we should refactor that part out into a LogicalTypeConversion proto, and reference that from both LogicalType and from SchemaCoder?
>>>>
>>>> LogicalType is already potentially circular though. A schema can have a field with a logical type, and that logical type can have a base type of Row with a field with a logical type (and on and on...). To me it seems elegant, not awkward, to recognize that SchemaCoder is just a special case of this concept.
>>>>
>>>> Something like the LogicalTypeConversion proto would definitely be an improvement, but I would still prefer just using a top-level logical type :)
>>>>>>
>>>>>> I've added a section to the doc [6] to propose this alternative in the context of the portable representation, but I wanted to bring it up here as well to solicit feedback.
>>>>>>
>>>>>> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java#L41
>>>>>> [2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java#L59
>>>>>> [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java#L48
>>>>>> [4] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java#L33
>>>>>> [5] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java#L489
>>>>>> [6] https://docs.google.com/document/d/1uu9pJktzT_O3DxGd1-Q2op4nRk4HekIZbzi-0oTAips/edit?ts=5cdf6a5b#heading=h.7570feur1qin
>>>>>>
>>>>>> On Fri, May 10, 2019 at 9:16 AM Brian Hulette <bhule...@google.com> wrote:
>>>>>>
>>>>>>> Ah thanks! I added some language there.
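>>>>>>
>>>>>> The promised sketch of the non-Row-backed UUID case, written against the Java LogicalType interface as used above (a hedged sketch only, not existing API: a fixed-size bytes representation is assumed where today only FieldType.BYTES exists, and imports are omitted):
>>>>>>
>>>>>> class UuidLogicalType implements Schema.LogicalType<UUID, byte[]> {
>>>>>>   @Override
>>>>>>   public FieldType getBaseType() {
>>>>>>     return FieldType.BYTES; // ideally a fixed-size (16-byte) variant, if we add one
>>>>>>   }
>>>>>>
>>>>>>   @Override
>>>>>>   public byte[] toBaseType(UUID input) {
>>>>>>     // Encode the UUID as its two longs in big-endian order.
>>>>>>     ByteBuffer bb = ByteBuffer.allocate(16);
>>>>>>     bb.putLong(input.getMostSignificantBits());
>>>>>>     bb.putLong(input.getLeastSignificantBits());
>>>>>>     return bb.array();
>>>>>>   }
>>>>>>
>>>>>>   @Override
>>>>>>   public UUID toInputType(byte[] base) {
>>>>>>     ByteBuffer bb = ByteBuffer.wrap(base);
>>>>>>     return new UUID(bb.getLong(), bb.getLong());
>>>>>>   }
>>>>>> }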
>>>>>>>
>>>>>>> *From: *Kenneth Knowles <k...@apache.org>
>>>>>>> *Date: *Thu, May 9, 2019 at 5:31 PM
>>>>>>> *To: *dev
>>>>>>>
>>>>>>>> *From: *Brian Hulette <bhule...@google.com>
>>>>>>>> *Date: *Thu, May 9, 2019 at 2:02 PM
>>>>>>>> *To: * <dev@beam.apache.org>
>>>>>>>>
>>>>>>>>> We briefly discussed using Arrow schemas in place of Beam schemas entirely in an Arrow thread [1]. The biggest reason not to do this was that we wanted to have a type for large iterables in Beam schemas. But given that large iterables aren't currently implemented, Beam schemas look very similar to Arrow schemas.
>>>>>>>>>
>>>>>>>>> I think it makes sense to take inspiration from Arrow schemas where possible, and maybe even copy them outright. Arrow already has a portable (flatbuffers) schema representation [2], and implementations for it in many languages that we may be able to re-use as we bring schemas to more SDKs (the project has Python and Go implementations). There are a couple of concepts in Arrow schemas that are specific to the format and wouldn't make sense for us (fields can indicate whether or not they are dictionary encoded, and the schema has an endianness field), but if you drop those concepts the Arrow spec looks pretty similar to the Beam proto spec.
>>>>>>>>
>>>>>>>> FWIW I left a blank section in the doc for filling out what the differences are and why, and conversely what the interop opportunities may be. Such sections are some of my favorite sections of design docs.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>>> Brian
>>>>>>>>>
>>>>>>>>> [1] https://lists.apache.org/thread.html/6be7715e13b71c2d161e4378c5ca1c76ac40cfc5988a03ba87f1c434@%3Cdev.beam.apache.org%3E
>>>>>>>>> [2] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L194
>>>>>>>>>
>>>>>>>>> *From: *Robert Bradshaw <rober...@google.com>
>>>>>>>>> *Date: *Thu, May 9, 2019 at 1:38 PM
>>>>>>>>> *To: *dev
>>>>>>>>>
>>>>>>>>>> From: Reuven Lax <re...@google.com>
>>>>>>>>>> Date: Thu, May 9, 2019 at 7:29 PM
>>>>>>>>>> To: dev
>>>>>>>>>>
>>>>>>>>>> > Also in the future we might be able to do optimizations at the runner level if at the portability layer we understood schemas instead of just raw coders. This could be things like only parsing a subset of a row (if we know only a few fields are accessed) or using a columnar data structure like Arrow to encode batches of rows across portability. This doesn't affect data semantics of course, but having a richer, more-expressive type system opens up other opportunities.
>>>>>>>>>>
>>>>>>>>>> But we could do all of that with a RowCoder we understood to designate the type(s), right?
>>>>>>>>>>
>>>>>>>>>> > On Thu, May 9, 2019 at 10:16 AM Robert Bradshaw <rober...@google.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> On the flip side, Schemas are equivalent to the space of Coders with the addition of a RowCoder and the ability to materialize to something other than bytes, right? (Perhaps I'm missing something big here...) This may make a backwards-compatible transition easier. (SDK-side, the ability to reason about and operate on such types is of course much richer than anything Coders offer right now.)
>>>>>>>>>> >>
>>>>>>>>>> >> From: Reuven Lax <re...@google.com>
>>>>>>>>>> >> Date: Thu, May 9, 2019 at 4:52 PM
>>>>>>>>>> >> To: dev
>>>>>>>>>> >>
>>>>>>>>>> >> > FYI I can imagine a world in which we have no coders. We could define the entire model on top of schemas. Today's "Coder" is completely equivalent to a single-field schema with a logical-type field (actually the latter is slightly more expressive as you aren't forced to serialize into bytes).
>>>>>>>>>> >> >
>>>>>>>>>> >> > Due to compatibility constraints and the effort that would be involved in such a change, I think the practical decision should be for schemas and coders to coexist for the time being. However when we start planning Beam 3.0, deprecating coders is something I would like to suggest.
>>>>>>>>>> >> >
>>>>>>>>>> >> > On Thu, May 9, 2019 at 7:48 AM Robert Bradshaw <rober...@google.com> wrote:
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> From: Kenneth Knowles <k...@apache.org>
>>>>>>>>>> >> >> Date: Thu, May 9, 2019 at 10:05 AM
>>>>>>>>>> >> >> To: dev
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> > This is a huge development. Top posting because I can be more compact.
>>>>>>>>>> >> >> >
>>>>>>>>>> >> >> > I really think after the initial idea converges this needs a design doc with goals and alternatives. It is an extraordinarily consequential model change. So in the spirit of doing the work / bias towards action, I created a quick draft at https://s.apache.org/beam-schemas and added everyone on this thread as editors. I am still in the process of writing this to match the thread.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Thanks! Added some comments there.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> > *Multiple timestamp resolutions*: you can use logical types to represent nanos the same way Java and proto do.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> As per the other discussion, I'm unsure whether the value in supporting multiple timestamp resolutions is high enough to outweigh the cost.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> > *Why multiple int types?* The domains of values for these types are different. For a language with one "int" or "number" type, that's another domain of values.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> What is the value in having different domains? If your data has a natural domain, chances are it doesn't line up exactly with one of these. I guess it's for languages whose types have specific domains? (There's also compactness in representation, encoded and in-memory, though I'm not sure that's high.)
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> > *Columnar/Arrow*: making sure we unlock the ability to take this path is paramount. So tying it directly to a row-oriented coder seems counterproductive.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> I don't think Coders are necessarily row-oriented. They are, however, bytes-oriented. (Perhaps they need not be.) There seems to be a lot of overlap between what Coders express in terms of element typing information and what Schemas express, and I'd rather have one concept if possible. Or have a clear division of responsibilities.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> > *Multimap*: what does it add over an array-valued map or large-iterable-valued map? (honest question, not rhetorical)
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Multimap has a different notion of what it means to contain a value, can handle (unordered) unions of non-disjoint keys, etc. Maybe this isn't worth a new primitive type.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> > *URN/enum for type names*: I see the case for both. The core types are fundamental enough that they should never really change - after all, proto, thrift, avro, and arrow have addressed this (not to mention most programming languages). Maybe additions once every few years. I prefer the smallest intersection of these schema languages. A oneof is more clear, while URN emphasizes the similarity of built-in and logical types.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Hmm... Do we have any examples of the multi-level primitive/logical type in any of these other systems? I have a bias towards all types being on the same footing unless there is a compelling reason to divide things into primitive/user-defined ones.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Here it seems like the most essential value of the primitive type set is to describe the underlying representation, for encoding elements in a variety of ways (notably columnar, but also interfacing with other external systems like IOs). Perhaps, rather than the previous suggestion of making everything a logical type of bytes, this could be made clear by still making everything a logical type, but renaming "TypeName" to Representation. There would be URNs (typically with empty payloads) for the various primitive types (whose mapping to their representations would be the identity).
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> - Robert