[ https://issues.apache.org/jira/browse/PARQUET-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546168#comment-14546168 ]
Tianshuo Deng commented on PARQUET-278:
---------------------------------------
It's good to double check empty fields in the constructor of GroupType.
Even if we make the constructor non-public, the fluent builder API still calls
it.
Migrating to the builder API could be done in a separate PR.
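A minimal sketch of that point, assuming simplified stand-in classes
(SketchType, SketchGroupType and SketchGroupBuilder are hypothetical names for
illustration, not Parquet's actual classes): if the empty-fields check sits in
the constructor, the builder path is covered too, because the builder ends up
calling the same constructor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-ins for schema types; NOT Parquet's real classes.
class SketchType {
    final String name;

    SketchType(String name) {
        this.name = name;
    }
}

class SketchGroupType extends SketchType {
    final List<SketchType> fields;

    // Non-public constructor: the emptiness check lives here, so every
    // construction path (direct call or builder) goes through it.
    SketchGroupType(String name, List<SketchType> fields) {
        super(name);
        if (fields == null || fields.isEmpty()) {
            throw new IllegalArgumentException(
                "Cannot create group '" + name + "' with no fields");
        }
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }
}

// Hypothetical fluent builder that delegates to the constructor above.
class SketchGroupBuilder {
    private final List<SketchType> fields = new ArrayList<>();

    SketchGroupBuilder field(String fieldName) {
        fields.add(new SketchType(fieldName));
        return this;
    }

    SketchGroupType named(String groupName) {
        // The builder does no validation of its own; the constructor does it.
        return new SketchGroupType(groupName, fields);
    }
}

class EmptyGroupCheckDemo {
    public static void main(String[] args) {
        SketchGroupType links = new SketchGroupBuilder()
            .field("Backward")
            .field("Forward")
            .named("Links");
        System.out.println(links.name + " has " + links.fields.size() + " fields");

        try {
            // No fields were added, so the constructor is expected to throw.
            new SketchGroupBuilder().named("Empty");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With this layout the builder needs no separate validation, which is why the
check in the constructor is still useful even if callers move to the builder.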
Also, I'm not sure that
    Types.buildMessage()
        .required(INT64).named("DocId")
        .optionalGroup()
            .repeated(INT64).named("Backward")
            .repeated(INT64).named("Forward")
            .named("Links")
        .repeatedGroup()
            .repeatedGroup()
                .required(BINARY).named("Code")
                .optional(BINARY).named("Country")
                .named("Language")
            .optional(BINARY).named("Url")
            .named("Name")
        .named("Document");
really looks better than the constructor API in terms of readability. A
Parquet schema is a nested structure, and the fluent API needs manual
indentation to reflect that nesting. The constructor-based API is simple and
clear, does its job of constructing objects nicely, and an IDE can indent it
pretty well.
    new MessageType("Document",
        new PrimitiveType(REQUIRED, INT64, "DocId"),
        new GroupType(OPTIONAL, "Links",
            new PrimitiveType(REPEATED, INT64, "Backward"),
            new PrimitiveType(REPEATED, INT64, "Forward")
        ),
        new GroupType(REPEATED, "Name",
            new GroupType(REPEATED, "Language",
                new PrimitiveType(REQUIRED, BINARY, "Code"),
                new PrimitiveType(OPTIONAL, BINARY, "Country")),
            new PrimitiveType(OPTIONAL, BINARY, "Url")));
> enforce non empty group on MessageType level
> --------------------------------------------
>
> Key: PARQUET-278
> URL: https://issues.apache.org/jira/browse/PARQUET-278
> Project: Parquet
> Issue Type: Improvement
> Reporter: Tianshuo Deng
>
> As a columnar format, Parquet currently does not support an empty struct/group
> without leaves. We should throw when constructing an empty GroupType to give
> a clear error message.