[
https://issues.apache.org/jira/browse/AVRO-3479?focusedWorklogId=754887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754887
]
ASF GitHub Bot logged work on AVRO-3479:
----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Apr/22 03:45
Start Date: 09/Apr/22 03:45
Worklog Time Spent: 10m
Work Description: jklamer commented on code in PR #1631:
URL: https://github.com/apache/avro/pull/1631#discussion_r846569639
##########
lang/rust/avro_derive/src/lib.rs:
##########
@@ -0,0 +1,366 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use proc_macro2::{Span, TokenStream, TokenTree};
+use quote::quote;
+
+use syn::{parse_macro_input, Attribute, DeriveInput, Error, Lit, Path, Type, TypePath};
+
+#[proc_macro_derive(AvroSchema, attributes(namespace))]
+// Templated from Serde
+pub fn proc_macro_derive_avro_schema(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
+ let mut input = parse_macro_input!(input as DeriveInput);
+ derive_avro_schema(&mut input)
+ .unwrap_or_else(to_compile_errors)
+ .into()
+}
+
+fn derive_avro_schema(input: &mut DeriveInput) -> Result<TokenStream, Vec<syn::Error>> {
+ let namespace = get_namespace_from_attributes(&input.attrs)?;
+ let full_schema_name = vec![namespace, Some(input.ident.to_string())]
+ .into_iter()
+ .flatten()
+ .collect::<Vec<String>>()
+ .join(".");
+ let schema_def = match &input.data {
+ syn::Data::Struct(s) => {
+ get_data_struct_schema_def(&full_schema_name, s, input.ident.span())?
+ }
+ syn::Data::Enum(e) => get_data_enum_schema_def(&full_schema_name, e, input.ident.span())?,
+ _ => {
+ return Err(vec![Error::new(
+ input.ident.span(),
+ "AvroSchema derive only works for structs and simple enums ",
+ )])
+ }
+ };
+
+ let ty = &input.ident;
+ let (impl_generics, ty_generics, where_clause) = input.generics.split_for_impl();
+ Ok(quote! {
+ impl #impl_generics apache_avro::schema::AvroSchemaWithResolved for #ty #ty_generics #where_clause {
+ fn get_schema_with_resolved(resolved_schemas: &mut HashMap<apache_avro::schema::Name, apache_avro::schema::Schema>) -> apache_avro::schema::Schema {
+ let name = apache_avro::schema::Name::new(#full_schema_name).expect(&format!("Unable to parse schema name {}", #full_schema_name)[..]);
+ if resolved_schemas.contains_key(&name) {
+ resolved_schemas.get(&name).unwrap().clone()
+ }else {
+ resolved_schemas.insert(name.clone(), Schema::Ref{name: name.clone()});
+ #schema_def
+ }
+ }
+ }
+ })
+}
+
+fn get_namespace_from_attributes(attrs: &[Attribute]) -> Result<Option<String>, Vec<Error>> {
+ let namespace_attr_path_constant: Path = syn::parse2::<Path>(quote! {namespace}).unwrap();
+ const NAMESPACE_PARSING_ERROR_CONSTANST: &str =
+ "Namespace attribute must be in form #[namespace = \"com.testing.namespace\"]";
+ // parse out namespace if present. Requires strict syntax
+ for attr in attrs {
+ if namespace_attr_path_constant == attr.path {
+ let mut input_tokens = attr.tokens.clone().into_iter();
+ if let (
+ Some(TokenTree::Punct(punct)),
+ Some(TokenTree::Literal(namespace_literal)),
+ None,
+ ) = (
+ input_tokens.next(),
+ input_tokens.next(),
+ input_tokens.next(),
+ ) {
+ if punct.as_char() == '=' {
+ if let Lit::Str(lit_str) = Lit::new(namespace_literal) {
+ return Ok(Some(lit_str.value()));
+ }
+ }
+ }
+ return Err(vec![Error::new_spanned(
+ &attr.tokens,
+ NAMESPACE_PARSING_ERROR_CONSTANST,
+ )]);
+ }
+ }
+ Ok(None)
+}
+
+fn get_data_struct_schema_def(
+ full_schema_name: &str,
+ s: &syn::DataStruct,
+ error_span: Span,
+) -> Result<TokenStream, Vec<Error>> {
+ let mut record_field_exprs = vec![];
+ match s.fields {
+ syn::Fields::Named(ref a) => {
+ for (position, field) in a.named.iter().enumerate() {
+ let name = field.ident.as_ref().unwrap().to_string(); // we know everything has a name
+ let schema_expr = type_to_schema_expr(&field.ty)?;
+ let position = position;
+ record_field_exprs.push(quote! {
+ apache_avro::schema::RecordField {
+ name: #name.to_string(),
+ doc: Option::None,
+ default: Option::None,
+ schema: #schema_expr,
+ order: apache_avro::schema::RecordFieldOrder::Ignore,
+ position: #position,
+ }
+ });
+ }
+ }
+ syn::Fields::Unnamed(_) => {
+ return Err(vec![Error::new(
+ error_span,
+ "AvroSchema derive does not work for tuple structs",
+ )])
+ }
+ syn::Fields::Unit => {
+ return Err(vec![Error::new(
+ error_span,
+ "AvroSchema derive does not work for unit structs",
+ )])
+ }
+ }
+ Ok(quote! {
+ let schema_fields = vec![#(#record_field_exprs),*];
+ let name = apache_avro::schema::Name::new(#full_schema_name).expect(&format!("Unable to struct name for schema {}", #full_schema_name)[..]);
+ apache_avro::schema::record_schema_for_fields(name, None, None, schema_fields)
+ })
+}
+
+fn get_data_enum_schema_def(
+ full_schema_name: &str,
+ e: &syn::DataEnum,
+ error_span: Span,
+) -> Result<TokenStream, Vec<Error>> {
+ if e.variants.iter().all(|v| syn::Fields::Unit == v.fields) {
+ let symbols: Vec<String> = e
+ .variants
+ .iter()
+ .map(|varient| varient.ident.to_string())
+ .collect();
+ Ok(quote! {
+ apache_avro::schema::Schema::Enum {
+ name: apache_avro::schema::Name::new(#full_schema_name).expect(&format!("Unable to parse enum name for schema {}", #full_schema_name)[..]),
+ aliases: None,
+ doc: None,
+ symbols: vec![#(#symbols.to_owned()),*]
+ }
+ })
+ } else {
+ Err(vec![Error::new(
+ error_span,
+ "AvroSchema derive does not work for enums with non unit structs",
+ )])
+ }
+}
+
+/// Takes in the Tokens of a type and returns the tokens of an expression with return type `Schema`
+fn type_to_schema_expr(ty: &Type) -> Result<TokenStream, Vec<Error>> {
+ if let Type::Path(p) = ty {
+ let type_string = p.path.segments.last().unwrap().ident.to_string();
+
+ let schema = match &type_string[..] {
+ "bool" => quote! {Schema::Boolean},
+ "i8" | "i16" | "i32" | "u8" | "u16" => quote! {apache_avro::schema::Schema::Int},
+ "i64" => quote! {apache_avro::schema::Schema::Long},
Review Comment:
The current serde implementation for `u32` that we have is
```
fn serialize_u32(self, v: u32) -> Result<Self::Ok, Self::Error> {
    if v <= i32::MAX as u32 {
        self.serialize_i32(v as i32)
    } else {
        self.serialize_i64(i64::from(v))
    }
}
```
The schema would have to be value-dependent, so I couldn't always create a
schema that is guaranteed to work, unless I always emitted the union
`[int, long]`. That felt like more unexpected behavior than just making the
user fall back to a manual definition and handle it there. There are lots of
ways we could handle this; what do you think?
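
To make the value dependence concrete, here is a minimal sketch using only the standard library. `IntSchema`, `schema_written_for_u32`, and `single_schema_covers` are hypothetical names invented for this illustration and are not part of `apache_avro`; the only thing taken from the code above is the `i32::MAX` branch.

```rust
// Hypothetical stand-in for the two Avro integer schemas.
// This is NOT the apache_avro `Schema` type, only an illustration.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum IntSchema {
    Int,
    Long,
}

/// Mirrors the branch in `serialize_u32` quoted above: values that fit in an
/// `i32` are written as an Avro int, everything larger as an Avro long.
pub fn schema_written_for_u32(v: u32) -> IntSchema {
    if v <= i32::MAX as u32 {
        IntSchema::Int
    } else {
        IntSchema::Long
    }
}

/// Returns true when one fixed schema matches what would be written for
/// every value in `values`.
pub fn single_schema_covers(schema: IntSchema, values: &[u32]) -> bool {
    values.iter().all(|&v| schema_written_for_u32(v) == schema)
}
```

Under these assumptions, `single_schema_covers` returns false for both `IntSchema::Int` and `IntSchema::Long` on the slice `&[1, u32::MAX]`, which is why the derive cannot pick a single one of the two for a `u32` field.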
Issue Time Tracking
-------------------
Worklog Id: (was: 754887)
Time Spent: 40m (was: 0.5h)
> [rust] Derive Avro Schema macro
> -------------------------------
>
> Key: AVRO-3479
> URL: https://issues.apache.org/jira/browse/AVRO-3479
> Project: Apache Avro
> Issue Type: Improvement
> Reporter: Jack Klamer
> Assignee: Jack Klamer
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The tracking Issue for the Avro Derive Feature of the rust SDK.
> Proposal (copied from email):
> Have another rust crate that is importable as a feature on the main crate (in
> the same manner as serde derive), that will provide a derive proc_macro that
> implements a simple trait that returns the schema for the implementing type.
> Right now, schemas must be parsed from strings ( or read from files first),
> and closely coordinated with the associated struct. This makes sense for
> workflows that need to associate the same type across languages. For programs
> that are all within Rust, there are usability advantages of the proc_macro.