[ 
https://issues.apache.org/jira/browse/HUDI-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Darvas updated HUDI-4047:
--------------------------------
    Description: 
Hi Guys!

I have just used the schema validation and it works like a charm, but :)

A few things would be very useful:

1.) After the error message prefix "Failed schema compatibility check for", 
a FULL JSON-compatible payload should follow.

2.) The JSON payload should contain "violations": [{item}, {item}]

  If going over all the violations is complex or not performant, then just 
print the first one:

   "violation": {"field_name": field_name, "writerSchema": writer_schema, 
"tableSchema": table_schema}

Why? ;) If someone has a table with a lot of cols/features, it would be easier 
to find the discrepancy.

So the debug process would be: copy the FULL JSON into a JSON editor and check 
the nodes...

This would speed up debugging for me ;) but maybe I am not alone with this.
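
A minimal sketch of the kind of payload requested above, written in Python only for brevity. It is not Hudi's validation code and it does not apply Avro's schema resolution rules: it is a plain field-by-field comparison of two record schemas, so it may flag differences that Avro would actually accept. The function names `find_violations` and `build_error_payload` and the `message` and `basePath` keys are hypothetical; only `violations`, `field_name`, `writerSchema` and `tableSchema` come from the request.

```python
import json


def find_violations(writer_schema: dict, table_schema: dict) -> list:
    """Return one entry per field whose declared type differs between the schemas.

    A field that exists on only one side is reported with None for the other side.
    """
    writer_fields = {f["name"]: f.get("type") for f in writer_schema["fields"]}
    table_fields = {f["name"]: f.get("type") for f in table_schema["fields"]}

    # Walk the table's fields in order, then any fields only the writer has.
    names = list(table_fields) + [n for n in writer_fields if n not in table_fields]

    violations = []
    for name in names:
        writer_type = writer_fields.get(name)  # None when the field is absent
        table_type = table_fields.get(name)
        if writer_type != table_type:
            violations.append({
                "field_name": name,
                "writerSchema": writer_type,
                "tableSchema": table_type,
            })
    return violations


def build_error_payload(writer_schema: dict, table_schema: dict, base_path: str) -> dict:
    """Assemble a payload that can be pasted into a JSON editor as-is."""
    return {
        "message": "Failed schema compatibility check",  # hypothetical key
        "basePath": base_path,                            # hypothetical key
        "violations": find_violations(writer_schema, table_schema),
        "writerSchema": writer_schema,
        "tableSchema": table_schema,
    }


if __name__ == "__main__":
    # Cut-down schemas in the shape of the ones from the exception in this issue.
    writer = {"type": "record", "name": "writer_record", "fields": [
        {"name": "correlation_id", "type": ["null", "string"], "default": None},
        {"name": "message_id", "type": ["null", "int"], "default": None},
    ]}
    table = {"type": "record", "name": "table_record", "fields": [
        {"name": "correlation_id", "type": "string"},
        {"name": "message_id", "type": ["null", "int"], "default": None},
        {"name": "year", "type": ["null", "int"], "default": None},
    ]}
    payload = build_error_payload(writer, table, "s3://example-bucket/example_table")
    print(json.dumps(payload, indent=2))
```

With a payload like this, the mismatching columns sit in one short list at the top level, so there is no need to compare the two full schemas by eye.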

 

Thanks,

 

For example, I got an exception like this, and I would like a nicer one, as 
explained above:

Caused by: org.apache.hudi.exception.HoodieException: Failed schema 
compatibility check for writerSchema 
:{"type":"record","name":"iot_raw__ingress_pkg_decoded_rep_record","namespace":"hoodie.iot_raw__ingress_pkg_decoded_rep","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"correlation_id","type":["null","string"],"default":null},{"name":"iot_pkg_receive_time","type":["null",{"type":"long","logicalType":"timestamp-micros"}],"default":null},{"name":"parsing_time","type":["null",{"type":"long","logicalType":"timestamp-micros"}],"default":null},{"name":"receive_time","type":["null",{"type":"long","logicalType":"timestamp-micros"}],"default":null},{"name":"aggregate_id","type":["null","string"],"default":null},{"name":"message_id","type":["null","int"],"default":null},{"name":"message_type_name","type":["null","string"],"default":null},{"name":"message_type_id","type":["null","int"],"default":null},{"name":"report_id","type":["null","int"],"default":null},{"name":"report_type_name","type":["null","string"],"default":null},{"name":"report_type_id","type":["null","int"],"default":null},{"name":"report_dcd_payload","type":["null","string"],"default":null}]},
 table schema 
:{"type":"record","name":"hoodie_source","namespace":"hoodie.source","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"correlation_id","type":"string"},{"name":"iot_pkg_receive_time","type":{"type":"long","logicalType":"timestamp-micros"}},{"name":"parsing_time","type":{"type":"long","logicalType":"timestamp-micros"}},{"name":"receive_time","type":{"type":"long","logicalType":"timestamp-micros"}},{"name":"aggregate_id","type":"string"},{"name":"message_id","type":"int"},{"name":"message_type_name","type":"string"},{"name":"message_type_id","type":"int"},{"name":"report_id","type":"int"},{"name":"report_type_name","type":"string"},{"name":"report_type_id","type":"int"},{"name":"report_dcd_payload","type":"string"},{"name":"year","type":["null","int"],"default":null},{"name":"month","type":["null","int"],"default":null},{"name":"day","type":["null","int"],"default":null}]},
 base path :s3://scgps-datalake/iot_raw/ingress_pkg_decoded_rep
        at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:682)
        at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:688)
        ... 42 more

 

  was:
Hi Guys!

 

I have just used the schema validation and works as a charm, but :)

 

A few things would be very usefull

1.) after the error message: Failed schema compatibility check for \{FULL JSON 
Compatible payload should come}

2.) in the JSON payload should contain "violations": [\{item}, \{item}] 

  if go over all the violations is complex, or not performant then just print 
the first oine

   "violation": \{"fiel_name": fied_name,  "writerSchema": "writer_schema, 
"tableSchema": table_schema }

 

Why? ;) - if someone has a table with a lot of cols/feature would be easier to 
find the discrepenacy.

So the debug process would be copy the FULL JSON into a json editor, and check 
the nodes... 

this would speed up the the debug for me ;) but maybe I am not alone with this.

 

Thanks,

 

I got an exception like this for example and I would like a nicer one like I 
explained above:

Caused by: org.apache.hudi.exception.HoodieException: Failed schema 
compatibility check for writerSchema 
:{"type":"record","name":"iot_raw__ingress_pkg_decoded_rep_record","namespace":"hoodie.iot_raw__ingress_pkg_decoded_rep","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"correlation_id","type":["null","string"],"default":null},{"name":"iot_pkg_receive_time","type":["null",{"type":"long","logicalType":"timestamp-micros"}],"default":null},{"name":"parsing_time","type":["null",{"type":"long","logicalType":"timestamp-micros"}],"default":null},{"name":"receive_time","type":["null",{"type":"long","logicalType":"timestamp-micros"}],"default":null},{"name":"aggregate_id","type":["null","string"],"default":null},{"name":"message_id","type":["null","int"],"default":null},{"name":"message_type_name","type":["null","string"],"default":null},{"name":"message_type_id","type":["null","int"],"default":null},{"name":"report_id","type":["null","int"],"default":null},{"name":"report_type_name","type":["null","string"],"default":null},{"name":"report_type_id","type":["null","int"],"default":null},{"name":"report_dcd_payload","type":["null","string"],"default":null}]},
 table schema 
:{"type":"record","name":"hoodie_source","namespace":"hoodie.source","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"correlation_id","type":"string"},{"name":"iot_pkg_receive_time","type":{"type":"long","logicalType":"timestamp-micros"}},{"name":"parsing_time","type":{"type":"long","logicalType":"timestamp-micros"}},{"name":"receive_time","type":{"type":"long","logicalType":"timestamp-micros"}},{"name":"aggregate_id","type":"string"},{"name":"message_id","type":"int"},{"name":"message_type_name","type":"string"},{"name":"message_type_id","type":"int"},{"name":"report_id","type":"int"},{"name":"report_type_name","type":"string"},{"name":"report_type_id","type":"int"},{"name":"report_dcd_payload","type":"string"},{"name":"year","type":["null","int"],"default":null},{"name":"month","type":["null","int"],"default":null},{"name":"day","type":["null","int"],"default":null}]},
 base path :s3://scgps-datalake/iot_raw/ingress_pkg_decoded_rep
        at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:682)
        at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:688)
        ... 42 more

 


> hoodie.avro.schema.validate error message refact
> ------------------------------------------------
>
>                 Key: HUDI-4047
>                 URL: https://issues.apache.org/jira/browse/HUDI-4047
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Istvan Darvas
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
