Dataframe save as table operation is failing when the child column names contain special characters

abhijeet bedagkar Wed, 16 May 2018 05:43:56 -0700

Hi,

I am using Spark to read XML / JSON files, create a DataFrame, and
save it as a Hive table.


Sample XML file:
<revolt_configuration>
<id>101</id>
    <testexecutioncontroller>
        <execution-timeout>45</execution-timeout>
        <execution-method>COMMAND</execution-method>
    </testexecutioncontroller>
</revolt_configuration>

Note the field 'execution-timeout' under testexecutioncontroller.

Below is the schema inferred for the DataFrame after reading the XML file:

|-- id: long (nullable = true)
|-- testexecutioncontroller: struct (nullable = true)
|    |-- execution-timeout: long (nullable = true)
|    |-- execution-method: string (nullable = true)

While saving this DataFrame to a Hive table I get the exception below:

Caused by: java.lang.IllegalArgumentException: Error: : expected at the position 24 of 'bigint:struct<execution-timeout:bigint,execution-method:string>' but '-' is found.
        at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:360)
        at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
        at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:483)
        at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
        at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:765)
        at org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.initialize(ParquetHiveSerDe.java:111)
        at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
        at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
        at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:276)
        at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:197)
        at org.apache

It looks like the issue is caused by the special character '-' in the
field names: after removing that character, the save works properly.

Could you please let me know if there is a way to rename all child columns
so that the DataFrame can be saved as a table without any issue?

One solution I am aware of is to walk df.schema and recursively build a new
StructField for each field with the renamed column. I still wanted to check
whether there is an easier way to do this.
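
For reference, the recursive renaming described above could be sketched
roughly as follows. This is only an illustration, not a confirmed fix: it
works on the plain dict form of a schema (the shape that PySpark's
df.schema.jsonValue() returns), so it runs without a Spark session. The
helper names sanitize_name and sanitize_schema are hypothetical, and
replacing every character outside [0-9A-Za-z_] with '_' is an assumption
about what Hive will accept.

```python
import re


def sanitize_name(name):
    # Assumption: anything other than letters, digits, and '_' is unsafe.
    return re.sub(r"[^0-9A-Za-z_]", "_", name)


def sanitize_schema(dtype):
    """Return a copy of a Spark schema (JSON/dict form) with every
    struct field name sanitized, at any nesting depth."""
    if not isinstance(dtype, dict):
        return dtype  # primitive type such as "long" or "string"
    kind = dtype.get("type")
    if kind == "struct":
        return {**dtype, "fields": [
            {**f,
             "name": sanitize_name(f["name"]),
             "type": sanitize_schema(f["type"])}
            for f in dtype["fields"]
        ]}
    if kind == "array":
        return {**dtype, "elementType": sanitize_schema(dtype["elementType"])}
    if kind == "map":
        return {**dtype,
                "keyType": sanitize_schema(dtype["keyType"]),
                "valueType": sanitize_schema(dtype["valueType"])}
    return dtype  # any other complex type is left untouched


# The schema from this mail, written out in its dict form.
schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": True, "metadata": {}},
        {"name": "testexecutioncontroller", "nullable": True, "metadata": {},
         "type": {"type": "struct", "fields": [
             {"name": "execution-timeout", "type": "long",
              "nullable": True, "metadata": {}},
             {"name": "execution-method", "type": "string",
              "nullable": True, "metadata": {}},
         ]}},
    ],
}

clean = sanitize_schema(schema)
print([f["name"] for f in clean["fields"][1]["type"]["fields"]])
```

In PySpark, assuming an existing DataFrame df and session spark, the result
could presumably be applied with
StructType.fromJson(sanitize_schema(df.schema.jsonValue())) followed by
spark.createDataFrame(df.rdd, new_schema). Note that two distinct names such
as 'a-b' and 'a_b' would collide after renaming, so a real implementation
may need to check for duplicates.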

Thanks,
Abhijeet
