chinmay-032 commented on issue #8625:
URL: https://github.com/apache/hudi/issues/8625#issuecomment-1545603221
#### New insights after discussing with @ad1happy2go:
The problem arises when using a dynamically created StructType schema. When
a statically declared schema is used, the updates work fine. That is, something
like:
```
json_schema = StructType([
    StructField('profile_id', StringType(), True),
    StructField('timestamp', TimestampType(), True),
    StructField('id', StringType(), True),
    StructField('Enjoy', BooleanType(), True),
    StructField('DOB', ArrayType(StringType(), False), True),
    StructField('zip', StringType(), True),
    StructField('country', StringType(), True),
    StructField('email_vendor', StringType(), True),
    StructField('city', StringType(), True),
    StructField('active_audience', DoubleType(), True),
    StructField('last_name', StringType(), True),
    StructField('migrated_from', StringType(), True),
    StructField('product_range', ArrayType(StringType(), False), True),
    StructField('email_sub', BooleanType(), True),
    StructField('sms_vendor', StringType(), True),
    StructField('audience_count', DoubleType(), True),
    StructField('whatsapp_vendor', StringType(), True),
    StructField('first_name', StringType(), True),
])
.
.
<rest of program>
```
works as expected. However, a schema fetched dynamically from an internal
service:
```
def get_structfield(fieldname, typestring):
    if typestring == "NUMBER":
        return StructField(fieldname, DoubleType(), True)
    elif typestring == "BOOLEAN":
        return StructField(fieldname, BooleanType(), True)
    elif typestring == "RELATIVE_TIME":
        return StructField(fieldname, TimestampType(), True)
    elif typestring == "LIST_DOUBLE":
        return StructField(fieldname, ArrayType(DoubleType(), False), True)
    elif typestring == "LIST_STRING":
        return StructField(fieldname, ArrayType(StringType(), False), True)
    return StructField(fieldname, StringType(), True)

def get_schema(shop_id):
    url = "my-url"
    params = {
        ## my-params
    }
    response = requests.get(url, params=params)
    if response.status_code == 200:
        response_json = response.json()
        schemaDict = response_json["data"]
        LOGGER.info(f"Table profile_{shop_id}: Fetch Schema Successful")
        schema = StructType()
        for key in schemaDict:
            schema.add(get_structfield(key, schemaDict[key]))
        return schema
    else:
        LOGGER.info(f"Table profile_{shop_id}: Fetch Schema Failed")
        raise Exception("Could not get schema")
```
does not update properly.
As our use case depends heavily on fetching the schema dynamically, we cannot
adopt the static approach and are looking for a solution that works with it.
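
One way to narrow this down might be to diff the two schemas field by field. The sketch below is only a debugging aid, not a confirmed diagnosis: `diff_schemas` is a hypothetical helper, and the two sample dicts are made-up data in the shape returned by `StructType.jsonValue()`, not the real schemas from this issue.

```python
def diff_schemas(expected, actual):
    """Compare two schemas given as dicts shaped like StructType.jsonValue().

    Returns a list of human-readable differences (empty if they match).
    """
    diffs = []
    expected_fields = expected["fields"]
    actual_fields = actual["fields"]
    expected_names = [f["name"] for f in expected_fields]
    actual_names = [f["name"] for f in actual_fields]

    # Fields present on one side only.
    for name in expected_names:
        if name not in actual_names:
            diffs.append(f"missing field: {name}")
    for name in actual_names:
        if name not in expected_names:
            diffs.append(f"unexpected field: {name}")

    # Same set of fields, but in a different column order.
    if sorted(expected_names) == sorted(actual_names) and expected_names != actual_names:
        diffs.append("field order differs")

    # Type or nullability mismatches for fields present on both sides.
    actual_by_name = {f["name"]: f for f in actual_fields}
    for field in expected_fields:
        other = actual_by_name.get(field["name"])
        if other is None:
            continue
        if field["type"] != other["type"]:
            diffs.append(
                f"type differs for {field['name']}: {field['type']} != {other['type']}"
            )
        if field["nullable"] != other["nullable"]:
            diffs.append(f"nullability differs for {field['name']}")
    return diffs


# Hypothetical sample data, for illustration only.
static_json = {"type": "struct", "fields": [
    {"name": "profile_id", "type": "string", "nullable": True, "metadata": {}},
    {"name": "timestamp", "type": "timestamp", "nullable": True, "metadata": {}},
]}
dynamic_json = {"type": "struct", "fields": [
    {"name": "timestamp", "type": "string", "nullable": True, "metadata": {}},
    {"name": "profile_id", "type": "string", "nullable": True, "metadata": {}},
]}

for line in diff_schemas(static_json, dynamic_json):
    print(line)
```

With the real schemas, the call would be `diff_schemas(json_schema.jsonValue(), get_schema(shop_id).jsonValue())`. An empty result would suggest the difference lies somewhere other than the schema definition itself.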