Renkai commented on issue #2444:
URL: https://github.com/apache/arrow-rs/issues/2444#issuecomment-1216760591
@tustvold Thanks a lot!
I replaced the generator with the one below; it basically changes
`pa.PyExtensionType` to `pa.ExtensionType`. The Rust parquet parser works well
now, except that it reads the data type as `FixedSizeBinary(16)`. I think that's
a slight difference in behavior from the C++ parser. In practice, I can continue
my adventure, but would you consider making the two implementations less divergent?
```
import pyarrow as pa


class UuidType(pa.ExtensionType):
    def __init__(self):
        pa.ExtensionType.__init__(self, pa.binary(16), "lance.uuid")

    def __arrow_ext_serialize__(self):
        # since we don't have a parameterized type, we don't need extra
        # metadata to be deserialized
        return b''

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        # return an instance of this subclass given the serialized
        # metadata.
        return UuidType()


if __name__ == '__main__':
    uuid_type = UuidType()
    print(uuid_type.extension_name)
    print(uuid_type.storage_type)
    import uuid
    storage_array = pa.array([uuid.uuid4().bytes for _ in range(4)],
                             pa.binary(16))
    arr = pa.ExtensionArray.from_storage(uuid_type, storage_array)
    print(arr)
    table = pa.Table.from_arrays([arr], names=["uuid"])
    import pyarrow.parquet as pq
    pq.write_table(table, "extension_example.parquet")
```