I recently read about the new UUID logical type introduced in
parquet-format 2.4.0. I'm interested in trying it out, but I haven't been
able to figure out how to make it work so far.

For example, the code below uses the parquet-mr library to output a very
simple test Parquet file with one string field and one int field:

public class ParquetCreator {
    public static void main(String[] args) {
        String schema = "message spark_schema {\n  optional binary s (STRING);\n  optional INT32 i;\n}";
        MessageType readSchema = MessageTypeParser.parseMessageType(schema);
        SimpleGroupFactory sfg = new SimpleGroupFactory(readSchema);
        Configuration conf = new Configuration();
        GroupWriteSupport.setSchema(readSchema, conf);
        Path p = new Path("file:///tmp/testfile.parquet");

        try {
            ParquetWriter<Group> writer = ExampleParquetWriter.builder(p)

            writer.write(sfg.newGroup().append("s", "abc").append("i",
            writer.write(sfg.newGroup().append("s", "def").append("i",
        } catch (Exception e) {

However, it doesn't work when I try to alter the schema to add a UUID
field, like this:

String schema = "message spark_schema {\n  optional binary s (STRING);\n  optional INT32 i; optional FIXED_LEN_BYTE_ARRAY(16) u (UUID);\n}";

I just get a "No enum constant org.apache.parquet.schema.OriginalType.UUID"
error. I've tried several variations on this schema, but none of them has
worked. Is there something that I'm doing incorrectly with the
schema? Or is the UUID logical type not supported in parquet-mr yet?
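
For context, my understanding of the parquet-format spec is that a UUID
value is a 16-byte fixed-length binary in big-endian order. Below is a
minimal sketch of how I would expect to build that value from a
java.util.UUID once the schema parses. UuidBytes and its two methods are
names I made up for illustration; they are not part of parquet-mr.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper for illustration only; not a parquet-mr class.
final class UuidBytes {

    // Encode a UUID as 16 bytes: most significant 8 bytes first, then the
    // least significant 8. ByteBuffer writes longs big-endian by default.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Decode 16 big-endian bytes back into a UUID.
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long mostSignificant = buf.getLong();
        long leastSignificant = buf.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        byte[] encoded = toBytes(original);
        System.out.println(encoded.length + " bytes, round trip ok: "
                + fromBytes(encoded).equals(original));
    }
}
```

I assume the resulting array could then be wrapped with something like
Binary.fromConstantByteArray before being appended to the group, but I
haven't been able to get far enough to check that.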

Any suggestions would be much appreciated!

