Hi Brad,

The UUID logical type has been added to parquet-format, which is the specification of Parquet, but it is not yet implemented in parquet-mr, so you cannot use it there. The "No enum constant org.apache.parquet.schema.OriginalType.UUID" error means exactly that: parquet-mr has no UUID constant to map the (UUID) annotation to. However, parquet-mr does not provide much support for logical types anyway, so you could simply use the FIXED_LEN_BYTE_ARRAY(16) primitive type without the logical type annotation. Feel free to file a JIRA for parquet-mr to implement it.
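A minimal sketch of that workaround, using only the JDK: the schema declares `u` as a plain FIXED_LEN_BYTE_ARRAY(16) with no annotation, and two helpers pack a java.util.UUID into 16 bytes and back. The class and method names (UuidBytes, toBytes, fromBytes) are hypothetical, and the most-significant-bits-first (big-endian) byte layout is an assumption chosen to match the usual RFC 4122 byte order; check the parquet-format spec if other readers are expected to interpret the column as a UUID.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper: stores UUIDs in an unannotated FIXED_LEN_BYTE_ARRAY(16) column.
final class UuidBytes {
    // Same schema as in your example, with the UUID column declared as a plain
    // 16-byte fixed-length binary (no "(UUID)" annotation, which parquet-mr rejects).
    static final String SCHEMA =
        "message spark_schema {\n"
        + "  optional binary s (STRING);\n"
        + "  optional INT32 i;\n"
        + "  optional FIXED_LEN_BYTE_ARRAY(16) u;\n"
        + "}";

    // Pack a UUID into 16 bytes, most significant half first.
    // Assumption: big-endian layout (ByteBuffer's default byte order).
    static byte[] toBytes(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return buf.array();
    }

    // Rebuild the UUID from the 16 stored bytes (reverse of toBytes).
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        // Java evaluates arguments left to right: high 8 bytes, then low 8 bytes.
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        byte[] raw = toBytes(id);
        System.out.println(raw.length);                // prints 16
        System.out.println(fromBytes(raw).equals(id)); // prints true
    }
}
```

With the example Group API from your code, the value could then be written with something like sfg.newGroup().append("u", Binary.fromConstantByteArray(UuidBytes.toBytes(id))), where Binary is org.apache.parquet.io.api.Binary (untested against a specific parquet-mr version).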
Cheers,
Gabor

On Wed, Mar 25, 2020 at 3:32 PM Brad Smith <bnsm...@gmail.com> wrote:

> I recently read about the new UUID logical type introduced in
> parquet-format 2.4.0. I'm interested in trying it out, but I haven't been
> able to figure out how to make it work so far.
>
> For example, the code below uses the parquet-mr library to output a very
> simple test Parquet file with one string field and one int field:
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.parquet.column.ParquetProperties;
> import org.apache.parquet.example.data.Group;
> import org.apache.parquet.example.data.simple.SimpleGroupFactory;
> import org.apache.parquet.hadoop.ParquetWriter;
> import org.apache.parquet.hadoop.example.ExampleParquetWriter;
> import org.apache.parquet.hadoop.example.GroupWriteSupport;
> import org.apache.parquet.hadoop.metadata.CompressionCodecName;
> import org.apache.parquet.schema.MessageType;
> import org.apache.parquet.schema.MessageTypeParser;
>
> public class ParquetCreator {
>     public static void main(String[] args) {
>         String schema = "message spark_schema {\n optional binary s (STRING);\n optional INT32 i;\n}";
>         MessageType readSchema = MessageTypeParser.parseMessageType(schema);
>         SimpleGroupFactory sfg = new SimpleGroupFactory(readSchema);
>         Configuration conf = new Configuration();
>         GroupWriteSupport.setSchema(readSchema, conf);
>         Path p = new Path("file:///tmp/testfile.parquet");
>
>         try {
>             ParquetWriter<Group> writer = ExampleParquetWriter.builder(p)
>                 .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_1_0)
>                 .withCompressionCodec(CompressionCodecName.SNAPPY)
>                 .withRowGroupSize(1024 * 1024)
>                 .withPageSize(1024)
>                 .enableDictionaryEncoding()
>                 .withDictionaryPageSize(2 * 1024)
>                 .withConf(conf)
>                 .build();
>             writer.write(sfg.newGroup().append("s", "abc").append("i", 123));
>             writer.write(sfg.newGroup().append("s", "def").append("i", 456));
>             writer.close();
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
> }
>
> However, it doesn't work when I try to alter the schema to add a UUID
> field, like this:
>
> String schema = "message spark_schema {\n optional binary s (STRING);\n optional INT32 i; optional FIXED_LEN_BYTE_ARRAY(16) u (UUID);\n}";
>
> I just get a "No enum constant org.apache.parquet.schema.OriginalType.UUID"
> error. I've tried several variations on this schema, but with no success
> so far. Is there something that I'm doing incorrectly with the schema? Or
> is the UUID logical type not supported in parquet-mr yet?
>
> Any suggestions would be much appreciated!
>
> Thanks,
> Brad
>