Hi Brad,

The UUID logical type has been added to parquet-format, which is the
specification of Parquet. It is not yet implemented in parquet-mr, so you
cannot use it there.
However, parquet-mr does not provide much support for logical types
anyway, so you could simply use the FIXED_LEN_BYTE_ARRAY(16) primitive type
without the logical type annotation.
Feel free to file a Jira for parquet-mr to implement it.
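
A minimal sketch of that workaround, assuming the schema line is changed to
"optional fixed_len_byte_array(16) u;" with no (UUID) annotation. The
UuidBytes class and its method names are hypothetical helpers, not part of
parquet-mr. It packs a java.util.UUID into 16 big-endian bytes, which is the
byte order the parquet-format spec describes for UUID, so the files should
remain readable as UUIDs once the annotation is supported:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical helper (not part of parquet-mr): UUID <-> 16-byte array. */
final class UuidBytes {
    private UuidBytes() {}

    /** Encodes the UUID as 16 bytes, most significant byte first. */
    static byte[] toBytes(UUID uuid) {
        // ByteBuffer is big-endian by default.
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** Decodes 16 big-endian bytes back into a UUID. */
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long mostSignificant = buf.getLong();
        long leastSignificant = buf.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }
}
```

With the example writer, the value could then be appended as a Binary, for
example append("u", Binary.fromConstantByteArray(UuidBytes.toBytes(id))),
where Binary is org.apache.parquet.io.api.Binary. I have not run this against
your exact setup, so treat it as a starting point.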

Cheers,
Gabor

On Wed, Mar 25, 2020 at 3:32 PM Brad Smith <bnsm...@gmail.com> wrote:

> I recently read about the new UUID logical type introduced in
> parquet-format 2.4.0. I'm interested in trying it out, but I haven't been
> able to figure out how to make it work so far.
>
> For example, the code below uses the parquet-mr library to output a very
> simple test Parquet file with one string field and one int field:
>
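> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.parquet.column.ParquetProperties;
> import org.apache.parquet.example.data.Group;
> import org.apache.parquet.example.data.simple.SimpleGroupFactory;
> import org.apache.parquet.hadoop.ParquetWriter;
> import org.apache.parquet.hadoop.example.ExampleParquetWriter;
> import org.apache.parquet.hadoop.example.GroupWriteSupport;
> import org.apache.parquet.hadoop.metadata.CompressionCodecName;
> import org.apache.parquet.schema.MessageType;
> import org.apache.parquet.schema.MessageTypeParser;
>
> public class ParquetCreator {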
>     public static void main(String[] args) {
>         String schema = "message spark_schema {\n"
>                 + "  optional binary s (STRING);\n"
>                 + "  optional INT32 i;\n"
>                 + "}";
>         MessageType readSchema = MessageTypeParser.parseMessageType(schema);
>         SimpleGroupFactory sfg = new SimpleGroupFactory(readSchema);
>         Configuration conf = new Configuration();
>         GroupWriteSupport.setSchema(readSchema, conf);
>         Path p = new Path("file:///tmp/testfile.parquet");
>
>         try {
>             ParquetWriter<Group> writer = ExampleParquetWriter.builder(p)
>                     .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_1_0)
>                     .withCompressionCodec(CompressionCodecName.SNAPPY)
>                     .withRowGroupSize(1024*1024)
>                     .withPageSize(1024)
>                     .enableDictionaryEncoding()
>                     .withDictionaryPageSize(2*1024)
>                     .withConf(conf)
>                     .build();
>             writer.write(sfg.newGroup().append("s", "abc").append("i", 123));
>             writer.write(sfg.newGroup().append("s", "def").append("i", 456));
>             writer.close();
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
> }
>
> However, it doesn't work when I try to alter the schema to add a UUID
> field, like this:
>
> String schema = "message spark_schema {\n"
>         + "  optional binary s (STRING);\n"
>         + "  optional INT32 i; optional FIXED_LEN_BYTE_ARRAY(16) u (UUID);\n"
>         + "}";
>
> I just get a "No enum constant org.apache.parquet.schema.OriginalType.UUID"
> error. I've tried several variations on this schema, but none has worked.
> Is there something that I'm doing incorrectly with the schema? Or is the
> UUID logical type not supported in parquet-mr yet?
>
> Any suggestions would be much appreciated!
>
> Thanks,
> Brad
>
