I've created the Jira issue here:
https://issues.apache.org/jira/browse/PARQUET-1827

I wonder if any Parquet implementations support this UUID type?
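
In the meantime, here is a rough sketch of how I might pack a java.util.UUID
into the 16 bytes that a plain FIXED_LEN_BYTE_ARRAY(16) column expects, as
Gabor suggested. The class and method names (UuidBytes, toBytes, fromBytes)
are my own illustration and not part of parquet-mr. The big-endian layout
(most significant 64 bits first) is my reading of how parquet-format
describes the UUID type, so treat it as an assumption:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper, not part of parquet-mr: converts between
// java.util.UUID and a 16-byte big-endian array.
class UuidBytes {

    // Pack the UUID as 16 bytes, most significant 64 bits first.
    // ByteBuffer is big-endian by default.
    static byte[] toBytes(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return buf.array();
    }

    // Rebuild the UUID from a 16-byte array produced by toBytes.
    static UUID fromBytes(byte[] bytes) {
        if (bytes == null || bytes.length != 16) {
            throw new IllegalArgumentException("expected exactly 16 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        byte[] packed = toBytes(original);
        // Round trip: the restored value should equal the original.
        System.out.println(original + " -> " + packed.length + " bytes -> "
                + fromBytes(packed));
    }
}
```

I assume the resulting byte[] could then be wrapped with
Binary.fromConstantByteArray and appended to a Group for a column declared
without the (UUID) annotation, but I have not tested that yet.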

Anyway, thanks for your help,
Brad


On Wed, Mar 25, 2020 at 11:05 AM Gabor Szadovszky
<[email protected]> wrote:

> Hi Brad,
>
> So, the UUID logical type has been added to parquet-format, which is the
> specification of Parquet. It is not yet implemented in parquet-mr, so you
> are not able to use it.
> However, parquet-mr does not provide much support for logical types
> anyway, so you might simply use the FIXED_LEN_BYTE_ARRAY(16) primitive type
> without the logical type.
> Feel free to file a jira for parquet-mr to implement it.
>
> Cheers,
> Gabor
>
> On Wed, Mar 25, 2020 at 3:32 PM Brad Smith <[email protected]> wrote:
>
> > I recently read about the new UUID logical type introduced in
> > parquet-format 2.4.0. I'm interested in trying it out, but I haven't been
> > able to figure out how to make it work so far.
> >
> > For example, the code below uses the parquet-mr library to output a very
> > simple test Parquet file with one string field and one int field:
> >
> > public class ParquetCreator {
> >     public static void main(String[] args) {
> >         String schema = "message spark_schema {\n  optional binary s (STRING);\n  optional INT32 i;\n}";
> >         MessageType readSchema = MessageTypeParser.parseMessageType(schema);
> >         SimpleGroupFactory sfg = new SimpleGroupFactory(readSchema);
> >         Configuration conf = new Configuration();
> >         GroupWriteSupport.setSchema(readSchema, conf);
> >         Path p = new Path("file:///tmp/testfile.parquet");
> >
> >         try {
> >             ParquetWriter<Group> writer = ExampleParquetWriter.builder(p)
> >                     .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_1_0)
> >                     .withCompressionCodec(CompressionCodecName.SNAPPY)
> >                     .withRowGroupSize(1024*1024)
> >                     .withPageSize(1024)
> >                     .enableDictionaryEncoding()
> >                     .withDictionaryPageSize(2*1024)
> >                     .withConf(conf)
> >                     .build();
> >             writer.write(sfg.newGroup().append("s", "abc").append("i", 123));
> >             writer.write(sfg.newGroup().append("s", "def").append("i", 456));
> >             writer.close();
> >         } catch (Exception e) {
> >             e.printStackTrace();
> >         }
> >     }
> > }
> >
> > However, it doesn't work when I try to alter the schema to add a UUID
> > field, like this:
> >
> > String schema = "message spark_schema {\n  optional binary s (STRING);\n  optional INT32 i; optional FIXED_LEN_BYTE_ARRAY(16) u (UUID);\n}";
> >
> > I just get a "No enum constant
> > org.apache.parquet.schema.OriginalType.UUID" error. I've tried several
> > variations on this schema, but none have worked so far. Is there something
> > that I'm doing incorrectly with the schema? Or is the UUID logical type not
> > supported in parquet-mr yet?
> >
> > Any suggestions would be much appreciated!
> >
> > Thanks,
> > Brad
> >
>
