I've created the Jira issue here: https://issues.apache.org/jira/browse/PARQUET-1827
I wonder if any Parquet implementations support this UUID type?

Anyway, thanks for your help,
Brad

On Wed, Mar 25, 2020 at 11:05 AM Gabor Szadovszky <[email protected]> wrote:

> Hi Brad,
>
> So, the UUID logical type is added to parquet-format which is the
> specification of Parquet. It is not yet implemented in parquet-mr so you
> are not able to use it.
> However, parquet-mr does not provide too much support for logical types
> anyway, so you might simply use the FIXED_LEN_BYTE_ARRAY(16) primitive type
> without the logical type.
> Feel free to file a jira for parquet-mr to implement it.
>
> Cheers,
> Gabor
>
> On Wed, Mar 25, 2020 at 3:32 PM Brad Smith <[email protected]> wrote:
>
> > I recently read about the new UUID logical type introduced in
> > parquet-format 2.4.0. I'm interested in trying it out, but I haven't been
> > able to figure out how to make it work so far.
> >
> > For example, the code below uses the parquet-mr library to output a very
> > simple test Parquet file with one string field and one int field:
> >
> > public class ParquetCreator {
> >     public static void main(String[] args) {
> >         String schema = "message spark_schema {\n optional binary s (STRING);\n optional INT32 i;\n}";
> >         MessageType readSchema = MessageTypeParser.parseMessageType(schema);
> >         SimpleGroupFactory sfg = new SimpleGroupFactory(readSchema);
> >         Configuration conf = new Configuration();
> >         GroupWriteSupport.setSchema(readSchema, conf);
> >         Path p = new Path("file:///tmp/testfile.parquet");
> >
> >         try {
> >             ParquetWriter<Group> writer = ExampleParquetWriter.builder(p)
> >                 .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_1_0)
> >                 .withCompressionCodec(CompressionCodecName.SNAPPY)
> >                 .withRowGroupSize(1024*1024)
> >                 .withPageSize(1024)
> >                 .enableDictionaryEncoding()
> >                 .withDictionaryPageSize(2*1024)
> >                 .withConf(conf)
> >                 .build();
> >             writer.write(sfg.newGroup().append("s", "abc").append("i", 123));
> >             writer.write(sfg.newGroup().append("s", "def").append("i", 456));
> >             writer.close();
> >         } catch (Exception e) {
> >             e.printStackTrace();
> >         }
> >     }
> > }
> >
> > However, it doesn't work when I try to alter the schema to add a UUID
> > field, like this:
> >
> > String schema = "message spark_schema {\n optional binary s (STRING);\n optional INT32 i; optional FIXED_LEN_BYTE_ARRAY(16) u (UUID);\n}";
> >
> > I just get a "No enum constant org.apache.parquet.schema.OriginalType.UUID"
> > error. I've tried several variations on this schema so far, but no
> > successes so far. Is there something that I'm doing incorrectly with the
> > schema? Or is the UUID logical type not supported in parquet-mr yet?
> >
> > Any suggestions would be much appreciated!
> >
> > Thanks,
> > Brad
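
P.S. For anyone who finds this thread before parquet-mr gains UUID support: below is a rough sketch of the workaround Gabor describes, i.e. storing the value in a plain FIXED_LEN_BYTE_ARRAY(16) column with no logical type annotation. It only shows the conversion between java.util.UUID and the 16 raw bytes. The class and method names (UuidBytes, toBytes, fromBytes) are made up for illustration and are not part of parquet-mr. The byte order is big-endian, which is my reading of how parquet-format lays out UUID values, so please check the spec before relying on it.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper, not part of parquet-mr: converts between
// java.util.UUID and the 16-byte form that fits FIXED_LEN_BYTE_ARRAY(16).
final class UuidBytes {

    private UuidBytes() {
    }

    // Packs the UUID as 16 bytes, most significant byte first.
    // ByteBuffer is big-endian by default, so no explicit order() call is needed.
    static byte[] toBytes(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return buf.array();
    }

    // Rebuilds the UUID from exactly 16 bytes written by toBytes.
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException(
                "expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long mostSignificant = buf.getLong();
        long leastSignificant = buf.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }
}
```

In the test program above, the schema line would then presumably read "optional FIXED_LEN_BYTE_ARRAY(16) u;" (without "(UUID)"), and I'd expect the value to be appended with something like Binary.fromConstantByteArray(UuidBytes.toBytes(id)), though I have not verified that exact call against the example Group API. Readers will only see an opaque 16-byte field, so the column meaning has to be documented out of band.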
