Thanks for the reply.
I am now using ...
Writable[] values = new Writable[2];
values[0] = new Text("abc");
values[1] = new IntWritable(24);
ArrayWritable value = new ArrayWritable(Writable.class, values);

List<String> columnNames = new ArrayList<String>();
columnNames.add("name");
columnNames.add("age");
List<TypeInfo> columnTypes = TypeInfoUtils.getTypeInfosFromTypeString("string,int");
TypeInfo rowTypeInfo = TypeInfoFactory.getStructTypeInfo(columnNames, columnTypes);

writer.write(new ParquetHiveRecord(value, (StructObjectInspector) objInspector));
The above code works for string, long, double, float, int, boolean, and date types, but I am getting the exception below for datetime and decimal:
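(A hedged note, not from the original thread: Hive's type-string grammar uses Hive type names, so the two problem columns would normally be declared as `timestamp` rather than `datetime`, and `decimal(precision,scale)` rather than a raw `int32`. A type string covering the failing types might look like:)

```
string,int,timestamp,decimal(2,0)
```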
message basket {
required int32 b (DECIMAL(2,0));
}
log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration.deprecation).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.RuntimeException: Parquet record is malformed: parquet.column.values.dictionary.DictionaryValuesWriter$PlainIntegerDictionaryValuesWriter
	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64)
	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
	at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
	at parquet.hadoop.ParquetWriter.write(ParquetWriter.java:258)
	at ParquetTestWriter.main(ParquetTestWriter.java:107)
Caused by: java.lang.UnsupportedOperationException: parquet.column.values.dictionary.DictionaryValuesWriter$PlainIntegerDictionaryValuesWriter
	at parquet.column.values.ValuesWriter.writeBytes(ValuesWriter.java:95)
	at parquet.column.values.fallback.FallbackValuesWriter.writeBytes(FallbackValuesWriter.java:162)
	at parquet.column.impl.ColumnWriterV2.write(ColumnWriterV2.java:157)
	at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:346)
	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writePrimitive(DataWritableWriter.java:302)
	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:106)
	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89)
	at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:60)
	... 5 more
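(An editorial sketch in plain Java, not the Hive writer itself: per the Parquet format spec, DECIMAL with precision <= 9 can be stored in an int32 holding the unscaled value, so a schema of `required int32 b (DECIMAL(2,0))` expects integer writes. The trace above shows an `addBinary` call reaching the int32 dictionary writer, whose `writeBytes` is unsupported, i.e. the value being handed over is binary while the column is int32.)

```java
import java.math.BigDecimal;

// Illustration of the int32-backed DECIMAL encoding the schema above declares:
// the stored int is the unscaled value, and the scale comes from the schema.
public class DecimalAsInt32 {
    // Encode a decimal as its unscaled int32 value for the given scale.
    static int encode(BigDecimal d, int scale) {
        return d.setScale(scale).unscaledValue().intValueExact();
    }

    // Decode an int32 unscaled value back to a decimal.
    static BigDecimal decode(int unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        System.out.println(encode(new BigDecimal("24"), 0));  // 24
        System.out.println(encode(new BigDecimal("3.5"), 1)); // 35
        System.out.println(decode(35, 1));                    // 3.5
    }
}
```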
Please help!
-----Original Message-----
From: Mohammad Islam [mailto:[email protected]]
Sent: Saturday, October 31, 2015 6:50 AM
To: [email protected]
Subject: Re: Write a list in parquet using JAVA api
Prematurely sent ...
Adding on Ryan's comment: sometimes it seems confusing to understand how Parquet and the other object models work. A relevant link:
http://grepalex.com/2014/05/13/parquet-file-format-and-object-model/
Regards,
Mohammad
On Friday, October 30, 2015 6:18 PM, Mohammad Islam <[email protected]>
wrote:
On Thursday, October 29, 2015 10:02 AM, Ryan Blue <[email protected]>
wrote:
Hi Manisha,
The main recommendation I have is to not use the
org.apache.parquet.example.* classes. Those are an example of how to implement
an object model, not classes that can or should be used in an application that
reads or writes Parquet data.
The best thing is to use one of the real object models, like Avro or Thrift.
That way you get the option of using row-oriented or column-oriented storage in
your application without translating between object models.
rb
On 10/29/2015 01:46 AM, Manisha Sethi wrote:
> Hi All,
>
> I am trying to write a list in parquet using the below code, but something is
> going wrong.
>
> MessageType schema = MessageTypeParser.parseMessageType(
>     "message basket { required group myList (LIST) {"
>     + " repeated group list { required float listfloat; } } }");
> ParquetWriter<Group> writer = new ParquetWriter<Group>(outDirPath,
>     new GroupWriteSupport() {
>         @Override
>         public WriteContext init(Configuration configuration) {
>             if (configuration.get(GroupWriteSupport.PARQUET_EXAMPLE_SCHEMA) == null) {
>                 configuration.set(GroupWriteSupport.PARQUET_EXAMPLE_SCHEMA,
>                     schema.toString());
>             }
>             return super.init(configuration);
>         }
>     }, CompressionCodecName.SNAPPY, 256 * 1024 * 1024, 100 * 1024);
> GroupWriteSupport.setSchema(schema, config);
> SimpleGroupFactory f = new SimpleGroupFactory(schema);
> writer.write(f.newGroup().append("listfloat", (float) 2.8)
>     .append("listfloat", 3.3f));
>
>
> It's not working. Exception:
> log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.conf.Configuration.deprecation).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more
> info.
> Exception in thread "main" parquet.io.InvalidRecordException: listfloat not
> found in message basket {
> required group myList (LIST) {
> repeated group list {
> required float listfloat;
> }
> }
> }
>
> at parquet.schema.GroupType.getFieldIndex(GroupType.java:147)
> at parquet.example.data.Group.add(Group.java:39)
> at parquet.example.data.Group.append(Group.java:107)
> at ParquetTestWriter.main(ParquetTestWriter.java:90)
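(An editorial aside, modeled in plain Java rather than the Parquet API: `GroupType.getFieldIndex` resolves only a group's *direct* children. The message's only direct field is `myList`; `listfloat` lives two groups deeper, inside the repeated `list` group, so appending it on the top-level group cannot find it and raises `InvalidRecordException`.)

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal model of the failing field lookup: each group maps
// field name -> child subtree (null for a primitive leaf), and
// lookup, like GroupType.getFieldIndex, sees direct children only.
public class FieldLookup {
    // Inner repeated group: list { required float listfloat; }
    static Map<String, Object> listGroup() {
        Map<String, Object> list = new LinkedHashMap<>();
        list.put("listfloat", null); // primitive leaf
        return list;
    }

    // Top-level message: basket { myList (LIST) { list { listfloat } } }
    static Map<String, Object> basket() {
        Map<String, Object> myList = new LinkedHashMap<>();
        myList.put("list", listGroup()); // repeated group
        Map<String, Object> basket = new LinkedHashMap<>();
        basket.put("myList", myList);    // LIST group
        return basket;
    }

    // Analogue of GroupType.getFieldIndex: direct children only.
    static boolean hasDirectField(Map<String, Object> group, String name) {
        return group.containsKey(name);
    }

    public static void main(String[] args) {
        // Appending "listfloat" at the message level fails ...
        System.out.println(hasDirectField(basket(), "listfloat"));    // false
        // ... because it is a direct child of the inner "list" group.
        System.out.println(hasDirectField(listGroup(), "listfloat")); // true
    }
}
```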
>
>
>
> Appreciate the response!
>
> Manisha
>
--
Ryan Blue
Software Engineer
Cloudera, Inc.