On 09/07/2014 01:45 PM, Rafeeq S wrote:
I am a newbie to Parquet.

Please suggest an example that writes data into a *Parquet* file using
ParquetFileWriter.

I have tried the example code below, which writes data into a Parquet file
using *ParquetWriter*.

http://php.sabscape.com/blog/?p=623

The above example uses ParquetWriter, but I want to use ParquetFileWriter
to write data efficiently into Parquet files.

Please suggest an example, or explain how we can write Parquet files using
*ParquetFileWriter*.

Regards,

Rafeeq S

Hi Rafeeq,

ParquetFileWriter is actually an internal implementation that's used by higher-level interfaces, like parquet-avro, parquet-thrift, and others. The reason is that Parquet doesn't have its own object model that it makes you use. It has an API so that you can use whatever model you want backed by the Parquet format. The Avro classes are a good demonstration, where you use Avro runtime objects and Schemas, but the results are stored as Parquet files.
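
To make that idea concrete, here is a toy sketch in plain Java of the adapter pattern described above. Every type in it (ToyRecordConsumer, ToyWriteSupport, ToyFileWriter, User) is hypothetical and invented for illustration; it is loosely modeled on the role Parquet's WriteSupport plays, but it is not Parquet's actual API and it writes no Parquet data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration only: these types are hypothetical, NOT Parquet's API.
class ObjectModelSketch {

    /** Receives the fields of one record as a flat stream of events. */
    interface ToyRecordConsumer {
        void startRecord();
        void addField(String name, String value);
        void endRecord();
    }

    /** Adapter from an application's object model T to field events. */
    interface ToyWriteSupport<T> {
        void write(T record, ToyRecordConsumer consumer);
    }

    /** An application class that knows nothing about the storage format. */
    static final class User {
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** The only format-aware code the application has to supply. */
    static final class UserWriteSupport implements ToyWriteSupport<User> {
        @Override
        public void write(User user, ToyRecordConsumer consumer) {
            consumer.startRecord();
            consumer.addField("name", user.name);
            consumer.addField("age", Integer.toString(user.age));
            consumer.endRecord();
        }
    }

    /** Stand-in for the file writer: it sees only events, never User. */
    static final class ToyFileWriter implements ToyRecordConsumer {
        final List<String> lines = new ArrayList<String>();
        private StringBuilder current;

        @Override
        public void startRecord() {
            current = new StringBuilder();
        }

        @Override
        public void addField(String name, String value) {
            if (current.length() > 0) {
                current.append(',');
            }
            current.append(name).append('=').append(value);
        }

        @Override
        public void endRecord() {
            lines.add(current.toString());
        }
    }

    /** Pushes every record through the adapter into the toy writer. */
    static <T> List<String> writeAll(List<T> records, ToyWriteSupport<T> support) {
        ToyFileWriter writer = new ToyFileWriter();
        for (T record : records) {
            support.write(record, writer);
        }
        return writer.lines;
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(new User("ada", 36), new User("alan", 41));
        System.out.println(writeAll(users, new UserWriteSupport()));
    }
}
```

The point of the split is that the writer never depends on User: swapping in a different object model only means supplying a different adapter, which is roughly the job parquet-avro and parquet-thrift do for you.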

This is great if you're moving from another serialization library to Parquet because there is very little code to change and you don't have to translate. But if you just want to store data in Parquet, then you first need to choose what library you want for runtime objects.

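I highly recommend using Avro objects because both Avro and Parquet are splittable, Hadoop-friendly formats. Plus, Avro has a lot of flexibility: you can use generic objects, generate objects from your data schema, or generate a schema from Java classes. Here's what it would look like to write Strings, wrapped in a one-field record because parquet-avro expects a record schema at the top level:

  // Record schema with a single required string field named "text".
  Schema schema = SchemaBuilder.record("Line").fields()
      .requiredString("text").endRecord();
  AvroParquetWriter<GenericRecord> writer =
      new AvroParquetWriter<GenericRecord>(
          new Path("/file/path.parquet"), schema);
  writer.write(new GenericRecordBuilder(schema)
      .set("text", "a string").build());
  writer.close();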

Of course, the Avro objects can be a lot more complicated than Strings, but should work just fine as long as the object matches the Schema you provide to build the writer.

Does this help?

rb

--
Ryan Blue
Software Engineer
Cloudera, Inc.
