I thought the wrapper was translating from the mapred API used by Hive to the mapreduce API that Parquet implements. If there is a better way to do this that is less expensive, I think that would be a good change.

rb

On 06/26/2015 04:01 PM, Sergio Pena wrote:
Hi,

I see that the ParquetRecordWriterWrapper constructor is getting/initializing
a TaskAttemptID object that will be passed to the
getRecordWriter(TaskAttemptContext taskAttemptContext, Path file) method of
ParquetOutputFormat. But this method only gets the Configuration and
CompressionCodecName objects to pass to another constructor.

My question is, if TaskAttemptID links the Configuration object from the
JobConf parameter of ParquetRecordWriterWrapper, and the codec name can be
retrieved from the JobConf or Properties objects, is there another reason
for using TaskAttemptID?

During some Java profiling tests, I noticed
that ContextUtil.newTaskAttemptContext() takes some time to initialize, and
we can save that time if we use the other constructor.

- Sergio



--
Ryan Blue
Software Engineer
Cloudera, Inc.
