Re: Question regarding the use of TaskAttemptContext on ParquetOutputFormat

Ryan Blue Fri, 26 Jun 2015 16:35:23 -0700

I thought the wrapper was translating from the mapred API used by Hive to the mapreduce API that Parquet implements. If there is a better way to do this that is less expensive, I think that would be a good change.

rb


On 06/26/2015 04:01 PM, Sergio Pena wrote:

Hi,

I see that the ParquetRecordWriterWrapper constructor is getting/initializing
a TaskAttemptID object that will be passed to the
getRecordWriter(TaskAttemptContext taskAttemptContext, Path file) method of
ParquetOutputFormat. But this method only gets the Configuration and
CompressionCodecName objects to pass to another constructor.

My question is, if TaskAttemptID links the Configuration object from the
JobConf parameter of ParquetRecordWriterWrapper, and the codec name can be
retrieved from the JobConf or Properties objects, is there another reason
for using TaskAttemptID?

During some Java profiling tests, I noticed
that ContextUtil.newTaskAttemptContext() takes some time to initialize, and
we can save that time if we use the other constructor.
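
To make the question concrete, here is a minimal, self-contained sketch in
Java. It does not use the real Hadoop, Hive, or Parquet classes: Conf,
AttemptContext, and Writer are hypothetical stand-ins, and the property name
"parquet.compression" with its "UNCOMPRESSED" default is an assumption made
for illustration. It only shows the shape of the idea: if the context is
unwrapped just to reach the configuration, building the writer directly from
the configuration should give the same result without creating a context.

```java
import java.util.HashMap;
import java.util.Map;

public class WriterSetupSketch {

    // Hypothetical stand-in for Hadoop's Configuration / JobConf.
    static class Conf {
        private final Map<String, String> values = new HashMap<>();

        void set(String key, String value) {
            values.put(key, value);
        }

        String get(String key, String defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    // Hypothetical stand-in for TaskAttemptContext. The counter exists only
    // so the example can report how many contexts were built.
    static class AttemptContext {
        static int created = 0;
        final Conf conf;

        AttemptContext(Conf conf) {
            this.conf = conf;
            created++;
        }
    }

    // Hypothetical stand-in for the record writer that is finally constructed.
    static class Writer {
        final Conf conf;
        final String codec;

        Writer(Conf conf, String codec) {
            this.conf = conf;
            this.codec = codec;
        }
    }

    // Assumed property name and default value for the compression codec.
    static final String COMPRESSION_KEY = "parquet.compression";
    static final String DEFAULT_CODEC = "UNCOMPRESSED";

    // Direct path: everything the writer needs comes from the configuration.
    static Writer newWriter(Conf conf) {
        return new Writer(conf, conf.get(COMPRESSION_KEY, DEFAULT_CODEC));
    }

    // Context path: the context is only unwrapped to reach the configuration.
    static Writer newWriter(AttemptContext context) {
        return newWriter(context.conf);
    }

    public static void main(String[] args) {
        Conf jobConf = new Conf();
        jobConf.set(COMPRESSION_KEY, "SNAPPY");

        Writer viaContext = newWriter(new AttemptContext(jobConf));
        Writer direct = newWriter(jobConf);

        // Both writers see the same codec; only one path built a context.
        System.out.println(viaContext.codec.equals(direct.codec));
        System.out.println(AttemptContext.created);
    }
}
```

Whether the real wrapper can skip the context this way depends on whether
anything else in the write path reads from it, which this sketch does not
model.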

- Sergio



--
Ryan Blue
Software Engineer
Cloudera, Inc.
