pig-user  

Re: loading using a custom input formatter

pi song
Fri, 06 Jun 2008 06:04:11 -0700

You're right. The first impedance mismatch is the difference in input
semantics: Hadoop expects K,V pairs, whereas Pig expects Tuples. However, this
doesn't stop us from encapsulating K and V as fields in a Tuple. I had a brief
look at PigInputFormat, and I think we could build a special input format that
lets users plug in an existing Hadoop input format. This would be a very nice
feature to have! BTW, I guess we will have to wait a bit longer, as this seems
to require changes in the MapReduce execution engine, which is currently being
completely rewritten.
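
To make the "K,V as fields in a Tuple" idea concrete, here is a rough sketch
in plain Java. It deliberately uses no Pig or Hadoop classes: KeyValueTuples,
toTuple, and asTuples are hypothetical names made up for illustration, a plain
List<Object> stands in for a Pig Tuple, and Map.Entry stands in for a Hadoop
key/value record. The real wiring would depend on the rewritten execution
engine.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pig or Hadoop: it only illustrates
// presenting each (key, value) record as a two-field tuple.
public class KeyValueTuples {

    // Wrap one key/value pair as a tuple: field 0 is the key, field 1 the value.
    public static List<Object> toTuple(Object key, Object value) {
        List<Object> tuple = new ArrayList<Object>(2);
        tuple.add(key);
        tuple.add(value);
        return tuple;
    }

    // Adapt a stream of key/value records into a stream of two-field tuples.
    public static Iterator<List<Object>> asTuples(
            final Iterator<? extends Map.Entry<?, ?>> records) {
        return new Iterator<List<Object>>() {
            @Override
            public boolean hasNext() {
                return records.hasNext();
            }

            @Override
            public List<Object> next() {
                Map.Entry<?, ?> record = records.next();
                return toTuple(record.getKey(), record.getValue());
            }
        };
    }

    public static void main(String[] args) {
        // Stand-in for records produced by an existing input format.
        List<Map.Entry<String, String>> records = Arrays.asList(
            new AbstractMap.SimpleEntry<String, String>("user1", "clicked"),
            new AbstractMap.SimpleEntry<String, String>("user2", "viewed"));

        Iterator<List<Object>> tuples = asTuples(records.iterator());
        while (tuples.hasNext()) {
            System.out.println(tuples.next());
        }
    }
}
```

The point of the adapter shape is that the existing input format keeps
producing key/value records unchanged, and only a thin wrapper turns each
record into a tuple.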

I will create a placeholder Jira issue for this. Thanks a lot for the idea.

Pi

On Fri, Jun 6, 2008 at 9:49 AM, Manish Shah <[EMAIL PROTECTED]> wrote:

> We have a custom input formatter that we use for regular map/reduce jobs.
> Is there a way to make use of this input formatter in Pig? We've looked at
> most of the docs and haven't found much. The issue we have is that we aren't
> loading data from a single file. Also, the number of files can't be
> determined in advance, so we can't just write separate load commands in our
> Pig Latin.
>
> The input formatter we have takes care of giving back records that conform
> to key/value semantics for Hadoop map/reduce functions. Is there a reason
> it couldn't be used to generate tuples from the resultant records?
>
> - Manish
> Co-Founder Rapleaf.com
> http://www.rapleaf.com/pub/Manish-Shah
>
>