Hi Edward,

I was able to use the distributed cache by setting the
mapred.cache.files option. I could then read the files locally using
standard Java APIs.
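
For anyone finding this thread later, here is a minimal sketch of that
kind of local read. It is not the code actually used here. It assumes the
cached file is visible in the task's working directory under the
hypothetical name lookup.txt, that the file fits in memory, and that
"pattern present" means a plain substring match. PatternLookup is a
made-up helper class; in a real UDF an instance would be kept as a field
and contains() would be called from evaluate().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a lookup file with plain Java I/O,
// with no JobConf or DistributedCache calls involved.
class PatternLookup {
    private final String fileName;
    private List<String> lines; // loaded lazily on the first lookup

    PatternLookup(String fileName) {
        // Assumption: a relative name resolves against the task's working directory.
        this.fileName = fileName;
    }

    // Reads every line of the file into memory once.
    private void load() {
        List<String> loaded = new ArrayList<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                loaded.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + fileName, e);
        }
        lines = loaded;
    }

    // Returns true if any line of the file contains the given substring.
    boolean contains(String pattern) {
        if (pattern == null) {
            return false;
        }
        if (lines == null) {
            load();
        }
        for (String line : lines) {
            if (line.contains(pattern)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // lookup.txt and "needle" are placeholder values for a local try-out.
        String file = args.length > 0 ? args[0] : "lookup.txt";
        String pattern = args.length > 1 ? args[1] : "needle";
        System.out.println(new PatternLookup(file).contains(pattern));
    }
}
```

Loading lazily keeps the file read out of the constructor, so the read
happens on the task node at the first lookup rather than wherever the
object is created.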

Thanks

Viraj

 

________________________________

From: Edward Capriolo [mailto:[email protected]] 
Sent: Tuesday, June 22, 2010 7:44 AM
To: [email protected]
Subject: Re: Using Distributed Cache in Hive UDF's??

 

If you put a file in the distributed cache, it is in the working
directory of the UDF, so you do not need fancy Hadoop-isms to access it.

Shameless plug:
My geo-ip-udf does exactly this.
http://www.jointhegrid.com/hive-udf-geo-ip-jtg/index.jsp
http://www.jointhegrid.com/svn/hive-udf-geo-ip-jtg/

Edward

On Mon, Jun 21, 2010 at 7:03 PM, Viraj Bhat <[email protected]> wrote:

Hi all,

I have a lookup function in Hive that checks whether a certain pattern is
present in a large text file. I have uploaded this text file to HDFS and
hope to use it in my UDF's evaluate() method.

Is there some documentation I can look up? 

The distributed cache relies on a call like

lookupFiles = DistributedCache.getLocalCacheFiles(job);

where job is of type JobConf.

Where do I get the JobConf object from within the UDF?

 

Thanks

Viraj

 
