Overriding PEAR Installation Metadata For Moving it Using Hadoop's DistributedCache

Robert Spurrier Mon, 24 Aug 2015 11:10:43 -0700

Hello,

I'm trying to use PEAR files with Hadoop's DistributedCache mechanism.
The cache handles all of the distribution and cleanup involved in shipping
metadata to a cluster of Hadoop datanodes, and PEARs provide a convenient
way to deliver NLP pipelines. My problem is that the DistributedCache is
read-only, while the PEAR installation procedure requires overwriting
macros and creating files in the directory where the PEAR will be used.
So for now I install locally, compress the installed PEAR directory, and
ship it off to the grid.
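A minimal sketch of the "compress the installed PEAR directory" step,
assuming a .zip archive (one of the formats Hadoop's cache-archive mechanism
unpacks on the datanodes). The class and method names are hypothetical.
Entry names are stored relative to the install directory, so the archive
unpacks to the same layout under whatever directory the cache picks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: packs an already-installed PEAR directory into a zip. */
final class PearArchiver {
    private PearArchiver() {}

    /** Zips every regular file under installDir into zipFile, using paths relative to installDir. */
    static void zipInstalledPear(Path installDir, Path zipFile) throws IOException {
        List<Path> files;
        // Collect the file list first so the zip being written is never picked up by the walk.
        try (Stream<Path> walk = Files.walk(installDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                // Zip entry names always use '/', regardless of the local separator.
                String entryName = installDir.relativize(file).toString().replace('\\', '/');
                zos.putNextEntry(new ZipEntry(entryName));
                Files.copy(file, zos);
                zos.closeEntry();
            }
        }
    }
}
```

Note that this only moves the files: descriptors inside the archive still
contain the absolute local install path that the installer substituted,
which is why the paths have to be rewritten after loading.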


Then I use an override mechanism to load an AE from the relocated PEAR:
I've modified the uimaj-core source, specifically ASB_impl.java and
PearAnalysisEngineWrapper.java, to check for install-directory override
parameters. If they are given, the ConfigurationParameterSettings and
ExternalResourceSpecifiers in the ResourceCreationSpecifier are modified
by replacing the local install directory with the current datanode's
DistributedCache directory, where the PEAR now lives. It works great, but
I'd rather not maintain tainted source, especially since it seems to me
that PEARs were not intended to be used this way.
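A minimal sketch of the path rewrite described above, done outside
uimaj-core. It assumes the settings have already been read into a plain
name/value map standing in for the pairs held in a
ConfigurationParameterSettings; the class name, map, and example paths are
hypothetical. Only plain absolute paths and single-slash "file:/..." URLs
are handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of UIMA): rewrites path-valued parameters that
 * point into a PEAR's original local install directory so that they point into
 * the directory the PEAR was relocated to (e.g. a DistributedCache directory).
 */
final class PearPathRelocator {
    private final String oldRoot;
    private final String newRoot;

    PearPathRelocator(String oldRoot, String newRoot) {
        this.oldRoot = stripTrailingSlash(oldRoot);
        this.newRoot = stripTrailingSlash(newRoot);
    }

    private static String stripTrailingSlash(String p) {
        return (p.length() > 1 && p.endsWith("/")) ? p.substring(0, p.length() - 1) : p;
    }

    /** Rewrites a plain path or a "file:/abs/path" URL if it lies under oldRoot; otherwise returns it unchanged. */
    String relocateString(String s) {
        String scheme = s.startsWith("file:") ? "file:" : "";
        String path = s.substring(scheme.length());
        // Require a '/' boundary so "/pears/MyNerExtra" is not mistaken for a child of "/pears/MyNer".
        if (path.equals(oldRoot) || path.startsWith(oldRoot + "/")) {
            return scheme + newRoot + path.substring(oldRoot.length());
        }
        return s;
    }

    /** Strings and String arrays are rewritten; other parameter types (Integer, Boolean, ...) pass through. */
    Object relocate(Object value) {
        if (value instanceof String) {
            return relocateString((String) value);
        }
        if (value instanceof String[]) {
            String[] in = (String[]) value;
            String[] out = new String[in.length];
            for (int i = 0; i < in.length; i++) {
                out[i] = relocateString(in[i]);
            }
            return out;
        }
        return value;
    }

    /** Returns a copy of the name/value settings with every path under oldRoot relocated. */
    Map<String, Object> relocateAll(Map<String, Object> settings) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : settings.entrySet()) {
            out.put(e.getKey(), relocate(e.getValue()));
        }
        return out;
    }
}
```

For example, new PearPathRelocator("/home/rob/pears/MyNer",
"/data/hadoop/cache/a1b2c3/MyNer") maps
"/home/rob/pears/MyNer/resources/model.bin" to
"/data/hadoop/cache/a1b2c3/MyNer/resources/model.bin" and leaves integers
and unrelated paths alone. Wiring it into the real descriptor objects (e.g.
through each NameValuePair's value) is left out here and would still need
to happen before the AE is instantiated.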

Now that I have some more time to try to do things 'right', is there a
preferred way to leverage the API to make a portable PEAR when you don't
know the name of the directory in which it will ultimately live?
DistributedCache directories on a datanode are uniquely stamped, so I
can't change anything until the PEAR mechanisms have loaded the
description resources into memory.


Thanks for your time and effort; using UIMA in MapReduce has been a treat
so far!


Rob
