Hello,

I'm trying to use PEAR files with Hadoop's DistributedCache mechanism. The cache provides all of the distribution and cleanup mechanisms involved with metadata on a cluster of Hadoop datanodes, and the PEARs provide a convenient delivery format for NLP pipelines. My problem is that the DistributedCache is read-only, while the PEAR installation procedure needs to overwrite macros and create files in the directory in which the PEAR will be used. So for now I install locally, compress the installed PEAR directory, and ship it off to the grid.
Then I use an override mechanism to load an AE from the relocated PEAR: I've modified the uimaj-core source, specifically ASB_impl.java and PearAnalysisEngineWrapper.java, to check for install directory override parameters. If they are given, the ConfigurationParameterSettings and ExternalResourceSpecifiers in the ResourceCreationSpecifier are modified by replacing the local install directory with the current datanode's DistributedCache directory, where the PEAR now lives.

It works great, but I'd rather not deal with maintaining the tainted source, since right now it seems to me like something PEARs were not intended for. Now that I have some more time to try to do things 'right', is there a preferred way to leverage the API to make a portable PEAR when you don't know the name of the directory in which it will ultimately live? DistributedCache directories for a datanode are uniquely stamped, so I can't change anything until the PEAR mechanisms have loaded the description resources into memory.

Thanks for your time and effort. Using UIMA in MapReduce has been a treat so far!

Rob
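
For illustration, the directory substitution described above can be sketched in plain Java, independent of UIMA. This is only a sketch and not the actual uimaj-core patch: the class name PearPathRelocator, its methods, and the example paths are hypothetical, and it assumes that the affected parameter values and resource locations are plain strings that begin with the local install directory, optionally preceded by a "file:" prefix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of UIMA or Hadoop.
public class PearPathRelocator {

    // Drop a single trailing slash so "/a/b/" and "/a/b" compare equal.
    static String stripTrailingSlash(String path) {
        return (path.length() > 1 && path.endsWith("/"))
                ? path.substring(0, path.length() - 1)
                : path;
    }

    // Rewrite one value if it points into the old install directory.
    // Values that do not start with that directory are returned unchanged.
    static String relocate(String value, String oldRoot, String newRoot) {
        if (value == null) {
            return null;
        }
        // Assumed convention: resource locations may carry a "file:" prefix.
        if (value.startsWith("file:")) {
            return "file:" + relocate(value.substring("file:".length()), oldRoot, newRoot);
        }
        String from = stripTrailingSlash(oldRoot);
        String to = stripTrailingSlash(newRoot);
        if (value.equals(from)) {
            return to;
        }
        // Require a path separator after the prefix so that
        // "/pears/ner" does not match "/pears/nerx".
        if (value.startsWith(from + "/")) {
            return to + value.substring(from.length());
        }
        return value;
    }

    // Apply the rewrite to every value in a name-to-value settings map.
    static Map<String, String> relocateAll(Map<String, String> settings,
                                           String oldRoot, String newRoot) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            out.put(e.getKey(), relocate(e.getValue(), oldRoot, newRoot));
        }
        return out;
    }

    public static void main(String[] args) {
        // Example paths are made up for the demonstration.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("ModelFile", "/home/rob/pears/ner/models/en.bin");
        settings.put("Dictionary", "file:/home/rob/pears/ner/resources/dict.txt");
        settings.put("Language", "en");

        Map<String, String> relocated =
                relocateAll(settings, "/home/rob/pears/ner", "/cache/12345/ner");
        relocated.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In the setup described above, the old root would be the local install directory and the new root would be whatever directory the DistributedCache reports on the current datanode at task start.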
