Hey Folks,

If you want another turnkey Apache OODT application (like DRAT [1]), take a look at BigTranslate [2]. I’ve given it a full makeover. There are some lingering things I want to do, like Docker support and making a few things a bit easier to take care of, but it’s pretty much done, and it’s churning again, translating the 190M-row DARPA XDATA employment data right now.

I welcome any and all contributions. A few things to note out of this:

1. BigTranslate was inspired by 2 blog / wiki posts on understanding Apache OODT metadata, especially during pipeline processing:

https://cwiki.apache.org/confluence/x/gASkAg
https://cwiki.apache.org/confluence/x/DAWkAg

These are useful posts, and if we are doing a website redesign they should be emphasized, as they really help in understanding what’s going on during large-scale Apache OODT processing.

2. There are a bunch of TODOs and to-be-filed issues for Apache OODT that I found while fixing and productionizing BigTranslate. In no specific order they are:

* change PathUtils#getEnv to use System.getenv
* change PathUtils#getEnv to be static and only load the properties 1 time per JVM
* investigate cas-pge: a valueless key with workflowMet should push into workflow met with the existing value and not look for key-ref
* update cas-crawler to say which preconditions failed on crawling
* create better error messages when crawler actions fail
* radix query tool path needs better deployment
* sortBy in the Query tool is broken b/c of an unsupported operation exception

I’ll be filing the above issues and fixing them in the 1.x branch and in 2.x going forward over the next week.

Comments and improvements welcomed in BigTranslate! Also, maybe we should make a wiki page that lists our full end-to-end, usable apps like BigTranslate and DRAT.

Cheers,
Chris

[1] http://github.com/chrismattmann/drat/
[2] http://github.com/chrismattmann/bigtranslate/

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++