Hi Guys,

We're currently working at JPL on a Shark/Spark interface for Apache OODT that can be used for ETL and workflow management:
http://oodt.apache.org/

OODT currently supports RDBMS and Solr/Lucene, and we are also working on a Gora plugin for it.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-5th floor
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: William Kang <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, May 22, 2014 7:49 AM
To: "[email protected]" <[email protected]>
Subject: ETL and workflow management on Spark

>Hi,
>
>We are moving toward adopting the full Spark stack. So far, we have used
>Shark to do some ETL work, which is not bad but is not perfect either. We
>ended up writing UDFs, UDGFs, and UDAFs that could have been avoided if we
>could use Pig.
>
>Do you have any suggestions for an ETL solution in the Spark stack?
>
>And does anyone have a working workflow management solution with Spark?
>
>Many thanks.
>
>Cao
